Python Data Scraping Expert

We are looking to hire a web scraping developer to handle portions of large web scraping projects. The work entails writing Python code to extract large amounts of data from a wide range of websites at scale.

About Atlas

Atlas helps companies develop data-driven solutions to solve their toughest business challenges.

Atlas uses data to help clients capture leads, monitor prices, identify patterns, make important decisions, and more. Over the past several years, Atlas has worked with dozens of Enterprise clients such as Amazon, Chegg, Spring (formerly TeeSpring), and more. Taking a data-focused approach helps businesses large and small reach new heights.

In order to help our clients grow, Atlas offers the full cycle of data engineering tasks (a minimal code sketch follows the list):

  1. Extracting - pull unstructured data from websites and APIs
  2. Processing - clean and engineer data into the desired format
  3. Loading - structure and store data in an external source (SQL, CSV, Excel, Snowflake, Airtable, etc.)
  4. Analyzing - apply state-of-the-art algorithms to generate high-value insights
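
As a minimal illustration of that cycle in Python (the endpoint, field names, and output file here are hypothetical placeholders):

```python
import requests
import pandas as pd

# 1. Extract: pull raw records from an API (hypothetical endpoint)
resp = requests.get("https://api.example.com/products", timeout=30)
resp.raise_for_status()
records = resp.json()

# 2. Process: clean and engineer the data into the desired shape
df = pd.DataFrame(records)
df["price"] = pd.to_numeric(df["price"], errors="coerce")
df = df.dropna(subset=["price"])

# 3. Load: structure and store the data in an external source (CSV here)
df.to_csv("products.csv", index=False)

# 4. Analyze: generate a simple insight from the cleaned data
print(df.groupby("category")["price"].mean())
```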

Join us in shaping tomorrow together!

Atlas combines the many facets of data work into a unified approach, giving our clients clean, organized data that is ready to power their growing business needs. Through a structured approach to tackling complex business problems and generating data-driven solutions, we turn client projects into long-lasting, high-impact work that helps their companies keep growing.

If this work sounds like something you are interested in, apply now!

Requirements & Responsibilities

Python

This role will require an understanding of Python web scraping libraries such as the following (a short example appears after the list):

  • beautifulsoup4
  • requests
  • cloudscraper
  • httpx
  • scrapy
  • selenium
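
For calibration, a minimal requests + beautifulsoup4 scrape looks something like this (the URL and CSS selectors are hypothetical):

```python
import requests
from bs4 import BeautifulSoup

# Fetch a page and parse the HTML (hypothetical URL)
resp = requests.get("https://example.com/listings", timeout=30)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")

# Extract the text of each listing title (hypothetical selectors)
for item in soup.select("div.listing"):
    title = item.select_one("h2")
    if title:
        print(title.get_text(strip=True))
```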

Different websites have very different structures, often using JavaScript rendering and/or cookies before making the final API endpoint requests that return the desired data. This requires a sharp understanding of how to inspect network traffic in Chrome DevTools and observe the call stack to determine which endpoints need to be hit in order to scrape data efficiently.

Therefore, unless specified otherwise, all scraping programs should be fully HTTP request-based (no use of Selenium, Puppeteer, or any other browser control tools whatsoever).
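
For instance, once an endpoint has been identified in the Network tab, a purely request-based scraper might replay it along these lines; the URL, parameters, headers, and cookie names below are all hypothetical:

```python
import httpx

# Replay an endpoint discovered in Chrome DevTools (Network tab).
# The URL, headers, and cookie values are hypothetical placeholders.
headers = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "application/json",
    "Referer": "https://www.example.com/search",
}
cookies = {"session_id": "<value copied from the browser>"}

with httpx.Client(headers=headers, cookies=cookies, timeout=30) as client:
    resp = client.get(
        "https://www.example.com/api/v2/search",
        params={"q": "widgets", "page": 1},
    )
    resp.raise_for_status()
    data = resp.json()  # structured data, no browser automation needed

print(len(data.get("results", [])))
```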

Skillsets

It is also important to understand concepts such as multi-threading and proxy servers in order to scrape data from websites efficiently. I have a somewhat flexible format in mind for what the code should look like, and we can discuss good code practices to follow for different projects.
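
As a rough sketch of those two concepts together (the proxy URL and target pages are hypothetical placeholders):

```python
import requests
from concurrent.futures import ThreadPoolExecutor

# Hypothetical proxy and target URLs
PROXIES = {
    "http": "http://user:pass@proxy.example.com:8000",
    "https": "http://user:pass@proxy.example.com:8000",
}
URLS = [f"https://example.com/page/{i}" for i in range(1, 21)]

def fetch(url: str) -> int:
    # Each worker routes its request through the proxy
    resp = requests.get(url, proxies=PROXIES, timeout=30)
    return resp.status_code

# Scrape pages concurrently with a bounded thread pool
with ThreadPoolExecutor(max_workers=10) as pool:
    for url, status in zip(URLS, pool.map(fetch, URLS)):
        print(status, url)
```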

At a high level, familiarity with the following concepts will be immensely valuable (short code sketches follow the lists):

Data Scraping Skillset:

  • Python Web Scraping Libraries (listed above)
  • Multi-Threading
  • Proxy Server Integration

Data Engineering Skillset:

  • Basic Python Data Analysis Libraries (Pandas, NumPy)
  • PostgreSQL (for very simple SQL queries)
  • AWS Cloud Tools (EC2, RDS, S3, AWS CLI, CloudWatch, CloudFormation)
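
Nothing advanced is expected on the SQL side; for example, loading scraped rows into PostgreSQL and running a simple query might look like this sketch (the connection string and table are hypothetical, and it assumes SQLAlchemy with a PostgreSQL driver installed):

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical PostgreSQL (e.g., RDS) connection string
engine = create_engine("postgresql://user:pass@db.example.com:5432/scraping")

# Load scraped rows into a table, then run a very simple query
df = pd.read_csv("products.csv")
df.to_sql("products", engine, if_exists="append", index=False)

summary = pd.read_sql(
    "SELECT category, COUNT(*) FROM products GROUP BY category", engine
)
print(summary)
```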

General Best Practices & Automation Skillset:

  • Shell scripting & general code automation tools
  • Creating code templates
  • Automating redundant commands and deployment
  • Docker
  • Git
  • Software best practices (logging, documentation, testing, automation)
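
As one example of the kind of template that gets reused across projects, here is a small sketch of a fetch helper combining logging with simple retries (the function name and defaults are just illustrative):

```python
import logging
import time

import requests

# Structured logging plus retry/backoff around each request
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger("scraper")

def fetch_with_retries(url: str, retries: int = 3, backoff: float = 2.0) -> requests.Response:
    for attempt in range(1, retries + 1):
        try:
            resp = requests.get(url, timeout=30)
            resp.raise_for_status()
            log.info("fetched %s (attempt %d)", url, attempt)
            return resp
        except requests.RequestException as exc:
            log.warning("attempt %d failed for %s: %s", attempt, url, exc)
            time.sleep(backoff * attempt)  # linear backoff between attempts
    raise RuntimeError(f"giving up on {url} after {retries} attempts")
```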

You will be joining a team of one or more developers whose experience overlaps with all of the above topics, so help is available if you need it. Lastly, I will ask you to sign an NDA to protect Atlas IP such as code, data, and information shared internally, whether by Atlas or by any of Atlas's clients.

If your skill set lines up with these requirements, apply now!
