Web Scraping Exercises is a collection of practical exercises that helps people learn web scraping and sharpen their skills.
The exercises are available at ScrapingClub, a website that hosts many web scraping exercises and high-quality web scraping tutorials.
Why I created this project
Most blog posts about web scraping talk about how to crawl a specific website or set of pages, but I believe it makes more sense to help people learn how to analyze a website and choose the right way to get the job done well.
That is why I built Web Scraping Exercises. My goal is to break down a complex web scraping mission, such as crawling a bunch of websites, into small tasks so people can learn how to solve them step by step. What is more, if they have trouble solving an exercise, they can ask for help with specific details instead of "I have trouble crawling the website, please help!".
Who might need this project
Anyone who wants to learn web scraping, test their web scraping skills, or just scrape for fun might need this project.
How it works
You will see many product detail pages and list pages on ScrapingClub. For example, two product detail pages might look the same but deliver their data in different ways. Your job is to figure out how each page works and write a spider to extract the data.
A short description at the top of each exercise page explains what needs to be done, and the tips and links help you learn web scraping better. Below is a screenshot of one of the exercises.
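To make the idea concrete, here is a minimal sketch of two pages that could look identical in a browser but need different extraction approaches. The HTML snippets, the class names `card-title` and `card-price`, and the `var obj` variable are assumptions made up for illustration, not the actual markup of any ScrapingClub exercise. It uses only the Python standard library; a real spider would more likely use a framework with XPath or CSS selectors.

```python
import json
import re
from html.parser import HTMLParser

# Hypothetical page A: the product data sits directly in the HTML markup.
PAGE_MARKUP = """
<div class="card">
  <h3 class="card-title">Long-sleeved Jersey Top</h3>
  <h4 class="card-price">$12.99</h4>
</div>
"""

# Hypothetical page B: same look in a browser, but the data is embedded
# as a JSON string in a script tag and rendered by JavaScript.
PAGE_JSON = """
<div class="card"></div>
<script>
  var obj = {"title": "Long-sleeved Jersey Top", "price": "$12.99"};
</script>
"""


class CardParser(HTMLParser):
    """Collect the text of elements with class card-title or card-price."""

    FIELDS = {"card-title": "title", "card-price": "price"}

    def __init__(self):
        super().__init__()
        self.item = {}
        self._field = None  # field name the next text node belongs to

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in self.FIELDS:
                self._field = self.FIELDS[cls]

    def handle_data(self, data):
        if self._field and data.strip():
            self.item[self._field] = data.strip()
            self._field = None

    def handle_endtag(self, tag):
        self._field = None


def parse_markup(html):
    """Extract the product from data stored in the HTML elements."""
    parser = CardParser()
    parser.feed(html)
    return parser.item


def parse_embedded_json(html):
    """Extract the product from a JSON string embedded in a script tag."""
    match = re.search(r"var obj = (\{.*?\});", html, re.DOTALL)
    if match is None:
        return {}
    return json.loads(match.group(1))


if __name__ == "__main__":
    print(parse_markup(PAGE_MARKUP))
    print(parse_embedded_json(PAGE_JSON))
```

Running the markup parser on the second page returns an empty result, which is the kind of clue that tells you to inspect the page source and look for the data somewhere else.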
What if you have trouble completing the exercise?
Later I will create a project on GitHub to host the solution code, and I will also write articles that go into more detail. Please feel free to send me a message to let me know your thoughts.
Table Of Contents
- Basic Info Scraping: Scrape a page using XPath or CSS expressions
- Analyze JSON: Load a JSON string and extract data from it
- Recursively Scraping Pages: Crawl products and also handle pagination
- Mimicking Ajax Requests: Inspect Ajax requests and mimic them
- Inspect HTTP Request: Learn to inspect the fields of an HTTP request
- Scraping Infinite Scrolling Pages (Ajax): Learn to scrape infinite scrolling pages
- Find Gold in Cookie: Make your spider work with cookies
- Login Form: Scrape data behind a login form
- Solve Captcha: Learn to scrape data behind a captcha
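As a taste of what the recursive scraping exercise in the list above asks for, here is a minimal sketch of a crawl loop that collects product links and follows the "next" link until the pages run out. The `SITE` dictionary, the URL paths, and the `product` and `next` class names are hypothetical stand-ins so the example runs without network access. Matching links with regular expressions is a simplification that only suits this fixed toy markup.

```python
import re

# Hypothetical site: maps a URL path to its HTML. A real spider would
# download each page over HTTP instead of reading from this dict.
SITE = {
    "/list/?page=1": (
        '<a class="product" href="/p/1/">A</a>'
        '<a class="product" href="/p/2/">B</a>'
        '<a class="next" href="/list/?page=2">Next</a>'
    ),
    "/list/?page=2": '<a class="product" href="/p/3/">C</a>',
}


def fetch(path):
    """Return the HTML for a path (stand-in for an HTTP request)."""
    return SITE[path]


def crawl(start):
    """Collect product links from a list page and every page after it."""
    products = []
    seen = set()      # list pages already visited, to avoid loops
    queue = [start]   # list pages still to visit
    while queue:
        path = queue.pop(0)
        if path in seen:
            continue
        seen.add(path)
        html = fetch(path)
        # Product links on this list page.
        products += re.findall(r'<a class="product" href="([^"]+)"', html)
        # Pagination: schedule the next list page, if there is one.
        queue += re.findall(r'<a class="next" href="([^"]+)"', html)
    return products


if __name__ == "__main__":
    print(crawl("/list/?page=1"))
```

The `seen` set matters on real sites, where pagination links often point back to pages you have already visited.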
I will add more exercises in the coming weeks, and you can subscribe to the mailing list to stay updated.