advanced-web-scraping-tutorial
The Zipru scraper developed in the Advanced Web Scraping Tutorial.
Advanced Web Scraping Tutorial Project
This repository is a companion to the article Advanced Web Scraping: Bypassing captcha, "403 Forbidden," and more. Please refer to the article for further details.
This is a Scrapy web scraper for the fictional Zipru torrent site. It is designed to bypass four distinct anti-scraping mechanisms:
- User agent filtering.
- Obfuscated JavaScript redirects.
- CAPTCHAs.
- Header consistency checks.
Because Zipru is not a real site, the scraper cannot be run as-is. The code is otherwise complete, however, and can be adapted to work on other sites.
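As a rough illustration of the first and last mechanisms in the list above (user agent filtering and header consistency checks), the sketch below shows one way to keep a browser-like user agent and a matching set of headers together. `USER_AGENT` and `DEFAULT_REQUEST_HEADERS` are real Scrapy setting names, but the values and the `browser_like_headers` helper are assumptions made for illustration and are not taken from this repository.

```python
# Illustrative sketch only: the values and the helper below are hypothetical,
# not the repository's actual configuration.

# In a Scrapy project these two names would normally live in settings.py.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def browser_like_headers(extra=None):
    """Return one consistent header dict for every request.

    Hypothetical helper: merges the default headers, the user agent,
    and any per-request extras without mutating the defaults.
    """
    headers = dict(DEFAULT_REQUEST_HEADERS)
    headers["User-Agent"] = USER_AGENT
    if extra:
        headers.update(extra)
    return headers


if __name__ == "__main__":
    # Example: add a Referer for a single request.
    for name, value in browser_like_headers({"Referer": "https://example.com/"}).items():
        print(f"{name}: {value}")
```

Building every request's headers from one place makes it less likely that the user agent and the remaining headers disagree with each other, which is the kind of mismatch a header consistency check looks for.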