web-crawling topic
robots.txt
Simple robots.txt template. Keeps unwanted robots out (disallow) and whitelists (allow) legitimate user agents. Useful for all websites.
Terpene-Profile-Parser-for-Cannabis-Strains
Parser and database to index the terpene profiles of different Cannabis strains from online databases
crawler
Library for Rapid (Web) Crawler and Scraper Development
Katastrophe
Command Line Tool to download torrents
Scrapy-Craigslist
Web Scraping Craigslist's Engineering Jobs in NY with Scrapy
amazon_scraper
Amazon product scraper using rotating proxies and headless Chrome from ScrapingAnt
Amazon-Flipkart-Price-Comparison-Engine
Compares the price of a product entered by the user across the e-commerce sites Amazon and Flipkart
CrawlerX
CrawlerX: an extensible, distributed, scalable crawler system. It is a web platform that can be used to crawl URLs over different kinds of protocols in a distributed way.
JAW
JAW: A Graph-based Security Analysis Framework for Client-side JavaScript