SearchEngineScrapy
SearchEngineScrapy copied to clipboard
Scrape data from Google.com, Bing.com, Baidu.com, Ask.com, Yahoo.com, Yandex.com
SearchEngineScrapy - Scrape data from Google.com, Bing.com, Baidu.com, Ask.com, Yahoo.com, Yandex.com,
Intro
SearchEngineScrapy is a web crawler and scraper for scraping data off various search engines such as Google.com, Bing.com, Yahoo.com, Ask.com, Baidu.com, Yandex.com It is based on Python Scrapy project and is developed using Python 2.7
Setup
virtualenv --python="2" env
env/bin/activate
git clone https://github.com/naqushab/SearchEngineScrapy.git
cd SearchEngineScrapy
pip install -r requirements.txt
Usage
Prefix : -a
Params
searchQuery="
searchEngine="
pages=
Prefix : -o
Params
Examples
scrapy crawl SearchEngineScrapy -a searchQuery="I'm Batman"
scrapy crawl SearchEngineScrapy -a searchQuery="I'm Batman" -o filename.json
scrapy crawl SearchEngineScrapy -a searchQuery="I'm Batman" -a searchEngine="Google" -o filename.xml
scrapy crawl SearchEngineScrapy -a searchQuery="I'm Batman" -a searchEngine="Google" -a pages=5 -o filename.csv
TODO
- Add support for DDG
- Ability to provide parameter of what to save
- Ability to export to various formats (currently limited to JSON, JSONLINES, CSV, XML)
- Contributing section