web-scrapers
This is a repository for my web-scraping projects.
Requirements
- Python 3.5+
- Scrapy 1.5+
Text Summarization
News articles and their bullet-point summaries, scraped from the Times of India news archive.
Medical NER
Diseases and treatments/tests scraped from medical websites. The following gazetteers have been created:
- malacards-diseases scraped from malacards.org (18455 entries).
- medicinenet-diseases scraped from medicinenet.com (4969 entries).
- medicinenet-treatments scraped from medicinenet.com (931 entries).
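As a rough illustration of how gazetteers like the ones listed above might be used for NER, here is a minimal sketch of greedy longest-match lookup. It assumes each gazetteer is a plain-text list with one entry per line, which may not match the actual file layout in this repository. The function names `load_gazetteer` and `tag_mentions`, the `DISEASE` label, and the sample entries are all hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def load_gazetteer(lines: Iterable[str]) -> Set[Tuple[str, ...]]:
    """Build a set of lowercased token tuples, one per non-empty line.

    Assumption: one gazetteer entry per line (not confirmed by the repo).
    """
    entries = set()
    for line in lines:
        tokens = tuple(line.strip().lower().split())
        if tokens:
            entries.add(tokens)
    return entries


def tag_mentions(
    text: str, gazetteer: Set[Tuple[str, ...]], label: str
) -> List[Tuple[str, str]]:
    """Greedy longest-match lookup over whitespace-tokenized text."""
    tokens = text.split()
    # Lowercase and strip simple punctuation so "asthma." matches "asthma".
    lowered = [t.lower().strip(".,;:()") for t in tokens]
    max_len = max((len(entry) for entry in gazetteer), default=0)
    mentions = []
    i = 0
    while i < len(tokens):
        # Try the longest span first so "type 2 diabetes" beats "diabetes".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(lowered[i:i + n])
            if span in gazetteer:
                mentions.append((" ".join(span), label))
                i += n
                break
        else:
            # No entry starts at this token; move on.
            i += 1
    return mentions


# Hypothetical sample entries; the real gazetteers are far larger.
sample = ["Asthma", "Type 2 diabetes", "Diabetes"]
diseases = load_gazetteer(sample)
result = tag_mentions("Patient has type 2 diabetes and asthma.", diseases, "DISEASE")
print(result)
```

With a real gazetteer file, the lines could come from an open file handle instead of the sample list, e.g. `load_gazetteer(open(path, encoding="utf-8"))`, where `path` points to wherever the entries are stored.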
References
- http://sangaline.com/post/advanced-web-scraping-tutorial/