the_od_bods
Stagecoach scraper
> [!IMPORTANT]
> Still in progress. Sharing a draft with the community.
Outstanding tasks
- [ ] Contact Stagecoach to confirm the dataset licence and to ask whether they can fix a malformed HTML tag on their website
- [x] Add datasets into `merge_data.py`
- [ ] Test the full pipeline and check how the datasets appear on the frontend
Description
- Adds a Stagecoach scraper using the new JSON scraper format
- Adds new common methods to `processor.py` for HTML scraping too:
  - `get_html`
  - `get_html_head`
  - `get_http_content_length`
Motivation and Context
Closes #120
How Has This Been Tested?
Ran the script locally; it produces the expected dataset file at `data\bespoke_Stagecoach\Stagecoach.json`.
Screenshots (if appropriate):
Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [X] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
Checklist:
- [X] My code follows the code style of this project.
- [X] My change requires a change to the documentation.
- [ ] I have updated the documentation accordingly.
- [X] I have read the CONTRIBUTING document.
- [ ] I have added tests to cover my changes.
- [ ] All new and existing tests passed.
Contacted Stagecoach's open data email address to query the dataset licence and metadata.