database
Fix or rewrite scraper engine and scrapers
Many years have passed since the scraper engine's first release.
Known demozoo scraper bugs:
- it always selects the first secondary mirror, assuming that it is hosted on scene.org. This assumption is wrong, and the link should be carefully checked
- it is missing screenshots: it always downloads only the first one
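One possible way to avoid the "first mirror" assumption is to check each link's hostname instead of its position. This is only a sketch: `pick_scene_org_url` is a hypothetical helper, not a function in the current scraper, and the example URLs are made up.

```python
from urllib.parse import urlparse


# Hypothetical helper (not part of the existing scraper): choose a mirror
# by hostname rather than by its position in the list.
def pick_scene_org_url(urls):
    """Return the first URL hosted on scene.org or a subdomain, else None."""
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if host == "scene.org" or host.endswith(".scene.org"):
            return url
    return None


# Made-up links, for illustration only.
mirrors = [
    "https://example.com/files/demo.zip",
    "https://files.scene.org/get/demo.zip",
]
chosen = pick_scene_org_url(mirrors)
```

Returning `None` when nothing matches lets the caller decide whether to fall back to another source or skip the entry.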
Known scraper engine bugs:
- sometimes it sets "screenshots" to None: it should set that field to an empty list if no screenshots are detected
- it doesn't handle extensions other than gb very well, so it may be a good idea to fix this. Sometimes the manifest says "gbc" but the engine has already renamed the file to gb
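Both engine bugs could be addressed by normalizing values in one place before the manifest is written. The sketch below assumes the manifest is a plain dict with "file" and "screenshots" keys and that gb and gbc are the accepted extensions. All names here are hypothetical and not taken from the engine.

```python
from pathlib import Path

# Assumption: the set of ROM extensions the engine should accept.
ALLOWED_EXTENSIONS = {".gb", ".gbc"}


def normalize_screenshots(value):
    """Return an empty list when no screenshots were detected (None)."""
    if value is None:
        return []
    return list(value)


def target_filename(slug, source_name):
    """Keep the source file's extension instead of forcing .gb."""
    ext = Path(source_name).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported extension: {ext!r}")
    return slug + ext


def build_manifest(slug, source_name, screenshots):
    """Derive the manifest from the same name used to save the file on disk."""
    return {
        "file": target_filename(slug, source_name),
        "screenshots": normalize_screenshots(screenshots),
    }


manifest = build_manifest("my-game", "Game.GBC", None)
```

Because the manifest entry and the on-disk name come from the same `target_filename` call, they cannot disagree about the extension.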
Other improvements:
- make the general logic more flexible (e.g. select the best source, include other extensions)
- write a basic test suite
- test the other scrapers from scratch, since they may have become buggy due to changes in the master scraper
@dag7dev demozoo is also missing screenshots: it always downloads only the first one
@avivace I think it is a bug in the master engine
I believe the "more screenshots" link is simply never navigated; only the first screenshot appearing on the main page is considered.
See: https://github.com/gbdev/database/blob/master/scrapers/py_importers/demozoo.py#L186
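If that is the cause, a fix might parse the "more screenshots" page as well and merge the results. The sketch below uses only the standard library and works on HTML strings that have already been fetched. The markup is invented for illustration and does not reflect Demozoo's real page structure; `ImageCollector` and `collect_screenshots` are hypothetical names.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag it is fed."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def collect_screenshots(*pages):
    """Merge image URLs from several pages, dropping duplicates in order."""
    merged = []
    for html in pages:
        parser = ImageCollector()
        parser.feed(html)
        for src in parser.sources:
            if src not in merged:
                merged.append(src)
    return merged


# Invented markup: the main page shows one screenshot, the linked page shows all.
main_page = '<div><img src="/shots/1.png"></div>'
more_page = '<ul><li><img src="/shots/1.png"></li><li><img src="/shots/2.png"></li></ul>'
all_shots = collect_screenshots(main_page, more_page)
```

A real implementation would need a selector specific to the screenshot gallery, since a page usually contains other images as well.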