Fix or rewrite scraper engine and scrapers

Open dag7dev opened this issue 11 months ago • 3 comments

Many years have passed since the scraper engine's first release.

Known demozoo bugs:

  • it always selects the first secondary mirror, assuming that it is from scene.org. This is wrong: the mirror should be checked before it is used
  • demozoo is missing screenshots: it always downloads only the first one
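A minimal sketch of the mirror check, assuming the scraper has a list of download URLs for a production. The function name `pick_scene_org_link` and the example URLs are hypothetical and not taken from the existing scraper; the idea is only to inspect the hostname instead of trusting the position in the list.

```python
from urllib.parse import urlparse


def pick_scene_org_link(links):
    """Return the first link hosted on scene.org, or None if there is none.

    `links` is assumed to be a list of URL strings (hypothetical input shape).
    """
    for link in links:
        host = urlparse(link).hostname or ""
        # Accept scene.org itself and any of its subdomains.
        if host == "scene.org" or host.endswith(".scene.org"):
            return link
    return None


# Hypothetical example data: the first mirror is NOT from scene.org.
mirrors = [
    "https://example.com/files/demo.zip",
    "https://files.scene.org/get/demos/demo.zip",
]
print(pick_scene_org_link(mirrors))
```

Returning None when no scene.org mirror exists lets the caller decide whether to fall back to another source or skip the entry.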

Known scrape engine bugs:

  • sometimes it sets "screenshots" to None: it should set that field to an empty list if no screenshot is detected
  • it doesn't handle extensions other than gb very well. This should be fixed: sometimes the manifest says "gbc" but the engine has already renamed the file to gb
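Both engine bugs above could be addressed with small helpers along these lines. This is a sketch under assumptions: the helper names, the `allowed` extension tuple, and the `slug` parameter are hypothetical and do not reflect the current engine code.

```python
from pathlib import Path


def normalize_screenshots(value):
    """Return a list of screenshots, using [] when none were detected."""
    return list(value) if value else []


def rom_filename(slug, source_name, allowed=(".gb", ".gbc")):
    """Build the stored filename, keeping the source file's extension.

    The set of allowed extensions is an assumption for this sketch.
    """
    ext = Path(source_name).suffix.lower()
    if ext not in allowed:
        raise ValueError(f"unsupported extension: {ext!r}")
    # Keep the real extension instead of always renaming to .gb,
    # so the name on disk matches what the manifest declares.
    return f"{slug}{ext}"


print(normalize_screenshots(None))
print(rom_filename("my-game", "MyGame.GBC"))
```

Deriving the stored name and the manifest entry from the same function is one way to keep the two from drifting apart.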

Other improvements:

  • make the general logic more flexible (e.g. select the best source, include other extensions)
  • write a basic test suite
  • test the other scrapers from scratch, since they may have become buggy due to changes in the master scraper

dag7dev avatar Dec 18 '24 09:12 dag7dev

@dag7dev demozoo is also missing screenshots: it always downloads only the first one

avivace avatar Dec 18 '24 21:12 avivace

@avivace I think it is a bug in the master engine

dag7dev avatar Dec 18 '24 21:12 dag7dev

I believe the "more screenshots" link is simply never followed; only the first screenshot appearing on the main page is considered.

See: https://github.com/gbdev/database/blob/master/scrapers/py_importers/demozoo.py#L186

avivace avatar Dec 18 '24 21:12 avivace