recipe-scrapers
ah.nl: image() returns '' instead of the image URL
Pre-filing checks
- [X] I have searched for open issues that report the same problem
- [X] I have checked that the bug affects the latest version of the library
The URL of the recipe(s) that are not being scraped correctly
- https://www.ah.nl/allerhande/recept/R-R292556/romige-tagliatelle-met-oesterzwammen
- .. More from ah.nl
...
The results you expect to see
>>> r = scrape_me("https://www.ah.nl/allerhande/recept/R-R292556/romige-tagliatelle-met-oesterzwammen")
>>> r.image()
'https://static.ah.nl/static/recepten/img_003177_1224x900_JPG.jpg'
...
The results (including any Python error messages) that you are seeing
>>> r = scrape_me("https://www.ah.nl/allerhande/recept/R-R292556/romige-tagliatelle-met-oesterzwammen")
>>> r.image()
''
>>> r.schema.data.get('image')
['', 'https://static.ah.nl/static/recepten/img_003177_1224x900_JPG.jpg']
>>>
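The failure can be reproduced without fetching the page. This is a minimal sketch that assumes the schema.org `image` value is exactly the list shown above. Taking the first list element, as `_schemaorg.py` currently does with `image = image[0]`, yields the empty string. Skipping falsy entries yields the URL:

```python
# Value of r.schema.data.get('image') as observed above for the ah.nl recipe
image_data = ['', 'https://static.ah.nl/static/recepten/img_003177_1224x900_JPG.jpg']

# Current behaviour: take the first list element, which is empty here
current = image_data[0]

# Proposed behaviour: take the first non-empty element, or "" if there is none
proposed = next((s for s in image_data if s), "")

print(repr(current))  # prints ''
print(proposed)       # prints https://static.ah.nl/static/recepten/img_003177_1224x900_JPG.jpg
```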
I can make a PR for this, but I'd like the community's opinion first: would this be best solved in _schemaorg.py, with:
if isinstance(image, list):
- # Could contain a dict
- image = image[0]
+ # Get the first non-empty item ("" if every item is empty)
+ image = next((s for s in image if s), "")
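The removed comment notes that a list entry "could contain a dict". Below is a minimal sketch of a more defensive selection that also covers that case. `pick_image` and `_as_url` are hypothetical helpers, not part of recipe-scrapers. It also assumes that a dict entry is a schema.org ImageObject whose URL sits under the `"url"` key:

```python
from typing import Any


def _as_url(item: Any) -> str:
    """Hypothetical helper: turn one image entry into a URL string, or ""."""
    if isinstance(item, dict):
        # Assumption: ImageObject-style dicts carry the URL under "url"
        item = item.get("url")
    return item if isinstance(item, str) else ""


def pick_image(image: Any) -> str:
    """Hypothetical helper: first non-empty URL from a schema.org image value."""
    # Treat a single string/dict/None like a one-element list
    candidates = image if isinstance(image, list) else [image]
    # The "" default avoids StopIteration when every entry is empty
    return next((url for url in map(_as_url, candidates) if url), "")
```

For example, `pick_image(['', 'https://static.ah.nl/static/recepten/img_003177_1224x900_JPG.jpg'])` returns the URL, and `pick_image(['', ''])` returns `""` rather than raising. Without a default, `next()` raises StopIteration on an all-empty list.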
Or in albertheijn.py, by overriding image():
def image(self):
- return self.schema.image()
+ image_data = self.schema.data.get('image')
+ # First non-empty item ("" if every item is empty)
+ return next((s for s in image_data if s), "")
Hey, thanks for taking the time here!
Solve it in _schemaorg.py pls :) The snippet you provided lgtm