recipe-scrapers

ah.nl image() returns '' instead of image url

Open • Noxeus opened this issue 1 year ago • 1 comment

Pre-filing checks

  • [X] I have searched for open issues that report the same problem
  • [X] I have checked that the bug affects the latest version of the library

The URL of the recipe(s) that are not being scraped correctly

  • https://www.ah.nl/allerhande/recept/R-R292556/romige-tagliatelle-met-oesterzwammen
  • .. More from ah.nl

...

The results you expect to see

>>> r = scrape_me("https://www.ah.nl/allerhande/recept/R-R292556/romige-tagliatelle-met-oesterzwammen")
>>> r.image()
'https://static.ah.nl/static/recepten/img_003177_1224x900_JPG.jpg'

...

The results (including any Python error messages) that you are seeing

>>> r = scrape_me("https://www.ah.nl/allerhande/recept/R-R292556/romige-tagliatelle-met-oesterzwammen")
>>> r.image()
''
>>> r.schema.data.get('image')
['', 'https://static.ah.nl/static/recepten/img_003177_1224x900_JPG.jpg']

I can make a PR for this, but first I'd like to ask the community: would this be best solved in _schemaorg.py with:

         if isinstance(image, list):
-            # Could contain a dict
-            image = image[0]
+            # Get the first non-empty item
+            image = next(s for s in image if s)

Or in albertheijn.py:

     def image(self):
-        return self.schema.image()
+        image_data = self.schema.data.get('image')
+        return next(s for s in image_data if s)
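For illustration, here is a minimal standalone sketch of the first option, outside the library. It is not the actual `_schemaorg.py` code: `first_non_empty_image` is a hypothetical name, the dict branch assumes a schema.org `ImageObject`-style value with a `url` key, and the `None` default is an addition so that a list of only empty entries does not raise `StopIteration`.

```python
from typing import Any, Optional


def first_non_empty_image(image: Any) -> Optional[str]:
    """Hypothetical helper: return the first usable image URL, or None."""
    if isinstance(image, list):
        # Skip empty entries such as '' instead of blindly taking image[0];
        # the None default avoids StopIteration when every entry is empty.
        image = next((item for item in image if item), None)
    if isinstance(image, dict):
        # Assumption: dict values follow the ImageObject shape and keep
        # the address under "url".
        image = image.get("url")
    return image or None


# Value reported above for r.schema.data.get('image')
images = ["", "https://static.ah.nl/static/recepten/img_003177_1224x900_JPG.jpg"]
print(first_non_empty_image(images))
```

With the list from this issue, the helper skips the leading empty string and returns the second entry, which matches the expected result.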

Noxeus, Sep 29 '23 07:09

Hey, thanks for taking the time here!

Solve it in _schemaorg.py pls :) The snippet you provided lgtm

hhursev, Sep 29 '23 13:09