BettyBossi is not working

Open Danit2 opened this issue 11 months ago • 9 comments

Pre-filing checks

  • [x] I have searched for open issues that report the same problem
  • [x] I have checked that the bug affects the latest version of the library

The URL of the recipe(s) that are not being scraped correctly

The results you expect to see

I use "Mealie" on Home Assistant, and importing recipes from the Betty Bossi website does not work there. According to the Mealie repository, this is a problem in recipe-scrapers.

The results (including any Python error messages) that you are seeing

I get an error message from Mealie.

Danit2 avatar Mar 16 '24 14:03 Danit2

Hi @Danit2 - thank you for the bug report, we should be able to investigate this soon.

There are two details that would be helpful to narrow this down, if available:

  • Does Mealie indicate the version of recipe-scrapers that is in use? (I would guess it will look something like v14.50.1 or similar)
  • Are there any details in the error message? (such as Failed to retrieve recipe title or similar)

Thanks!

jayaddison avatar Mar 17 '24 11:03 jayaddison

Hi @jayaddison

Thanks for your answer.

My version of Mealie uses recipe-scrapers version 14.55.0. [image]

I don't see anything useful in the logs, I'm sorry.

INFO: 17-Mar-24 14:57:48 	HTTP Request: GET https://www.bettybossi.ch/de/Rezept/ShowRezept/BB_BBZI201015_0003A-40-de?title=Steinpilz-Risotto "HTTP/1.1 200 OK"
INFO: 17-Mar-24 14:57:48 	HTTP Request: GET https://www.bettybossi.ch/de/Rezept/ShowRezept/BB_BBZI201015_0003A-40-de?title=Steinpilz-Risotto "HTTP/1.1 200 OK"
127.0.0.1:38902 - "POST /api/recipes/create-url HTTP/1.1" 400
[17/Mar/2024:14:57:48 +0100] 400 164.14.140.15, 172.30.33.17(172.30.32.1) POST /api/recipes/create-url HTTP/1.1 (Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36)

This is the only error message I get: [image]

I hope you can help.

Thanks.

Danit2 avatar Mar 17 '24 14:03 Danit2

Extremely helpful, thank you @Danit2 - I hope to investigate this within the next day or so.

jayaddison avatar Mar 18 '24 16:03 jayaddison

Ok, this is an interesting bug. I think what is happening here is that:

  • Mealie requests the recipe page from BettyBossi.
  • BettyBossi responds to some/all requests with a tiny HTML page containing a JavaScript snippet that redirects to the recipe (this can be an effective bandwidth/bot-reduction technique).
  • Mealie receives the minimal redirect page as HTML, but the HTTP client it uses (httpx) - like many/most Python HTTP clients - does not evaluate the JavaScript code, so the tiny HTML (with no recipe content) is returned.
  • recipe-scrapers receives the tiny HTML page and doesn't find the recipe information in there.

My guess is that if a user-agent followed the redirect to get to the recipe URL, and downloaded the HTML from that second page, then recipe-scrapers would be able to extract the recipe metadata.
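To illustrate the theory, here is a minimal sketch of how a client could detect a JavaScript redirect in a small HTML response and work out the URL to request next. The helper name `find_js_redirect` and the regular expression are hypothetical: the actual redirect snippet served by BettyBossi has not been shown in this thread, so the patterns only cover common `location` assignments and calls, and they do not evaluate any JavaScript.

```python
import re
from typing import Optional
from urllib.parse import urljoin

# Assumption: the redirect is written as a plain `location` assignment or a
# `location.replace(...)` / `location.assign(...)` call with a string literal.
# The real BettyBossi snippet may look different.
_JS_REDIRECT = re.compile(
    r"""(?:window\.|document\.)?location(?:\.href)?\s*=\s*["']([^"']+)["']"""
    r"""|location\.(?:replace|assign)\(\s*["']([^"']+)["']\s*\)""",
    re.IGNORECASE,
)


def find_js_redirect(html: str, base_url: str) -> Optional[str]:
    """Return the absolute redirect target found in `html`, or None."""
    match = _JS_REDIRECT.search(html)
    if match is None:
        return None
    # Exactly one of the two capture groups is set, depending on the pattern.
    target = match.group(1) or match.group(2)
    # Resolve relative targets against the URL that was originally requested.
    return urljoin(base_url, target)
```

If a target is found, the caller could download that second URL and pass its HTML to recipe-scrapers instead of the redirect page.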

I'll have to spend a bit of time to think about this. It could be good to double-check this theory, too, if anyone out there has time to help.

jayaddison avatar Mar 19 '24 21:03 jayaddison

I would be willing to help solve the problem with Betty Bossi, though I am not a developer.

Zwirbel1 avatar Mar 27 '24 14:03 Zwirbel1

@Zwirbel1 if you have time, it would be useful to check whether any open source recipe management / import utilities are able to handle BettyBossi. That would give us an idea of whether the same problem has been solved elsewhere (and perhaps how).

jayaddison avatar Mar 29 '24 10:03 jayaddison

I tested it last week with Tandoor, which was able to import a recipe from Betty Bossi in the online demo version: https://docs.tandoor.dev/. Here's the recipe I imported into the demo version of Tandoor: https://app.tandoor.dev/view/recipe/53071.

Zwirbel1 avatar Apr 02 '24 11:04 Zwirbel1

It seems that bettybossi.ch uses anti-scraping techniques and, as already mentioned in this issue (https://github.com/hhursev/recipe-scrapers/issues/531), you need to load the page twice in order to get the correct HTML.
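A rough sketch of what "loading the page twice" could look like in code, assuming the site sets a cookie on the first response and only serves the full page once that cookie is sent back. Both helper names are hypothetical, the check for `application/ld+json` is only a guess at how to recognise a complete recipe page, and whether this works for bettybossi.ch has not been confirmed in this thread.

```python
import urllib.request
from http.cookiejar import CookieJar
from typing import Callable


def make_cookie_fetcher() -> Callable[[str], str]:
    """Build a fetch function that keeps cookies between requests."""
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )

    def fetch(url: str) -> str:
        with opener.open(url, timeout=30) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")

    return fetch


def fetch_until_complete(
    fetch: Callable[[str], str],
    url: str,
    is_complete: Callable[[str], bool],
    attempts: int = 2,
) -> str:
    """Request `url` up to `attempts` times and return the last HTML received.

    Stops early as soon as `is_complete` accepts a response.
    """
    html = ""
    for _ in range(attempts):
        html = fetch(url)
        if is_complete(html):
            break
    return html


def looks_like_recipe_page(html: str) -> bool:
    # Assumption: a complete recipe page embeds schema.org data as JSON-LD.
    return "application/ld+json" in html
```

Usage would be along the lines of `fetch_until_complete(make_cookie_fetcher(), recipe_url, looks_like_recipe_page)`, where `recipe_url` is the page to import.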

SwissOS avatar May 08 '24 14:05 SwissOS

@SwissOS: I tried reloading the page several times, but I get the same URL and HTML each time, which does not allow me to import the recipe. Is there anything else I can change to get the correct HTML / URL?

Zwirbel1 avatar May 29 '24 11:05 Zwirbel1