Ghost
Ghost copied to clipboard
Bookmarker thumbnail breaks on some links
Issue Summary
When adding a bookmark, the thumbnail is missing on some URLs. Recreating the bookmark does not help.
Does not work: https://www.komoot.com/es-es/tour/907703588?share_token=aGz0MeTiQ4l20kR1I3fZZ8jqo30FhUDfOaVBzw0HL1x2pqqqlK&ref=
Works: https://www.komoot.com/es-es/tour/907708579?share_token=a0l1fiZBYZHhKJr8IRogCPRNxTATckU1RhbHIRxoN4QtdgPzZn&ref=
Steps to Reproduce
- Paste link in post
Ghost Version
5.86.2
Node.js Version
18.20.3
How did you install Ghost?
Docker
Database type
MySQL 8
Browser & OS version
Chrome
Relevant log / error output
No response
Code of Conduct
- [X] I agree to be friendly and polite to people in this repository
Hi @eduardopuente , There seems to be an issue with the image used in the bookmarker thumbnail. Could you please confirm if the image is still available? The server is returning a 404 Not Found error.
Below are some of the screenshots :
Actual Image URL used :
Ghost was unable to load resource as the server responded with 404 (Not Found) :
@cmraible , I’m new to Ghost, so please correct me if I’m wrong.
The bookmarking functionality relies on reading metadata from 3rd party websites, and then relies on that to be correct and point to valid images. There are lots of tiny ways that this can go wrong.
Happy to accept PRs which help make this process more robust!
Hi folks! I dug deeply into this one and came up with the reason this is happening, but no simple solution. I'll submit an issue with the library in question and a PR to fix it, but I'm not sure the library author will accept it. We'll see.
TL;DR: The problem is not in Ghost or its related repositories. Once I work out the issue with the upstream repo's maintainer, we would need to wait (or encourage) Ghost's dependency (metascraper) to update its dependency (normalize-url), then perhaps we can update metascraper in Ghost Core.
If you visit the example page provided by the OP that doesn't show a thumbnail, you can find a meta tag in the HTML source for og:image with this URL:
https://tourpic-vector.maps.komoot.net/r/big/u%60t%5Be_pC~EzKnI%60p@h@nf@xFzJeB~EdQdXbH%60e@pHvOsAxp@lO%60VnA%7CVvyAhnAbYff@~P~ItT%5CjNjm@%7CQtU/?width=768&height=576&crop=true
See that %5C near the end of the path? That's a backslash (\) if you decode it. That's where the fun begins. If you visit the above URL in your browser, the image will display just fine. If you replace the %5C with \, you should see your browser flip the backslash to a forward slash (/).
https://tourpic-vector.maps.komoot.net/r/big/u%60t%5Be_pC~EzKnI%60p@h@nf@xFzJeB~EdQdXbH%60e@pHvOsAxp@lO%60VnA%7CVvyAhnAbYff@~P~ItT\jNjm@%7CQtU/?width=768&height=576&crop=true
If you click the above URL, it looks like GitHub is encoding the \ back to %5C. So, go ahead and manually try to change the %5C to \ in your address bar, and you should see the request fail with a 404, because the browser was trying to be helpful and replaced your \ with /. However, that's the wrong URL, though.
Ghost uses the metascraper library. metascraper is made of tiny packages, and one of those is metascraper-image, which parses the thumbnail image intended to represent a web page (via oembed metadata). The image package uses metascraper-helpers. Buried in the image parsing rules is code to sanitize and normalize the image URL. The normalize-url library is used for this, and it decodes the URL's path name and then assigns it back to its working URL object.
Since a backslash (\) is invalid in a URL, v8 assumes you meant a forward slash (/) when it sees it in the path, so it flips it for you. How nice! The percent-encoded equivalent to a backslash is %5C, and that is valid in the URL. Unfortunately, decodeURI (and friends) consider all ASCII as valid, so they will decode %5c to \, rather than leaving it alone.
Ultimately, the solution is to patch normalize-url so that the result contains %5C instead of \. What a mess! You can view some examples of how things are failing now, and the solution I intend to submit a PR for.
Thanks @ventaur for finding and fixing the issue in the normalize-url library. A new version that contains the fixed has been released.
I think we should ask metascraper to update its dependency (normalize-url), since they are using "normalize-url": "~6.1.0", (released in June 2021).
@ventaur do you want to take the lead here?
I'm happy to take this over the line. After we worked through it some, I was glad to see this change to normalize-url wasn't considered breaking (major). However, the fact that meta scraper is two major versions behind may present some challenges. I'll start digging in.
@eduardopuente so, I ran into a pretty big snag with the metascraper repo. If you look at the breaking changes for normalize-url going from v6 to v8, you can see that in v7, they went pure ESM. That means, you cannot require the module except in some cases with Node v22+. However, metascraper only requires Node v16+. I tried using a top-level async function for the entire metascraper-helpers with no success.
You can read more about the troubles of some packages that move to Pure ESM.
I think we're kind of stuck here until Metascraper updates their entire codebase to ESM.