Ghost icon indicating copy to clipboard operation
Ghost copied to clipboard

Bookmarker thumbnail breaks on some links

Open eduardopuente opened this issue 1 year ago • 5 comments

Issue Summary

When adding a bookmark, the thumbnail is missing on some URLs. Recreating the bookmark does not help.

image

Does not work: https://www.komoot.com/es-es/tour/907703588?share_token=aGz0MeTiQ4l20kR1I3fZZ8jqo30FhUDfOaVBzw0HL1x2pqqqlK&ref=

Works: https://www.komoot.com/es-es/tour/907708579?share_token=a0l1fiZBYZHhKJr8IRogCPRNxTATckU1RhbHIRxoN4QtdgPzZn&ref=

Steps to Reproduce

  1. Paste link in post

Ghost Version

5.86.2

Node.js Version

18.20.3

How did you install Ghost?

Docker

Database type

MySQL 8

Browser & OS version

Chrome

Relevant log / error output

No response

Code of Conduct

  • [X] I agree to be friendly and polite to people in this repository

eduardopuente avatar Jun 29 '24 11:06 eduardopuente

Hi @eduardopuente , There seems to be an issue with the image used in the bookmarker thumbnail. Could you please confirm if the image is still available? The server is returning a 404 Not Found error.

Below are some of the screenshots :

Actual Image URL used : Screenshot 2024-10-06 at 9 34 52 PM

Ghost was unable to load resource as the server responded with 404 (Not Found) : Screenshot 2024-10-06 at 9 34 22 PM

@cmraible , I’m new to Ghost, so please correct me if I’m wrong.

sanketmagar2001 avatar Oct 06 '24 16:10 sanketmagar2001

The bookmarking functionality relies on reading metadata from 3rd party websites, and then relies on that to be correct and point to valid images. There are lots of tiny ways that this can go wrong.

Happy to accept PRs which help make this process more robust!

ErisDS avatar Mar 17 '25 11:03 ErisDS

Hi folks! I dug deeply into this one and came up with the reason this is happening, but no simple solution. I'll submit an issue with the library in question and a PR to fix it, but I'm not sure the library author will accept it. We'll see.

TL;DR: The problem is not in Ghost or its related repositories. Once I work out the issue with the upstream repo's maintainer, we would need to wait (or encourage) Ghost's dependency (metascraper) to update its dependency (normalize-url), then perhaps we can update metascraper in Ghost Core.


If you visit the example page provided by the OP that doesn't show a thumbnail, you can find a meta tag in the HTML source for og:image with this URL:

https://tourpic-vector.maps.komoot.net/r/big/u%60t%5Be_pC~EzKnI%60p@h@nf@xFzJeB~EdQdXbH%60e@pHvOsAxp@lO%60VnA%7CVvyAhnAbYff@~P~ItT%5CjNjm@%7CQtU/?width=768&height=576&crop=true

See that %5C near the end of the path? That's a backslash (\) if you decode it. That's where the fun begins. If you visit the above URL in your browser, the image will display just fine. If you replace the %5C with \, you should see your browser flip the backslash to a forward slash (/).

https://tourpic-vector.maps.komoot.net/r/big/u%60t%5Be_pC~EzKnI%60p@h@nf@xFzJeB~EdQdXbH%60e@pHvOsAxp@lO%60VnA%7CVvyAhnAbYff@~P~ItT\jNjm@%7CQtU/?width=768&height=576&crop=true

If you click the above URL, it looks like GitHub is encoding the \ back to %5C. So, go ahead and manually try to change the %5C to \ in your address bar, and you should see the request fail with a 404, because the browser was trying to be helpful and replaced your \ with /. However, that's the wrong URL, though.

Ghost uses the metascraper library. metascraper is made of tiny packages, and one of those is metascraper-image, which parses the thumbnail image intended to represent a web page (via oembed metadata). The image package uses metascraper-helpers. Buried in the image parsing rules is code to sanitize and normalize the image URL. The normalize-url library is used for this, and it decodes the URL's path name and then assigns it back to its working URL object.

Since a backslash (\) is invalid in a URL, v8 assumes you meant a forward slash (/) when it sees it in the path, so it flips it for you. How nice! The percent-encoded equivalent to a backslash is %5C, and that is valid in the URL. Unfortunately, decodeURI (and friends) consider all ASCII as valid, so they will decode %5c to \, rather than leaving it alone.

Ultimately, the solution is to patch normalize-url so that the result contains %5C instead of \. What a mess! You can view some examples of how things are failing now, and the solution I intend to submit a PR for.

ventaur avatar May 27 '25 02:05 ventaur

Thanks @ventaur for finding and fixing the issue in the normalize-url library. A new version that contains the fixed has been released.

I think we should ask metascraper to update its dependency (normalize-url), since they are using "normalize-url": "~6.1.0", (released in June 2021).

@ventaur do you want to take the lead here?

eduardopuente avatar Jun 13 '25 13:06 eduardopuente

I'm happy to take this over the line. After we worked through it some, I was glad to see this change to normalize-url wasn't considered breaking (major). However, the fact that meta scraper is two major versions behind may present some challenges. I'll start digging in.

ventaur avatar Jun 13 '25 20:06 ventaur

@eduardopuente so, I ran into a pretty big snag with the metascraper repo. If you look at the breaking changes for normalize-url going from v6 to v8, you can see that in v7, they went pure ESM. That means, you cannot require the module except in some cases with Node v22+. However, metascraper only requires Node v16+. I tried using a top-level async function for the entire metascraper-helpers with no success.

You can read more about the troubles of some packages that move to Pure ESM.

I think we're kind of stuck here until Metascraper updates their entire codebase to ESM.

ventaur avatar Jun 20 '25 00:06 ventaur