linkding
Move scraping of site title and site description to the client (bookmarklet and browser extension)
Currently, when adding a bookmark through the Firefox browser add-on, the add-on populates certain fields automatically. I could be wrong here, but it seems that the actual scraping of that data, when the bookmark gets added to linkding, is handled by the server.
Most of the time this is not an issue, but with certain sites, such as those protected by Cloudflare, this leads to unexpected behavior, as illustrated here:
Adding a bookmark for a Path of Exile forum post via the browser extension:
Notice the pre-populated title field, as expected

How that bookmark then appears in linkding:

For now I would say that this is by design. The scraping happens on the server because:
- it can be reused by the internal bookmark form, by the extension, as well as other tools using the REST API
- fetching a website using AJAX methods from the browser would likely lead to cross-origin issues. While the extension might be able to circumvent CORS checks, the internal bookmark form would definitely not
While it's unfortunate that some sites block requests coming from servers, I would prefer to keep things simple and keep the logic in one place rather than implement this logic multiple times in different places / languages.
An alternative I can think of is to extend the extension to:
- provide a setting to always set an explicit title + description and get these from the current tab
- ATM the extension only reads the tab title, so it would also need to be extended to determine a description from the document
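A minimal sketch of how the extension could determine a description from the document, assuming it runs as a content script with DOM access. The `MetaTag` shape and the `pickDescription` function are hypothetical names, and the preference order (`description` first, then `og:description`) is an assumption, not something linkding does today.

```typescript
// Hypothetical shape for a <meta> tag as collected by a content script,
// e.g. from document.querySelectorAll("meta") using getAttribute().
interface MetaTag {
  name: string | null;      // value of the name attribute, if any
  property: string | null;  // value of the property attribute (Open Graph)
  content: string | null;   // value of the content attribute
}

// Returns the first non-empty description, or "" if none is found.
// The order of keys below is an assumption for illustration.
function pickDescription(tags: MetaTag[]): string {
  const keys = ["description", "og:description"];
  for (const key of keys) {
    for (const tag of tags) {
      const name = (tag.name ?? "").toLowerCase();
      const property = (tag.property ?? "").toLowerCase();
      const content = (tag.content ?? "").trim();
      if ((name === key || property === key) && content !== "") {
        return content;
      }
    }
  }
  return "";
}
```

Keeping the selection logic separate from the DOM lookup means the same function could be reused by the bookmarklet.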
Changed the title to include the bookmarklet into this issue. See https://github.com/sissbruecker/linkding/issues/292 for the original request. As mentioned there, if the website metadata is provided by a client, then scraping on the server could be skipped.
I'm more open to this now, as there are bug reports around this from time to time. Ideally the client should provide both the website title and description. Getting the title is straightforward; the description is not. There are websites (GitHub, Reddit) that do not update the website's meta description tag while navigating through the page, which means the description provided by the client might not be correct. It is hard to say which method (client or server scraping) would provide better results on average.
For now I assume server-side scraping is still the better alternative. If someone has ideas around the description issue, feel free to share.
Regarding the description issue, I would love it if any currently selected text on a page were used as the description when invoking the bookmarklet/extension (the current behavior would be kept if no text is selected).
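The selection idea could be sketched roughly like this. `chooseDescription` is a hypothetical helper; in a browser the selected text would typically come from `window.getSelection()`, and the fallback would be whatever description the client or server currently produces.

```typescript
// Prefer the user's selected text. If nothing (or only whitespace) is
// selected, keep the existing behavior by returning the fallback unchanged.
function chooseDescription(selectedText: string, fallback: string): string {
  const selection = selectedText.trim();
  return selection !== "" ? selection : fallback;
}

// In a bookmarklet or content script this might be called as:
//   chooseDescription(window.getSelection()?.toString() ?? "", currentDescription)
// where currentDescription is a hypothetical variable holding the existing value.
```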
Barring that, it would be nice to at least have the ability to manually provide a description parameter to /new in order to homebrew the functionality described above by customizing the bookmarklet on my own, using it in Apple Shortcuts, etc.
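To make the request concrete, here is a rough sketch of what a customized bookmarklet might build if such a parameter existed. The `description` query parameter is the hypothetical one requested above, and the `url` parameter name, the `buildNewBookmarkUrl` helper, and the example host are likewise assumptions for illustration.

```typescript
// Builds a link to the "new bookmark" form with the page URL and, if given,
// a description. The "description" parameter is hypothetical: it is the
// feature being requested, not a confirmed part of linkding.
function buildNewBookmarkUrl(
  newFormUrl: string,   // full URL of the new-bookmark form on your instance
  pageUrl: string,      // URL of the page being bookmarked
  description: string   // e.g. the currently selected text; may be ""
): string {
  const target = new URL(newFormUrl);
  target.searchParams.set("url", pageUrl);
  if (description !== "") {
    target.searchParams.set("description", description);
  }
  return target.toString();
}
```

Because `URLSearchParams` handles the encoding, selected text containing spaces, ampersands, or line breaks would survive the round trip without manual `encodeURIComponent` calls.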