obsidian-raindrop-highlights-plugin

Include full page text, with inline highlighting

Open robertandrews opened this issue 2 years ago • 5 comments

I believe Raindrop's API has the capability to return full text of source pages. This is for paying customers, which I am not currently.

This would allow an option to return full page text into Obsidian...

And that might allow an optional alternative method of presenting highlights...

Rather than return only highlights, in sequence, you could wrap the corresponding text portions from the full text in the Markdown highlight syntax (==).
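To illustrate the idea, here is a minimal sketch, assuming the full text is already available as a plain string and each highlight matches it exactly. The function name `wrapHighlights` is hypothetical and not part of the plugin.

```typescript
// Hypothetical helper: wraps the first exact occurrence of each highlight
// in Markdown highlight syntax (==). Highlights that are not found are skipped.
function wrapHighlights(fullText: string, highlights: string[]): string {
  let result = fullText;
  for (const h of highlights) {
    if (h.length === 0) continue;
    const idx = result.indexOf(h);
    if (idx === -1) continue; // no exact match: leave the text untouched
    result =
      result.slice(0, idx) + "==" + h + "==" + result.slice(idx + h.length);
  }
  return result;
}
```

Overlapping highlights and text that differs in whitespace or markup are not handled here.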

IMG_0495

This would allow the user to a) see full page text, b) see highlights in their full context and c) align the Obsidian-side experience of raindrops with the experience inside Raindrop itself.

I don't know how Raindrop stores or returns the full page text (as HTML, perhaps?). If so, it may require conversion to Markdown.

This may need to be limited to raindrops of type "article".

robertandrews avatar Oct 15 '22 07:10 robertandrews

This would be awesome to have. Sometimes articles are full of information you want to keep, and highlighting everything is obviously not a good idea. On top of that, images are a problem to save as highlights.

If only we could use raindrop (and/or this plugin) to keep the whole article offline.

the-c0d3r avatar Oct 31 '22 13:10 the-c0d3r

> I believe Raindrop's API has the capability to return full text of source pages.

No, it's currently not available. The Raindrop API provides an endpoint for a permanent copy, but it only redirects to the saved page and does not return the full text of the source page.

There are some challenges to implementing this feature in this plugin:

  1. Raindrop doesn't provide a full-text API. This means the plugin would need to implement a web crawler. The simplest pipeline is request -> parsing -> readability conversion -> conversion to Markdown.
  2. Raindrop doesn't provide location information for each highlight. The plugin would need an efficient algorithm to find the right place to insert the == symbols. This is not trivial because the readability conversion may alter the web content, so the highlighted text from the Raindrop API may no longer match the converted content.
  3. Syncing a new highlight requires overwriting the whole Markdown content, meaning that anything added from Obsidian would be overwritten. (BTW, two-way syncing is nearly impossible given the high flexibility of editing.)
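As a rough illustration of point 2, one possible starting point is a whitespace-tolerant search, sketched below. The names `locateHighlight` and `Span` are hypothetical, and this only absorbs whitespace differences; it would not cope with text that the readability step drops or rewrites.

```typescript
// Hypothetical sketch: find where a highlight sits in converted content,
// tolerating differences in whitespace (line breaks, double spaces).
interface Span {
  start: number; // index of the first matched character
  end: number;   // index just past the last matched character
}

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function locateHighlight(content: string, highlight: string): Span | null {
  const words = highlight.trim().split(/\s+/).filter((w) => w.length > 0);
  if (words.length === 0) return null;
  // Any run of whitespace in the content may separate consecutive words.
  const pattern = new RegExp(words.map(escapeRegExp).join("\\s+"));
  const match = pattern.exec(content);
  if (match === null) return null;
  return { start: match.index, end: match.index + match[0].length };
}
```

The == markers could then be inserted at `start` and `end`. A real implementation would likely need fuzzier matching than this.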

I think the most stable implementation is to "archive" the full web page to the Obsidian vault (maybe through obsidian web clipper) and directly highlight and link the content in Obsidian.

I'm not planning to implement this feature at this moment unless Raindrop provides a more stable solution.

kaiiiz avatar Oct 31 '22 14:10 kaiiiz

> No, it's currently not available. Raindrop API provides an endpoint to a permanent copy, but it only redirects to the saved page and not return the full text of source pages.

Hmm... so the "permanent copy" is not stored in the Raindrop database, it is a copy of the HTML page stored on some server?

  • The /cache/ endpoint doc you point to includes an example link of "Location: https://s3.aws..."

  • Viewing source in my web browser shows one permanent copy of a Mac Observer article living at https://preview.systems/web/longstringhere=

Either way, I see why more work would need to be done to pull the article.

It's a shame since UI views for raindrops include both "Web" and "Preview", the latter of which is actually the stripped-down readability/plaintext version of the page, but with highlights showing. So Raindrop is doing that at some point; it's a shame if it's not being surfaced.

Thanks for looking.

robertandrews avatar Oct 31 '22 14:10 robertandrews

I've just taken out a Raindrop Pro subscription to test.

Re: the plaintext article version I talked about. I've done some poking around...

  • Permanent copy: Yes, this stores a copy at a URL like https://s3.eu-central-1.wasabisys.com/cache.raindrop.io/461/231/211/file?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=stringhere%2F20221031%2Feu-central-1%2Fs3%2Faws4_request&X-Amz-Date=20221031T163250Z&X-Amz-Expires=300&X-Amz-Signature=stringhere&X-Amz-SignedHeaders=host. It's available publicly without authentication. This does not show highlights, they are only surfaced when the page is viewed through the (web) app. These are the pages that would involve scraping a large amount of layout variants, performing readability and then matching highlights back up, as you mention. Too onerous.
  • Preview mode: Looks like the (web) app generates URLs like https://preview.systems/article/longstring (actually, https://preview.systems/article/longstring#solid-bg=false&theme=day&font-family=&font-size=1). I've confirmed these URLs are also public, they show the radically stripped-down article contents and there doesn't seem to be an "expire" check - but they also don't show the highlights...

For me, the https://api.raindrop.io/rest/v1/raindrop/{id}/cache endpoint is returning status 200 with no body content, so I can't see what's behind it, but I presume it's the first of these two URLs.
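For anyone wanting to probe this themselves, here is a minimal sketch of how the request and its outcome could be modelled. Only the endpoint URL comes from this thread; the helper names are hypothetical, and the Bearer token header and the 3xx-with-Location behaviour are assumptions based on the doc example mentioned earlier.

```typescript
// Hypothetical helpers; only the endpoint URL is taken from the thread.
interface CacheRequest {
  url: string;
  headers: Record<string, string>;
  redirect: "manual"; // assumption: we want to read Location, not follow it
}

type CacheResult =
  | { kind: "redirect"; location: string }
  | { kind: "empty" } // e.g. the 200-with-no-body case described above
  | { kind: "error"; status: number };

function buildCacheRequest(raindropId: number, token: string): CacheRequest {
  return {
    url: `https://api.raindrop.io/rest/v1/raindrop/${raindropId}/cache`,
    headers: { Authorization: `Bearer ${token}` }, // assumed auth scheme
    redirect: "manual",
  };
}

function interpretCacheResponse(status: number, location: string | null): CacheResult {
  if (status >= 300 && status < 400 && location) {
    return { kind: "redirect", location };
  }
  if (status >= 200 && status < 300) {
    return { kind: "empty" };
  }
  return { kind: "error", status };
}
```

The request object could be passed to `fetch` in a runtime that exposes the Location header on manual redirects, with the response status and header then handed to `interpretCacheResponse`.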

Screenshot 2022-10-31 at 16 41 55

Source of the Preview mode page reveals...

  • a) a link to the original article URL
  • b) three included files...
  1. https://preview.systems/article/app.css?v1.0.50
  2. https://preview.systems/article/safari.js?v1.0.50
  3. https://preview.systems/article/app.js?v1.0.50

Number 2 is Readability. Number 3 seems to include both DOMPurify, which cleans up HTML, and all the JavaScript for using it. There is code in here about mark positioning.

Looks like Raindrop is already doing the work and using those to render a stripped-down version.

Back inside the app, the page renders highlighted text with <mark>. Each mark includes the highlight's own ID (the same one returned in the "highlights" node of the response from the "raindrop" endpoint) as a data attribute...

<p>I think <mark data-rdhid="635422f95002697a01898a7e" title="Great writing here.">the devil is real and he wants you to be more productive<svg class="rdhni" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 10 10"> <path d="M8 0a2 2 0 0 1 2 2v8L6 8H2a2 2 0 0 1-2-2V2C0 .9.9 0 2 0h6ZM2 3a1 1 0 1 0 0 2 1 1 0 0 0 0-2Zm3 0a1 1 0 1 0 0 2 1 1 0 0 0 0-2Zm3 0a1 1 0 1 0 0 2 1 1 0 0 0 0-2Z"></path> </svg></mark>. He’s everywhere, spreading wickedness disguised as wisdom. Here he is in&nbsp;<em><a href="https://www.forbes.com/sites/ilyapozin/2013/08/14/9-habits-of-productive-people/?sh=4bb8f2b22d3f">Forbes</a>:</em></p>

There, title="Great writing here." denotes an annotation I added to this highlight.
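If that rendered HTML were obtainable, turning it into Markdown highlights could look roughly like the sketch below. The `data-rdhid` and `title` attributes are the ones shown above; the function and type names are hypothetical, and regex-based HTML handling is a simplification (a DOM parser would be more robust).

```typescript
// Hypothetical sketch: replace <mark> elements with ==...== and collect
// the highlight ID and annotation from their attributes.
interface MarkedHighlight {
  id: string;          // value of data-rdhid, or "" if absent
  note: string | null; // value of title (the annotation), if any
  text: string;        // highlighted text with inner tags removed
}

function convertMarks(html: string): { markdown: string; highlights: MarkedHighlight[] } {
  const highlights: MarkedHighlight[] = [];
  const markPattern = /<mark\b([^>]*)>([\s\S]*?)<\/mark>/g;
  const markdown = html.replace(markPattern, (_m: string, attrs: string, inner: string) => {
    const id = /data-rdhid="([^"]*)"/.exec(attrs);
    const title = /title="([^"]*)"/.exec(attrs);
    const text = inner
      .replace(/<svg\b[\s\S]*?<\/svg>/g, "") // drop the annotation icon
      .replace(/<[^>]+>/g, "")               // drop any remaining inner tags
      .trim();
    highlights.push({ id: id ? id[1] : "", note: title ? title[1] : null, text });
    return `==${text}==`;
  });
  return { markdown, highlights };
}
```

Only the mark elements are converted; the surrounding HTML is left as-is and would still need its own conversion to Markdown. An annotation containing a ">" character would break the attribute match.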

Whilst the readability-generated URL is infinitely more uniform and scrapable, it does not seem to include the highlights.

However, it may be useful to note that, in these files, you can see a lot of the logic for how Raindrop uses Readability etc. to do the stripping-back and matching.

robertandrews avatar Oct 31 '22 16:10 robertandrews

The Omnivore Obsidian plugin offers this! I'm thinking of switching from Raindrop to Omnivore for several reasons, but it seems it needs to mature a bit more first.

iroQuai avatar May 21 '23 05:05 iroQuai