
Default text selection on popular sites

Open georgjaehnig opened this issue 10 years ago • 15 comments

When there is no text selected, the extension could select a default text based on the current domain and a given XPath set in the extension. For instance on http://www.bbc.com/news/world-europe-26465962, the news text could be selected automatically. And on Wikipedia, the article text.

georgjaehnig avatar Mar 07 '14 02:03 georgjaehnig

Readability.com might be a sensible method for stripping the content out. This is how OpenSpritz does it.

dpash avatar Mar 07 '14 02:03 dpash

Readability.com would be nice, but I don't like the idea of waiting on external API calls, and I've noticed OpenSpritz can sometimes take several seconds to load. After all, 'jetzt' means 'now' in German.

I propose the following:

  • a function which takes a dom node and compiles its content to jetzt instructions, similarly to how I've done it for plain strings.
  • a function for best-guessing the parent node of an article by, e.g. finding the node with the most <p> children.
  • a map from url patterns to xpath/css selector/node extraction functions, for popular sites where the best-guess doesn't work well enough.
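A minimal sketch of the second and third bullets, to make the proposal concrete. It is not code from jetzt: every function name, the `SITE_SELECTORS` table, and the Wikipedia selector are assumptions made for illustration. It treats "the node with the most <p> children" literally, counting direct children only.

```javascript
// Hypothetical helpers; none of these names come from jetzt itself.

// Count the direct <p> children of an element.
function countParagraphChildren(node) {
  let count = 0;
  for (const child of node.children) {
    if (child.tagName.toUpperCase() === "P") count += 1;
  }
  return count;
}

// Walk the subtree under `root` and return the element with the most
// direct <p> children, or null if no element has any.
function guessArticleNode(root) {
  let best = null;
  let bestCount = 0;
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    const count = countParagraphChildren(node);
    if (count > bestCount) {
      best = node;
      bestCount = count;
    }
    for (const child of node.children) stack.push(child);
  }
  return best;
}

// Per-site overrides. The pattern and selector are illustrative
// assumptions, not a vetted list.
const SITE_SELECTORS = [
  { pattern: /(^|\.)wikipedia\.org$/, selector: "#mw-content-text" },
];

// Return the override selector for a hostname, or null if there is none.
function selectorForHost(hostname) {
  const entry = SITE_SELECTORS.find((e) => e.pattern.test(hostname));
  return entry ? entry.selector : null;
}

// Try the per-site override first, then fall back to the best guess.
function findArticle(doc, hostname) {
  const selector = selectorForHost(hostname);
  const overridden = selector ? doc.querySelector(selector) : null;
  return overridden || guessArticleNode(doc.body);
}
```

In a page this would be called as `findArticle(document, location.hostname)`. Everything runs locally and synchronously, so there is no wait on an external service.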

Thoughts?

ds300 avatar Mar 07 '14 09:03 ds300

Or, when the user starts jetzt without any text selected, the element the mouse is hovering over could light up, and if the user clicks, that text will be read (like the inspect element function).
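A rough sketch of such a picker, assuming standard DOM events. `startElementPicker` and its `onPick` callback are hypothetical names, and the outline colour is arbitrary. Note that it overwrites any inline `outline` style the hovered element already had.

```javascript
// Hypothetical picker; not part of jetzt.
// Highlights the hovered element and calls onPick(element) on click.
// Returns a function that cancels the picker.
function startElementPicker(doc, onPick) {
  let current = null;

  // Remove the highlight from the previously hovered element.
  function clear() {
    if (current) {
      current.style.outline = "";
      current = null;
    }
  }

  function onOver(event) {
    clear();
    current = event.target;
    current.style.outline = "2px solid orange";
  }

  function onClick(event) {
    // Keep the click from following links or triggering page handlers.
    event.preventDefault();
    event.stopPropagation();
    const picked = event.target;
    stop();
    onPick(picked);
  }

  function stop() {
    clear();
    doc.removeEventListener("mouseover", onOver, true);
    doc.removeEventListener("click", onClick, true);
  }

  // Capture phase, so the picker sees events before the page does.
  doc.addEventListener("mouseover", onOver, true);
  doc.addEventListener("click", onClick, true);
  return stop;
}
```

Usage would look like `startElementPicker(document, (el) => { /* read el */ })`, with the returned function bound to Escape to cancel.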

Gyran avatar Mar 07 '14 10:03 Gyran

Great idea Gyran!

Re: using an external service for this: one reason I prefer Jetzt over OpenSpritz is that Jetzt can read local documents. Because of how OpenSpritz works, it can only read publicly available documents/pages.

rtuin avatar Mar 07 '14 14:03 rtuin

:+1:

Anahkiasen avatar Mar 07 '14 15:03 Anahkiasen

I quite like the current solution, which lets the reader select a text block with the mouse.

However, there's a little bug: it seems that HTML comments are included in this automatic selection. See http://www.spiegel.de/politik/deutschland/krim-krise-ex-kanzler-gerhard-schroeder-kritisiert-eu-a-957728.html as an example. When selecting the whole article text, an HTML comment is jetzted after the first paragraph.

(But BTW: Really great work, this extension!!)

georgjaehnig avatar Mar 09 '14 16:03 georgjaehnig

Oh man, that's annoying. Thanks for pointing it out though. Just further motivation to get a proper DOM parser on the go :)

ds300 avatar Mar 09 '14 16:03 ds300

@georgjaehnig the comments that appear in that example page are inside script tags, so this seems to be the same issue as #29. I confirmed that it works with the PR in #31.
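For illustration only, and not the actual change in #31: a DOM walk along these lines would avoid both problems, assuming standard `nodeType` values. `collectReadableText` and `SKIPPED_TAGS` are hypothetical names.

```javascript
// Hypothetical text collector; not the code from the PR.
const ELEMENT_NODE = 1; // standard DOM nodeType for elements
const TEXT_NODE = 3;    // standard DOM nodeType for text
const SKIPPED_TAGS = new Set(["SCRIPT", "STYLE", "NOSCRIPT"]);

// Collect trimmed, non-empty text from `node` and its descendants.
// Text inside skipped tags is ignored, and so are comment nodes
// (nodeType 8), which match neither branch below.
function collectReadableText(node, out = []) {
  if (node.nodeType === TEXT_NODE) {
    const text = node.nodeValue.trim();
    if (text) out.push(text);
  } else if (
    node.nodeType === ELEMENT_NODE &&
    !SKIPPED_TAGS.has(node.tagName.toUpperCase())
  ) {
    for (const child of node.childNodes) collectReadableText(child, out);
  }
  return out;
}
```

Since the contents of a script element are exposed as a plain text node, skipping the whole element is what keeps comment-like markup inside scripts out of the result.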

h0ru5 avatar Mar 09 '14 20:03 h0ru5

I appreciate that the readability API can take several seconds, and also I noticed on OpenSpritz it doesn't even work at all on Guardian articles. That said, it would be an excellent additional feature. The default alt-s behaviour can be slightly clunky depending on the underlying HTML structure and also might pick up images and their captions in the middle which might not make sense. It's also an annoying extra step which in my opinion adds little value over simply selecting the text manually. When you are on a website and press alt-r, it could query the Readability API and at least take a best guess.

ecsplendid avatar Mar 10 '14 19:03 ecsplendid

I think this could be a way to go: https://github.com/fb55/readabilitySAX. fb55 offers a Readability port that can be used inside the browser.

h0ru5 avatar Mar 10 '14 19:03 h0ru5

^ h0ru5 has the best idea :>

ecsplendid avatar Mar 10 '14 20:03 ecsplendid

I agree with h0ru5. Other reasons: privacy, offline usage, and probably speed. Here are some other JS-based readability scripts I posted earlier: https://github.com/ds300/jetzt/pull/26#issuecomment-37131986

peteruithoven avatar Mar 10 '14 21:03 peteruithoven

@peteruithoven I skimmed through the list, but I found the SAX-based one to be the best starting point. I think https://github.com/BaNzounet/readability/blob/master/src/helpers.js might be usable as well; they separated the Node-specific code from a helpers.js written in pure JS.

h0ru5 avatar Mar 10 '14 22:03 h0ru5

Another reason not to use an external API like Readability is that you can't use it on services that require a login, like mail.

peteruithoven avatar Mar 12 '14 03:03 peteruithoven

So it has been quite a while, but this might be an interesting development: https://hacks.mozilla.org/2017/04/fathom-a-framework-for-understanding-web-pages/ https://mozilla.github.io/fathom/

peteruithoven avatar May 05 '17 14:05 peteruithoven