jetzt: Default text selection on popular sites

When no text is selected, the extension could select a default text based on the current domain and an XPath configured for that domain in the extension. For instance, on http://www.bbc.com/news/world-europe-26465962 the news text could be selected automatically, and on Wikipedia the article text.

Mar 07 '14 02:03 georgjaehnig

Readability.com might be a sensible method for stripping the content out. This is how OpenSpritz does it.

Mar 07 '14 02:03 dpash

Readability.com would be nice, but I don't like the idea of waiting on external API calls, and I've noticed OpenSpritz can sometimes take several seconds to load. After all, 'jetzt' means 'now' in German.

I propose the following:

A function which takes a DOM node and compiles its content to jetzt instructions, similar to how I've done it for plain strings.
A function for best-guessing the parent node of an article, e.g. by finding the node with the most <p> children.
A map from URL patterns to XPath/CSS selectors/node extraction functions, for popular sites where the best guess doesn't work well enough.
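
As a rough illustration of the second and third points, here is a minimal TypeScript sketch, not jetzt's actual code. It assumes a simplified `SimpleNode` type standing in for a real DOM element so that it runs outside a browser. The names `guessArticleNode`, `siteSelectors` and `selectorForUrl`, as well as the URL patterns and selectors in the map, are hypothetical placeholders.

```typescript
// Minimal stand-in for a DOM element so the sketch runs outside a browser.
// In the extension this would be a real Element (tagName, children).
interface SimpleNode {
  tag: string;
  children: SimpleNode[];
}

// Count the direct children of `node` that have the given tag name.
function countDirectChildren(node: SimpleNode, tag: string): number {
  return node.children.filter((child) => child.tag === tag).length;
}

// Best guess: walk the whole tree and return the node with the most
// direct <p> children, or null if the tree contains no <p> at all.
function guessArticleNode(root: SimpleNode): SimpleNode | null {
  let best: SimpleNode | null = null;
  let bestCount = 0;
  const stack: SimpleNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    const count = countDirectChildren(node, "p");
    if (count > bestCount) {
      best = node;
      bestCount = count;
    }
    stack.push(...node.children);
  }
  return best;
}

// Per-site overrides for pages where the best guess is not good enough.
// Both the patterns and the selectors are illustrative placeholders.
const siteSelectors: Array<{ pattern: RegExp; selector: string }> = [
  { pattern: /^https?:\/\/[a-z-]+\.wikipedia\.org\//, selector: "#mw-content-text" },
  { pattern: /^https?:\/\/www\.bbc\.com\/news\//, selector: ".story-body" },
];

// Return the selector of the first matching pattern, or null so the
// caller can fall back to guessArticleNode.
function selectorForUrl(url: string): string | null {
  const entry = siteSelectors.find((e) => e.pattern.test(url));
  return entry ? entry.selector : null;
}
```

A caller would try `selectorForUrl(location.href)` first and fall back to `guessArticleNode` when it returns null. Counting only direct `<p>` children keeps the heuristic cheap, but it will misfire on pages that wrap every paragraph in its own container.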

Thoughts?

Mar 07 '14 09:03 ds300

Or, when the user starts jetzt without any text selected, the element the mouse is hovering over could light up, and if the user clicks, that text will be read (like the inspect element function).

Mar 07 '14 10:03 Gyran

Great idea Gyran!

Re: using an external service for this: one reason I prefer Jetzt over OpenSpritz is that Jetzt can read local documents. Due to how OpenSpritz works, it can only read publicly available documents/pages.

Mar 07 '14 14:03 rtuin

:+1:

Mar 07 '14 15:03 Anahkiasen

I quite like the current solution, which lets the reader select a text block with the mouse.

However, there's a little bug: it seems that HTML comments are included in this automatic selection. See http://www.spiegel.de/politik/deutschland/krim-krise-ex-kanzler-gerhard-schroeder-kritisiert-eu-a-957728.html as an example. When selecting the whole article text, an HTML comment is jetzted after the first paragraph.

(But BTW: Really great work, this extension!!)

Mar 09 '14 16:03 georgjaehnig

Oh man, that's annoying. Thanks for pointing it out though. Just further motivation to get a proper DOM parser on the go :)

Mar 09 '14 16:03 ds300

@georgjaehnig the comments that appear in that page example are inside script tags, so this seems to be the same issue as #29 - I confirmed that this works with the PR in #31
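
To illustrate the kind of filtering involved, here is a minimal TypeScript sketch. It is not the change made in #31 but an assumption of how such text collection could look. `MiniNode`, `SKIPPED_TAGS` and `collectReadableText` are hypothetical names, and the node type is a simplified stand-in for DOM nodes so the example runs outside a browser.

```typescript
// Simplified stand-in for DOM nodes: elements, text nodes and comment nodes.
type MiniNode =
  | { kind: "element"; tag: string; children: MiniNode[] }
  | { kind: "text"; value: string }
  | { kind: "comment"; value: string };

// Elements whose contents should never be read out.
const SKIPPED_TAGS = new Set(["script", "style", "noscript"]);

// Collect the trimmed, non-empty text of a subtree in document order,
// ignoring comment nodes and everything inside skipped elements.
function collectReadableText(node: MiniNode, out: string[] = []): string[] {
  switch (node.kind) {
    case "text": {
      const trimmed = node.value.trim();
      if (trimmed.length > 0) out.push(trimmed);
      break;
    }
    case "comment":
      // Comment nodes are dropped entirely.
      break;
    case "element":
      if (!SKIPPED_TAGS.has(node.tag.toLowerCase())) {
        for (const child of node.children) collectReadableText(child, out);
      }
      break;
  }
  return out;
}
```

Note that a comment written inside a `<script>` element is ordinary script text rather than a comment node, which is why skipping the whole element matters here and filtering comment nodes alone would not be enough.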

Mar 09 '14 20:03 h0ru5

I appreciate that the Readability API can take several seconds, and I also noticed that OpenSpritz doesn't work at all on Guardian articles. That said, it would be an excellent additional feature. The default alt-s behaviour can be slightly clunky depending on the underlying HTML structure, and it might also pick up images and their captions in the middle of the text, which might not make sense. It's also an annoying extra step which, in my opinion, adds little value over simply selecting the text manually. When you are on a website and press alt-r, it could query the Readability API and at least take a best guess.

Mar 10 '14 19:03 ecsplendid

I think this could be a way to go: https://github.com/fb55/readabilitySAX (fb55 offers a Readability port that can be used inside the browser).

Mar 10 '14 19:03 h0ru5

^ h0ru5 has the best idea :>

Mar 10 '14 20:03 ecsplendid

I agree with h0ru5. Other reasons: privacy, offline usage and probably speed. Here are some other JS-based readability scripts I posted earlier: https://github.com/ds300/jetzt/pull/26#issuecomment-37131986

Mar 10 '14 21:03 peteruithoven

@peteruithoven I skimmed through the list, but I found the best starting point in the SAX-based one. I think https://github.com/BaNzounet/readability/blob/master/src/helpers.js might be possible to use as well; they separated the node stuff from a helpers.js in pure JS.

Mar 10 '14 22:03 h0ru5

Another reason not to use an external API like Readability is that you can't use it on services that require a login, like mail.

Mar 12 '14 03:03 peteruithoven

So it has been quite a while, but this might be an interesting development: https://hacks.mozilla.org/2017/04/fathom-a-framework-for-understanding-web-pages/ https://mozilla.github.io/fathom/

May 05 '17 14:05 peteruithoven
