jetzt
Default text selection on popular sites
When no text is selected, the extension could select a default text based on the current domain and an XPath configured in the extension for that domain. For instance, on http://www.bbc.com/news/world-europe-26465962 the news text could be selected automatically, and on Wikipedia the article text.
Readability.com might be a sensible method for stripping the content out. This is how OpenSpritz does it.
readability.com would be nice, but I don't like the idea of waiting on external API calls, and I've noticed OpenSpritz can sometimes take several seconds to load. After all, 'jetzt' means 'now' in German.
I propose the following:
- a function which takes a dom node and compiles its content to jetzt instructions, similarly to how I've done it for plain strings.
- a function for best-guessing the parent node of an article by, e.g., finding the node with the most `<p>` children.
- a map from url patterns to xpath/css selector/node extraction functions, for popular sites where the best-guess doesn't work well enough.
Thoughts?
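A rough sketch of the last two points of the proposal, for discussion only. It uses a plain object tree as a stand-in for DOM nodes so it runs outside a browser. The names `SimpleNode`, `guessArticleNode`, `siteSelectors`, and `selectorForHost` are hypothetical and not part of jetzt, and the selectors in the map are illustrative guesses that have not been verified against the live sites.

```typescript
// Minimal stand-in for a DOM element so the sketch runs without a browser.
interface SimpleNode {
  tag: string;
  children: SimpleNode[];
}

// Hypothetical helper: best-guess the article container by picking the node
// with the most direct <p> children. Ties go to whichever node is visited first.
function guessArticleNode(root: SimpleNode): SimpleNode {
  let best = root;
  let bestCount = -1;
  const stack: SimpleNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop() as SimpleNode;
    const count = node.children.filter((c) => c.tag === "p").length;
    if (count > bestCount) {
      best = node;
      bestCount = count;
    }
    stack.push(...node.children);
  }
  return best;
}

// Hypothetical per-site overrides for pages where the best guess is not good
// enough. Patterns and selectors are assumptions for illustration only.
const siteSelectors: Array<{ pattern: RegExp; selector: string }> = [
  { pattern: /(^|\.)wikipedia\.org$/, selector: "#mw-content-text" },
  { pattern: /(^|\.)bbc\.com$/, selector: "article" },
];

// Returns the override selector for a hostname, or null to fall back to the
// best-guess heuristic.
function selectorForHost(hostname: string): string | null {
  for (const { pattern, selector } of siteSelectors) {
    if (pattern.test(hostname)) return selector;
  }
  return null;
}
```

In a browser, the same heuristic could walk real elements instead of `SimpleNode` objects, and the map lookup would run first so that known sites skip the guess entirely.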
Or, when the user starts jetzt without any text selected, the element the mouse is hovering over could light up, and if the user clicks, that element's text will be read (like the inspect element function).
Great idea Gyran!
Re: using an external service for this: one reason I prefer Jetzt over OpenSpritz is that Jetzt can read local documents. Due to how OpenSpritz works, it can only read publicly available documents/pages.
:+1:
I quite like the current solution, which lets the reader select a text block with the mouse.
However, there's a little bug: it seems that HTML comments are included in this automatic selection. See http://www.spiegel.de/politik/deutschland/krim-krise-ex-kanzler-gerhard-schroeder-kritisiert-eu-a-957728.html as an example. When selecting the whole article text, an HTML comment is jetzted after the first paragraph.
(But BTW: Really great work, this extension!!)
Oh man that's annoying. Thanks for pointing it out though. Just further motivation to get a proper dom parser on the go :)
@georgjaehnig the comments that appear in that page example are inside script tags, so this seems to be the same issue as #29 - I confirmed that this works with the PR in #31
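To illustrate the kind of filtering discussed here, below is a minimal sketch of text extraction that ignores comment nodes and the contents of `script`/`style` elements. It is not the code from #31; the `DocNode` type, the `SKIPPED_TAGS` set, and `readableText` are hypothetical names operating on a plain object tree rather than the real DOM.

```typescript
// Plain-object stand-in for DOM nodes (element, text, comment).
type DocNode =
  | { kind: "element"; tag: string; children: DocNode[] }
  | { kind: "text"; value: string }
  | { kind: "comment"; value: string };

// Assumed list of elements whose contents should never be read aloud.
const SKIPPED_TAGS = new Set(["script", "style", "noscript"]);

// Concatenates the text of a subtree, dropping comments and skipped elements.
function readableText(node: DocNode): string {
  if (node.kind === "text") return node.value;
  if (node.kind === "comment") return "";
  if (SKIPPED_TAGS.has(node.tag.toLowerCase())) return "";
  return node.children.map(readableText).join("");
}
```

With a filter like this, text that sits inside a script tag (including anything that looks like an HTML comment) never reaches the reader, regardless of how the selection was made.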
I appreciate that the readability API can take several seconds, and also I noticed on OpenSpritz it doesn't even work at all on Guardian articles. That said, it would be an excellent additional feature. The default alt-s behaviour can be slightly clunky depending on the underlying HTML structure and also might pick up images and their captions in the middle which might not make sense. It's also an annoying extra step which in my opinion adds little value over simply selecting the text manually. When you are on a website and press alt-r, it could query the Readability API and at least take a best guess.
I think this could be a way to go: https://github.com/fb55/readabilitySAX (fb55 offers a readability port that can be used inside the browser)
^ h0ru5 has the best idea :>
I agree with h0ru5. Other reasons: privacy, offline usage, and probably speed. Here are some other JS-based readability scripts I posted earlier: https://github.com/ds300/jetzt/pull/26#issuecomment-37131986
@peteruithoven I skimmed through the list, but I found the best starting point to be the SAX-based one. I think https://github.com/BaNzounet/readability/blob/master/src/helpers.js might be usable as well; they separated the Node stuff from a helpers.js written in pure JS
Another reason not to use an external API like readability is that you can't use it on services that require a login, like mail.
So it has been quite a while, but this might be an interesting development: https://hacks.mozilla.org/2017/04/fathom-a-framework-for-understanding-web-pages/ https://mozilla.github.io/fathom/