Harold Treen

83 comments by Harold Treen

@mhamann I've been having the same issues with some sites. My approach was to wrap `readability` with a Cheerio-based pre-processor. Some things that I do with that pre-processor: -...

Glad that helps @mhamann :) These are the rules I supply: ![image](https://cloud.githubusercontent.com/assets/1745854/16827293/cfa64dd4-494a-11e6-8a2e-dc32e4c7db2a.png) Unfortunately it's all in a private repo at the moment, but I'll be trying to transfer more fixes...

I also have a much more comprehensive set of regression tests. Would almost be good to open source the test suite so that content extractors can be compared. There's been...

I'm working on a project that requires really good content extraction (https://epub.press), so that's how it's come about :). I've been accepted into the Recurse Center in September and will...

I've open sourced the preprocessor I use on [EpubPress](https://epub.press). You can find it here: https://github.com/haroldtreen/epub-press/blob/master/lib/content-extractor.js#L28 I find it works really well for making sites behave with `readability`. Hope that helps!

This is more of a legal issue than a software issue and I'm doubtful many lawyers are watching this repo. That being said, many services let you save articles to...

Who can sleep when there may be unused variables in the wild! Testing on a generated project sounds good! I shall do that. As for side effect imports, Shawn's suggestion...

Interesting! This makes sense. Some context as to what's going on: 1. Books are deleted after ~5 minutes. Previously they just remained indefinitely, but that resulted in hard drives overflowing...

This is also happening for titles that use French quotes: ``` However, I would like to point out a small persistent problem : in the created epub, the title of...

Hey @seefood! Thanks so much for this well documented issue. I'll sniff around to see if there's any easy way to support this. I agree that supporting RTL languages...