Yanlong Wang

61 comments by Yanlong Wang

Hi @mquandalle. Currently, Reader only works at the HTML tag level. It does not look into the rendered CSS properties of each element. To get a strikethrough in markdown, there...

This is related to [readability](https://github.com/mozilla/readability), which Reader uses by default. Specify `x-return-format: markdown`; this prevents readability from "smartly" removing anything.
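As a minimal sketch of how that header might be sent from code: the snippet below builds a request carrying `x-return-format: markdown`. The endpoint `https://r.jina.ai/` and the helper names `build_reader_request` and `fetch` are assumptions for illustration and are not taken from the comment above; only the header name and value come from it.

```python
import urllib.request

# Assumption: the public Reader endpoint. The comment above does not state it.
READER_ENDPOINT = "https://r.jina.ai/"


def build_reader_request(target_url: str, return_format: str = "markdown") -> urllib.request.Request:
    """Build (but do not send) a Reader request asking for a given return format."""
    return urllib.request.Request(
        READER_ENDPOINT + target_url,
        # Header name and value as given in the comment above.
        headers={"x-return-format": return_format},
    )


def fetch(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and decode the response body as UTF-8 text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch(build_reader_request("https://example.com/article"))` would then perform the actual network request. Building and sending are kept separate so the headers can be inspected first.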

This is probably because the website you are trying to access has intentionally blocked bots like Reader. To solve this issue, you probably need authorization from the website owner. Technically,...

Hi. This is most likely our default transformer, [@mozilla/readability](https://github.com/mozilla/readability), not being smart enough and removing the content you want. Please try the other mode, which does not tend to remove things: ```bash curl...

Hi @deathofabat. Reader scrapes the website using a headless Chrome browser, with the corresponding Chrome browser UA. You can customize this UA, though, using the `x-user-agent` header. In addition to...
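A rough sketch of overriding the UA with that header, under the same caveats: the endpoint `https://r.jina.ai/`, the helper name `build_request_with_user_agent`, and the example UA string are all hypothetical. Only the `x-user-agent` header name comes from the comment above.

```python
import urllib.request

# Assumption: the public Reader endpoint. The comment above does not state it.
READER_ENDPOINT = "https://r.jina.ai/"


def build_request_with_user_agent(target_url: str, user_agent: str) -> urllib.request.Request:
    """Build (but do not send) a Reader request that asks Reader to use a custom UA."""
    return urllib.request.Request(
        READER_ENDPOINT + target_url,
        # `x-user-agent` is the header named in the comment above; it is distinct
        # from the ordinary `User-Agent` header of this HTTP client itself.
        headers={"x-user-agent": user_agent},
    )


# Hypothetical UA string, purely for illustration.
example_request = build_request_with_user_agent(
    "https://example.com/", "MyCrawler/1.0 (+https://example.com/bot)"
)
```

The request can then be sent with `urllib.request.urlopen(example_request)`. Whether the target site accepts the custom UA is up to that site.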

Hi @rnavarroz @imWildCat, we have been making significant changes to Reader, and the accessibility issues with reuters.com now seem to have gone away. That being said, I would like to...

Hi @dudosxdev, because Reader closely follows the usage pattern of a browser, it would be preferable to add these headers from within the page JavaScript. Maybe except for...

Hi. For the two pages you mention, JavaScript is not the problem. The issue is the semantic content of the two pages: they are defined to repeat similar content in a row....

Hi @ebsawyer, the PDF reading feature of Reader only works for PDFs that have a text layer. Unfortunately, all three PDFs you mentioned lack a text layer and...

This is due to a buggy implementation in the aliyun.com server, specifically an issue with streaming compressed data in this particular case. This can be reproduced by running `curl https://xz.aliyun.com/news/17222 --compressed`....