Yanlong Wang comments

Results: 61 comments of Yanlong Wang

Strikethrough text not converted to Markdown

Hi @mquandalle. Currently, Reader only works at the HTML tag level. It does not look into the rendered CSS properties of each element. To get a strikethrough in markdown, there...

Code block extraction fails when using selectors on replicate.com docs

This has something to do with [Readability](https://github.com/mozilla/readability), which Reader uses by default. Specify `x-return-format: markdown`. This prevents Readability from "smartly" removing anything.
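As a rough illustration of the header mentioned above, here is a minimal Python sketch that builds (but does not send) a request carrying `x-return-format: markdown`. It assumes Reader is reached by prefixing the target URL with `https://r.jina.ai/`; that prefix, the helper name `build_markdown_request`, and the example URL are assumptions for illustration, not taken from the comment.

```python
from urllib.request import Request

# Assumption: Reader is called by prefixing the target URL with this endpoint.
READER_PREFIX = "https://r.jina.ai/"


def build_markdown_request(target_url: str) -> Request:
    """Build (without sending) a Reader request that asks for markdown output."""
    return Request(
        READER_PREFIX + target_url,
        # Header name and value as quoted in the comment above.
        headers={"x-return-format": "markdown"},
    )


# Hypothetical target page, used only to show the shape of the request.
req = build_markdown_request("https://example.com/docs")
# To actually fetch the page: urllib.request.urlopen(req).read().decode("utf-8")
```

Building the request separately from sending it makes it easy to inspect the URL and headers before any network traffic happens.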

How to deal with cookie requests on sites and security

This is probably because the website you are trying to access has intentionally blocked bots like Reader. To solve this issue, you probably need authorization from the website owner. Technically,...

Incomplete Markdown Conversion: Missing MSRP Cap from URL Content

Hi. This is most likely our default transformer, [@mozilla/readability](https://github.com/mozilla/readability), not being smart enough and removing your desired content. Please try the other mode, which does not tend to remove things: ```bash curl...

Does Jina.ai scrape the websites anonymously or non-anonymously?

Hi @deathofabat. Reader scrapes the website using a headless Chrome browser, with a corresponding Chrome browser user agent (UA). You can customize this UA, though, using the `x-user-agent` header. In addition to...
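For illustration only, a minimal Python sketch of passing a custom UA through the `x-user-agent` header. The `https://r.jina.ai/` prefix, the helper name `build_request_with_ua`, and the sample UA string are assumptions that do not come from the comment; the request is built but not sent.

```python
from urllib.request import Request

# Assumption: Reader is called by prefixing the target URL with this endpoint.
READER_PREFIX = "https://r.jina.ai/"


def build_request_with_ua(target_url: str, user_agent: str) -> Request:
    """Build (without sending) a Reader request that overrides the browser UA."""
    return Request(
        READER_PREFIX + target_url,
        # Header name as quoted in the comment above; the value is caller-supplied.
        headers={"x-user-agent": user_agent},
    )


# Hypothetical UA string and target page, used only to show the request shape.
ua_req = build_request_with_ua("https://example.com/", "MyCrawler/1.0")
# To actually fetch the page: urllib.request.urlopen(ua_req).read().decode("utf-8")
```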

Jina Reader doesn't work for Reuters.com web site

Hi @rnavarroz @imWildCat, we have been making significant changes to Reader, and the accessibility issues with reuters.com now seem to have gone away. That being said, I would like to...

Feature Request: Add x-forward-header-* support for custom header forwarding

Hi @dudosxdev, because Reader very much follows the usage pattern of a browser, it would be preferable to add these headers from within the page JavaScript. Maybe except for...

Unable to crawl heavy JavaScript-based website

Hi. For the two pages you mention, JavaScript is not the problem. The problem is the semantic content of the two pages: it has been defined to repeat similar content in a row....

Reader can't read certain pdf links

Hi @ebsawyer, the PDF reading feature of Reader only works for PDFs that have a text layer. Unfortunately, all three PDFs you mentioned did not contain a text layer and...

Failed writing received data to disk/application

This is due to a buggy implementation of the aliyun.com server, specifically an issue with streaming compressed data in this particular case. This can be reproduced by running `curl https://xz.aliyun.com/news/17222 --compressed`....