Jina Reader doesn't work for Reuters.com web site

Open rnavarroz opened this issue 1 year ago • 3 comments

I have tried every possible setting but can't get the content of https://r.jina.ai/www.reuters.com/world/trade-wars-erupt-trump-hits-canada-mexico-china-with-steep-tariffs-2025-03-04/. The response is:

content: 'reuters.com\n' +
        '===============\n' +
        '\n' +
        'Please enable JS and disable any ad blocker'

Sometimes I get only:

content: 'reuters.com\n==============='

Firecrawl returns the content successfully: https://www.firecrawl.dev/playground?url=https%3A%2F%2Fwww.reuters.com%2Fworld%2Ftrade-wars-erupt-trump-hits-canada-mexico-china-with-steep-tariffs-2025-03-04%2F&mode=scrape
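
For anyone who wants to detect this failure in code, here is a minimal sketch that flags the two responses quoted above: a title and underline with nothing else, or a body containing the "Please enable JS" notice. The function name `is_blocked_response` and the constant `BLOCK_MARKER` are made up for illustration and are not part of Reader. The check is only a heuristic based on the output shown in this issue.

```python
# Hypothetical helper, not part of Jina Reader: a heuristic based only on
# the two failure responses quoted in this issue.
BLOCK_MARKER = "Please enable JS and disable any ad blocker"


def is_blocked_response(content: str) -> bool:
    """Return True if the Reader output looks like a block page, not an article."""
    lines = [line.strip() for line in content.splitlines()]
    # Skip the first line (the title), blank lines, and the '=====' underline.
    body = [line for line in lines[1:] if line and set(line) != {"="}]
    if not body:
        # Only a title and underline came back, as in the second response.
        return True
    # The first response carries the anti-bot notice as its body.
    return BLOCK_MARKER in " ".join(body)
```

A caller could use this to fall back to another scraper or to retry later, instead of storing the block page as if it were the article.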

rnavarroz avatar Mar 04 '25 17:03 rnavarroz

I second this.

How do I enable JS, by the way?

imWildCat avatar Mar 05 '25 17:03 imWildCat

Hi @rnavarroz @imWildCat ,

We have been making significant changes to Reader, and the access issues with reuters.com now seem to have gone away.

That being said, I would like to point out that reuters.com has a CAPTCHA deployed and has been intentionally blocking bots like Reader. Accessing reuters.com continuously in large volumes might not be sustainable.
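
As a rough illustration of keeping request volume low, the sketch below retries a blocked fetch a few times with exponentially growing pauses and then gives up. It does not bypass any bot check. Every name in it (`fetch_with_backoff`, `fetch`, `is_blocked`, `base_delay`) is hypothetical and not a Reader API; `fetch` stands for whatever function you already use to call Reader, and `is_blocked` for your own check of the returned text.

```python
import time
from typing import Callable, Optional


def fetch_with_backoff(
    fetch: Callable[[str], str],        # hypothetical: your own function that calls Reader
    url: str,
    is_blocked: Callable[[str], bool],  # hypothetical: your own block-page check
    max_attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Try a fetch up to max_attempts times, pausing longer after each block."""
    for attempt in range(max_attempts):
        content = fetch(url)
        if not is_blocked(content):
            return content
        if attempt < max_attempts - 1:
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    # Still blocked after every attempt: give up rather than keep hammering the site.
    return None
```

Passing `sleep` as a parameter is only a convenience so the function can be exercised in tests without real waiting.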

nomagick avatar Mar 13 '25 09:03 nomagick

@nomagick thanks for your reply! TBH, I find this project less useful than its competitors.

Suggestion: it might be much more valuable if you had your own model, or used Claude 3.5/7, to bypass the bot check.

I really like your work, but unfortunately I cannot use it in production.

imWildCat avatar Mar 13 '25 16:03 imWildCat