Jina Reader doesn't work for the Reuters.com website
I have tried every possible setting but can't get the content of https://r.jina.ai/www.reuters.com/world/trade-wars-erupt-trump-hits-canada-mexico-china-with-steep-tariffs-2025-03-04/. The response I get is:
content: 'reuters.com\n' +
'===============\n' +
'\n' +
'Please enable JS and disable any ad blocker'
Sometimes I get:
content: 'reuters.com\n==============='
Firecrawl returns the content successfully: https://www.firecrawl.dev/playground?url=https%3A%2F%2Fwww.reuters.com%2Fworld%2Ftrade-wars-erupt-trump-hits-canada-mexico-china-with-steep-tariffs-2025-03-04%2F&mode=scrape
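For anyone who wants to catch this failure before passing the text downstream, here is a minimal sketch that flags the two blocked responses quoted above. The helper name `looksBlocked` and the "title plus underline only" heuristic are my own assumptions, not part of Reader; the marker string is copied from the output in this report.

```typescript
// Hypothetical helper, not part of Jina Reader.
// The marker text is taken verbatim from the blocked response quoted above.
const BLOCK_MARKER = "Please enable JS and disable any ad blocker";

function looksBlocked(content: string): boolean {
  // Case 1: the anti-bot notice is present in the returned text.
  if (content.includes(BLOCK_MARKER)) {
    return true;
  }
  // Case 2 (assumed heuristic): only a title line and its "====" underline
  // came back, with no article body after them.
  const lines = content
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return lines.length <= 2 && lines.some((line) => /^=+$/.test(line));
}
```

A caller could check `looksBlocked(content)` on the Reader output and retry later or fall back to another source when it returns true.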
Second this.
How do I enable JS, btw?
Hi @rnavarroz @imWildCat ,
We have been making significant changes to Reader, and the accessibility issues with reuters.com now seem to have gone away.
That being said, I would like to point out that reuters.com has a CAPTCHA deployed and has been intentionally blocking bots like Reader. It might not be sustainable to access reuters.com continuously in large volumes.
@nomagick thanks for your reply! TBH, I have found this project less useful than its competitors.
Suggestion: it might be much more valuable if you had a model, or used Claude 3.5/7, to bypass the bot check.
I really like your work, but unfortunately I cannot use it in production.