gnews with user agent returns empty text

Open wj210 opened this issue 8 months ago • 1 comments

I ran into errors when scraping with gnews, along the lines of:

Article download() failed with 403 Client Error: Max restarts limit reached for url

Article download() failed with 403 Client Error: Forbidden for url

So I followed https://github.com/johnbumgarner/newspaper3_usage_overview and added the user-agent headers, but as soon as I do, article.text returns an empty string.

The links are Google News RSS article URLs, for example "https://news.google.com/rss/articles/CBMifWh0dHBzOi8vc2Vla2luZ2FscGhhLmNvbS9hcnRpY2xlLzE4NDM5MzItdGhlLWV4cGxhbmF0aW9uLWJlaGluZC1hcHBsZXMtZ3Jvc3MtbWFyZ2luLWRlY2xpbmUtYW5kLXdoeS10aGUtZnV0dXJlLWxvb2tzLWJyaWdodGVy0gEA?oc=5&hl=en-SG&gl=SG&ceid=SG:en"

whereas the underlying link "https://seekingalpha.com/article/1843932-the-explanation-behind-apples-gross-margin-decline-and-why-the-future-looks-brighter" works fine when passed directly.
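
A possible workaround, sketched here under assumptions rather than offered as a confirmed fix: in the example above, the article ID in the Google News path appears to be URL-safe base64 that contains the underlying link. The helper below (decode_google_news_url is a hypothetical name, not part of gnews or newspaper) decodes that ID using only the standard library and pulls out the first http(s) URL it finds. It assumes this older embedded-URL format; Google News links in other formats may not decode this way, in which case it returns None.

```python
import base64
import re
from typing import Optional
from urllib.parse import urlparse


def decode_google_news_url(rss_url: str) -> Optional[str]:
    """Try to recover the publisher URL embedded in a Google News RSS link.

    Hypothetical helper: assumes the last path segment is URL-safe base64
    that wraps the target URL, as in the example from this issue.
    """
    # The article ID is the last path segment; urlparse drops the query string.
    article_id = urlparse(rss_url).path.rsplit("/", 1)[-1]

    # Google omits base64 padding, so restore it before decoding.
    padded = article_id + "=" * (-len(article_id) % 4)
    try:
        raw = base64.urlsafe_b64decode(padded)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None

    # The decoded bytes hold a few binary framing bytes around the URL,
    # so match only a run of printable ASCII starting at the scheme.
    match = re.search(rb"https?://[\x21-\x7e]+", raw)
    return match.group().decode("ascii") if match else None


if __name__ == "__main__":
    example = (
        "https://news.google.com/rss/articles/"
        "CBMifWh0dHBzOi8vc2Vla2luZ2FscGhhLmNvbS9hcnRpY2xlLzE4NDM5MzItdGhl"
        "LWV4cGxhbmF0aW9uLWJlaGluZC1hcHBsZXMtZ3Jvc3MtbWFyZ2luLWRlY2xpbmUt"
        "YW5kLXdoeS10aGUtZnV0dXJlLWxvb2tzLWJyaWdodGVy0gEA"
        "?oc=5&hl=en-SG&gl=SG&ceid=SG:en"
    )
    print(decode_google_news_url(example))
```

If the helper returns a URL, that URL could be passed to Article in place of the news.google.com link. When it returns None, falling back to the original link keeps the existing behaviour.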

wj210 · Oct 18 '23 03:10

Thanks for mentioning my usage document in this Issue. What sites give you a 403?

johnbumgarner · Oct 30 '23 18:10