newspaper
gnews with user agent returns empty text
I ran into some issues scraping with gnews. The errors are along the lines of:
Article download() failed with 403 Client Error: Max restarts limit reached for url
Article download() failed with 403 Client Error: Forbidden for url
So I followed https://github.com/johnbumgarner/newspaper3_usage_overview and set the user-agent headers, but as soon as I do, article.text returns an empty string.
The links are Google News RSS article links, for example "https://news.google.com/rss/articles/CBMifWh0dHBzOi8vc2Vla2luZ2FscGhhLmNvbS9hcnRpY2xlLzE4NDM5MzItdGhlLWV4cGxhbmF0aW9uLWJlaGluZC1hcHBsZXMtZ3Jvc3MtbWFyZ2luLWRlY2xpbmUtYW5kLXdoeS10aGUtZnV0dXJlLWxvb2tzLWJyaWdodGVy0gEA?oc=5&hl=en-SG&gl=SG&ceid=SG:en",
whereas the underlying link "https://seekingalpha.com/article/1843932-the-explanation-behind-apples-gross-margin-decline-and-why-the-future-looks-brighter" works fine.
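
Since the underlying link works, one possible workaround is to recover it from the RSS link before handing it to newspaper. The sketch below is not part of gnews or newspaper. It assumes the last path segment of the RSS link is a URL-safe base64 token with the target URL embedded in plain ASCII, which holds for the example above but may not hold for every Google News link. The function name `decode_google_news_url` is hypothetical, and the regex match is a heuristic rather than a real parser for the token format.

```python
import base64
import re
from typing import Optional
from urllib.parse import urlparse


def decode_google_news_url(rss_url: str) -> Optional[str]:
    """Try to recover the publisher URL embedded in a Google News RSS link.

    Assumption: the last path segment is URL-safe base64 and the target URL
    appears inside the decoded bytes as plain ASCII. Returns None otherwise.
    """
    # The query string (?oc=5&hl=...) is dropped; only the path token matters.
    token = urlparse(rss_url).path.rsplit("/", 1)[-1]

    # Restore any base64 padding that was stripped from the token.
    padded = token + "=" * (-len(token) % 4)
    try:
        raw = base64.urlsafe_b64decode(padded)
    except ValueError:
        return None

    # Heuristic: take the first run of printable ASCII that starts with http(s)://.
    match = re.search(rb"https?://[\x21-\x7e]+", raw)
    return match.group(0).decode("ascii") if match else None


rss_link = (
    "https://news.google.com/rss/articles/"
    "CBMifWh0dHBzOi8vc2Vla2luZ2FscGhhLmNvbS9hcnRpY2xlLzE4NDM5MzItdGhlLWV4"
    "cGxhbmF0aW9uLWJlaGluZC1hcHBsZXMtZ3Jvc3MtbWFyZ2luLWRlY2xpbmUtYW5kLXdo"
    "eS10aGUtZnV0dXJlLWxvb2tzLWJyaWdodGVy0gEA"
    "?oc=5&hl=en-SG&gl=SG&ceid=SG:en"
)
print(decode_google_news_url(rss_link))
```

If the function returns a URL, that URL could be passed to newspaper in place of the RSS link. If it returns None, the token probably does not embed the URL directly and this approach does not apply.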
Thanks for mentioning my usage document in this issue. Which sites give you a 403?