[SITES] https://www.bizjournals.com

Open tbrox opened this issue 6 months ago • 0 comments

First, please check that it is really an issue with the library and not a special case of the website:

[ X ] There is no paywall
[ X ] You do not have to be logged in to see the articles
[ X ] You tried using a common browser user agent in your configuration / call
[ X ] The website is not in the list of well known problematic sites

Your report as follows:

Website that does not parse correctly:

https://www.bizjournals.com

Some sample URLs that I have tried:

https://www.bizjournals.com/boston/news/2024/08/23/irobot-roomba-cleaning-station.html?ana=brss_4650
https://www.bizjournals.com/sanfrancisco/inno/stories/news/2024/08/22/bracing-for-impact-bay-area-investors-bullish-dei.html?ana=brss_4650

The exact code I used to test this article/website:


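from newspaper import Article

url = "https://www.bizjournals.com/boston/news/2024/08/23/irobot-roomba-cleaning-station.html?ana=brss_4650"
article = Article(url, fetch_images=False, follow_meta_refresh=True)
article.download()
article.parse()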

Other information, remarks, messages, etc:

newspaper.exceptions.ArticleException: Article download() failed with Status code 403 for url None
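
A 403 is returned by the server before any parsing happens, so one way to check whether the refusal comes from the site rather than from the library is to request the same URL with only the Python standard library. The following is a minimal sketch, not part of the original report: the helper names `build_request` and `fetch_status` and the `BROWSER_UA` string are assumptions made for illustration.

```python
import urllib.error
import urllib.request

# Assumption: an example desktop browser User-Agent string, not taken from the report.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Build a GET request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent=BROWSER_UA, timeout=10):
    """Return the HTTP status code for url, including error codes such as 403."""
    try:
        request = build_request(url, user_agent)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; the code is still on the exception.
        return err.code
```

Calling `fetch_status(url)` with one of the sample URLs returns the raw status code. If it is also 403 here, the site is probably rejecting automated clients in general, which would point away from a parsing bug in the library.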

Aug 23 '24 10:08 tbrox

newspaper4k
