newspaper4k
[SITES] https://www.bizjournals.com
First please check that it is really an issue with the library, and not some special case of website:
- [ X ] There is no paywall
- [ X ] You do not have to be logged in to see the articles
- [ X ] You tried using a common browser user agent in your configuration / call
- [ X ] The website is not in the list of well known problematic sites
Your report as follows:
Website that does not parse correctly:
https://www.bizjournals.com
Some sample urls that I have tried
https://www.bizjournals.com/boston/news/2024/08/23/irobot-roomba-cleaning-station.html?ana=brss_4650
https://www.bizjournals.com/sanfrancisco/inno/stories/news/2024/08/22/bracing-for-impact-bay-area-investors-bullish-dei.html?ana=brss_4650
The exact code I used to test these articles/website
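from newspaper import Article

# One of the sample URLs listed above
url = "https://www.bizjournals.com/boston/news/2024/08/23/irobot-roomba-cleaning-station.html?ana=brss_4650"
article = Article(url, fetch_images=False, follow_meta_refresh=True)
article.download()  # raises ArticleException (status code 403)
article.parse()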
Other information, remarks, messages, etc:
newspaper.exceptions.ArticleException: Article download() failed with Status code 403 for url None
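To help tell a site-side block apart from a library problem, here is a minimal sketch that requests the same page with the standard library only, sending browser-like headers. The names build_request, probe_status, and BROWSER_HEADERS are hypothetical helpers for illustration, and the User-Agent string is just an example value, not one taken from my actual configuration.

```python
import urllib.error
import urllib.request

# Example browser-like headers (assumed values; any current desktop
# browser User-Agent string could be substituted).
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that carries the browser-like headers."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def probe_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, including error codes such as 403."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; the code is still the useful part.
        return err.code
```

If probe_status(url) also returns 403 for the sample URLs, that would suggest the site rejects the request regardless of the library; a 200 would point back at how the library issues the download.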