newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:

Results 152 newspaper issues

Hi, how do I set the cache directory to the current project root path?

The original caption removal code misses many image captions, which then become part of the article text and mess it up. Have added a fix...

Hi there, this seems to be a great library, however it does not work for the following test site: https://www.forbes.com/sites/tonybradley/2019/01/28/cybersecurity-experts-share-insight-for-data-privacy-day-2019/

```
>>> from newspaper import Article
>>> url = 'https://appleinsider.ru/iphone/skolko-stoyat-komponenty-iphone-13-pro-spojler-eto-ne-sebestoimost.html'
>>> article = Article(url)
>>> article.download()
>>> article.parse()
>>> article.authors
['Дизайн', 'Миша Гончаров', 'Воплощение']
>>> article.publish_date
```

In fact, one...

When fetching the article content from Article.text, only the first few paragraphs are sometimes returned, with "Read More" appended at the end. In some cases, even...

Before this change, `top_node` was cleaned and then copied to the `clean_top_node`. I believe this is not the original intent and should be fixed because it creates confusion and there's...
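The ordering problem described above can be sketched with the standard library alone. This is only an illustration, not newspaper3k's actual code: `clean`, `clean_then_copy`, and `copy_then_clean` are hypothetical names, the `<figcaption>` removal is an assumed stand-in for whatever the real cleaning pass does, and `xml.etree.ElementTree` replaces the parser the library uses.

```python
import copy
import xml.etree.ElementTree as ET


def clean(node):
    """Hypothetical cleaning pass: remove every <figcaption> in place."""
    # Snapshot the tree first so removals do not disturb the traversal.
    for parent in list(node.iter()):
        for child in list(parent):
            if child.tag == "figcaption":
                parent.remove(child)
    return node


def clean_then_copy(top_node):
    # Order described in the report: the original is mutated first,
    # so the original and the "clean" copy end up identical.
    clean(top_node)
    return copy.deepcopy(top_node)


def copy_then_clean(top_node):
    # Alternative order: copy first, then clean only the copy,
    # leaving top_node untouched.
    return clean(copy.deepcopy(top_node))


html = "<div><p>Body</p><figcaption>Caption</figcaption></div>"

a = ET.fromstring(html)
a_clean = clean_then_copy(a)

b = ET.fromstring(html)
b_clean = copy_then_clean(b)
```

With the first order, `a` has lost its caption along with `a_clean`; with the second, `b` still holds the caption and only `b_clean` is stripped.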

I'm not sure if it's an issue with the HTML of the website, if there's an issue parsing Tajiki, or something else, but I tried scraping http://www.jumhuriyat.tj/index.php?art_id=44635 on the Heroku...

The library was failing to scrape sites that have JavaScript code in them, so I have added the ability to scrape such websites. So now it will be possible to...

Tried this link locally with newspaper3k.

**link**: http://www.news.com.au/sport/cricket/big-bash/bbl-2019-perth-scorchers-vs-melbourne-renegades-at-optus-stadium/live-coverage/c76e315c694d39dd5c20ad75c5a136aa

**my code**:

```
from newspaper import Article

article_content = Article('http://www.news.com.au/sport/cricket/big-bash/bbl-2019-perth-scorchers-vs-melbourne-renegades-at-optus-stadium/live-coverage/c76e315c694d39dd5c20ad75c5a136aa ', keep_article_html=True)
article_content.download()  # this throws an error
article_content.parse()
```

However, I earlier was working...