
Not working on New York Times

Open • JohnChu101 opened this issue 5 years ago • 1 comment

As mentioned in several issues (#645, #363), newspaper doesn't work on the New York Times. I tested two versions of the site: the English version and the Chinese version (https://cn.nytimes.com). The Chinese version has no paywall, so newspaper should be able to extract its full content. However, in both cases newspaper extracts only 3 or 4 paragraphs, and they are not from the beginning of the article. Is there any way I can solve this? Thanks.

My code:

from newspaper import Article, Config as NewspaperConfig

url = "https://www.nytimes.com/2019/08/21/business/economy/jobs-growth-revision.html"
conf = NewspaperConfig()
# keep_article_html=True keeps the extracted body's HTML in article.article_html
article = Article(url, config=conf, keep_article_html=True)
article.download()  # fetch the raw page HTML
article.parse()     # run content extraction
print(article.article_html)
print(article.text)

The URLs I tested with:

https://www.nytimes.com/2019/08/21/business/economy/jobs-growth-revision.html
https://cn.nytimes.com/china/20190821/china-hong-kong-social-media-soft-power/
https://cn.nytimes.com/morning-brief/20190822/hong-kong-protests-british-consulate-us-sanctions-fentanyl/

JohnChu101, Aug 22 '19 03:08

If it's any help, #885 works with your first URL. With the second URL, the last sentence is missing, and with the third URL I think a few more sentences are missing. I can't read Chinese, so I can't fully determine which sentences are missing, but the linked PR captures more than the master branch. Hope it helps!
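For checking which paragraphs are dropped without having to read the language, a rough coverage check can help. The sketch below is not part of newspaper. It uses only the Python standard library, and the helper names `html_paragraphs` and `missing_paragraphs` are made up for illustration. It assumes the article body lives in `<p>` elements and that the extractor preserves paragraph text verbatim apart from whitespace, which may not hold for every page.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the whitespace-normalized text of every <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._inside = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A new <p> implicitly closes any paragraph still open.
            self.flush_paragraph()
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.flush_paragraph()
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    def flush_paragraph(self):
        text = " ".join("".join(self._chunks).split())
        if text:
            self.paragraphs.append(text)
        self._chunks = []


def html_paragraphs(raw_html):
    """Return the text of each <p> in raw_html, in document order."""
    parser = ParagraphCollector()
    parser.feed(raw_html)
    parser.close()
    parser.flush_paragraph()  # handle a final <p> that was never closed
    return parser.paragraphs


def missing_paragraphs(raw_html, extracted_text):
    """Return (index, text) for each <p> whose text is absent from extracted_text."""
    normalized = " ".join(extracted_text.split())
    return [
        (i, p)
        for i, p in enumerate(html_paragraphs(raw_html))
        if p not in normalized
    ]


# Tiny offline sample standing in for a downloaded page and its extraction.
sample_html = (
    "<html><body><p>First paragraph.</p>"
    "<p>Second <b>paragraph</b>.</p><p>第三段。</p></body></html>"
)
sample_text = "Second paragraph.\n\n第三段。"
missing = missing_paragraphs(sample_html, sample_text)
print(missing)  # [(0, 'First paragraph.')]
```

With a downloaded and parsed `Article`, the same check would be `missing_paragraphs(article.html, article.text)`. Navigation and footer `<p>` elements will show up as false positives, so the indices are only a hint about where the extraction starts and stops.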

jecarr avatar May 10 '21 06:05 jecarr