newspaper
Not working on New York Times
As mentioned in several issues (#645, #363), newspaper doesn't work on the New York Times. I tested two versions of the site: the English one and the Chinese one (https://cn.nytimes.com). The Chinese version doesn't have a paywall, so newspaper should be able to extract its full content. However, in both cases newspaper extracts only 3 or 4 paragraphs, and they are not from the beginning of the article. Is there any way I can solve this? Thanks.
My code:
from newspaper import Article, Config as NewspaperConfig

url = "https://www.nytimes.com/2019/08/21/business/economy/jobs-growth-revision.html"
conf = NewspaperConfig()
# keep_article_html=True keeps the extracted body markup in article.article_html
article = Article(url, config=conf, keep_article_html=True)
article.download()  # fetch the raw page HTML
article.parse()     # run the content extraction
print(article.article_html)
print(article.text)
The URLs I tested with:
https://www.nytimes.com/2019/08/21/business/economy/jobs-growth-revision.html
https://cn.nytimes.com/china/20190821/china-hong-kong-social-media-soft-power/
https://cn.nytimes.com/morning-brief/20190822/hong-kong-protests-british-consulate-us-sanctions-fentanyl/
If it's any help, #885 works with your first URL. With the second URL, the last sentence is missing, and with the third URL I think a few more sentences are missing. I can't read the Chinese version, so I can't fully determine which sentences are missing, but the linked PR captures more than the master branch. Hope it helps!
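For anyone who wants to check which sentences are dropped without reading the language, here is a rough diagnostic sketch. It uses only the Python standard library and is not part of newspaper; `ParagraphCollector` and `missing_paragraphs` are hypothetical helper names made up for illustration. It assumes the article body sits in ordinary `<p>` tags and that the extracted text preserves each paragraph's wording, so treat the result as a hint rather than an exact diff.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the whitespace-normalized text of every <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._chunks = []

    def _flush(self):
        # Join the text pieces seen so far and collapse runs of whitespace.
        text = " ".join("".join(self._chunks).split())
        if text:
            self.paragraphs.append(text)
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A new <p> implicitly ends any paragraph that was left unclosed.
            self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._chunks.append(data)


def missing_paragraphs(raw_html, extracted_text):
    """Return the <p> texts from raw_html that do not appear in extracted_text."""
    collector = ParagraphCollector()
    collector.feed(raw_html)
    collector.close()
    normalized = " ".join(extracted_text.split())
    return [p for p in collector.paragraphs if p not in normalized]
```

With the script from the issue, something like `missing_paragraphs(article.html, article.text)` after `article.parse()` should list the candidate paragraphs that were skipped. Expect some noise, since navigation or footer `<p>` tags will also be reported as missing.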