newspaper issues

Results 152 newspaper issues


Parse only a certain directory

Hi! I started using newspaper library and ran into a problem. For example, I want to parse a certain category of a website. I would try to do it like...

savandriy

Categories filters don't work as expected

Hi team, I've been trying to extract articles for a particular category, https://www.dailymail.co.uk/health/index.html. However, when I check the articles being fetched, I get everything, not just the ones under...

varuncheq
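
Both reports above ask for articles from a single section of a site. One possible workaround, sketched here with the standard library only, is to filter candidate URLs by path prefix after they have been collected. The helper names `in_section` and `filter_by_section` and the two article URLs are hypothetical and used for illustration; this is not a feature of newspaper itself.

```python
from urllib.parse import urlparse


def in_section(url: str, section: str) -> bool:
    """Return True if the URL path lies under the given section, e.g. 'health'."""
    path = urlparse(url).path
    # Require a trailing slash so that 'health' does not match '/healthy/...'.
    prefix = "/" + section.strip("/") + "/"
    return path.startswith(prefix)


def filter_by_section(urls, section):
    """Keep only the URLs whose path starts with the section prefix."""
    return [u for u in urls if in_section(u, section)]


# In practice the candidates could be the URLs of whatever articles were fetched.
candidates = [
    "https://www.dailymail.co.uk/health/index.html",
    "https://www.dailymail.co.uk/health/article-0000000/example.html",  # hypothetical
    "https://www.dailymail.co.uk/sport/article-0000001/example.html",   # hypothetical
]
health_only = filter_by_section(candidates, "health")
```

This only helps when the site encodes the category in the URL path; sites that do not would need a different signal.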

Receiving same result for different URLs

I need to retrieve all news from https://news.bitcoin.com. It has pages like this: https://news.bitcoin.com/page/2/. But whatever page I try to access, I get the same results, like this: https://news.bitcoin.com/the-satoshi-revolution-by-wendy-mcelroy/ https://news.bitcoin.com/tidbits-peter-todd-on-passphrase-memorization-antonopoulos-explains-transaction-fees/...

Tolkoton

Iterating articles on news source produces duplicates, if subdomain omitted.

I was testing news sources, and found that this article was emitted twice, despite the fact that newspaper should be memoizing. The problem seems to be that memoization uses the...

awiebe
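
The excerpt above is cut off before it says what the memoization is keyed on, so the following is only a sketch under an assumption: that the duplicates come from the same article being reachable with and without a `www.` subdomain. The helpers `canonical_key` and `dedupe` are hypothetical names, not part of newspaper.

```python
from urllib.parse import urlsplit


def canonical_key(url: str):
    """Build a comparison key that ignores scheme, a leading 'www.', and a trailing slash."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    return (host, path)


def dedupe(urls):
    """Return the URLs in their original order, keeping the first of each canonical key."""
    seen = set()
    unique = []
    for url in urls:
        key = canonical_key(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

The key deliberately drops the query string, which would merge distinct articles on sites that identify articles by query parameter; in that case the query would have to be part of the key.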

Added Marathi Language Extension

tanmay-punekar

Newsweek articles do not download

How can I help?

planktonrobo

Maintenance Status?

Is this library still being maintained? Thanks

nmcbride

Parsing Incorrectly for Yahoo Finance

For article links like

| url | action required to view all |
| --- | --- |
| https://finance.yahoo.com/m/ca7cfbea-6f81-34b7-a482-1f7723376eff/ford-crushes-q3-views-with.html | Read More button |
| https://finance.yahoo.com/news/epa-data-shows-tesla-excels-200035538.html | |

...

jtara1

Fix concatenating sentence parts separated with newlines

The text content of newspapers seems to be returned as paragraphs separated by two newlines. When doing nlp on this, the tokenizer sometimes thinks a sentence spans across two paragraphs,...

dhgelling
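
The problem described above (a tokenizer joining text across a paragraph break) can be illustrated with a small sketch that splits on blank lines first and only then splits each paragraph into sentences. The regex-based splitter and the name `split_sentences` are stand-ins for illustration; they are not the fix proposed in the pull request, and a real NLP tokenizer would replace the regex.

```python
import re

# Naive sentence boundary: whitespace that follows '.', '!' or '?'.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str):
    """Split text into sentences without letting one span a paragraph break."""
    sentences = []
    # Paragraphs are assumed to be separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", text):
        # Collapse single newlines and repeated spaces inside a paragraph.
        paragraph = " ".join(paragraph.split())
        if not paragraph:
            continue
        sentences.extend(s for s in _SENTENCE_END.split(paragraph) if s)
    return sentences
```

With this approach, a heading or caption that lacks final punctuation stays its own sentence instead of being glued to the first sentence of the next paragraph.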

Skip unparsable urls

I get problems with some image URLs when using news-please:

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/newsplease/crawler/commoncrawl_extractor.py", line 259, in __process_warc_gz_file
    filter_pass, article = self.filter_record(record)
  File "/opt/coviddash/ingress/covidmarch.py", line...
```

frankier
