newspaper issues

Results: 152 newspaper issues

Is this project still being maintained?

Providing an asyncio interface

Since newspaper3k is now a Python 3 library, it would be nice to also support an asyncio-compatible interface for users who want to integrate it into existing async applications.

jordal

enhancement
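newspaper3k's download and parse calls are blocking, so until a native asyncio interface exists, one way to use them from async code is to push each call onto a worker thread. A minimal sketch, assuming Python 3.9+ (for `asyncio.to_thread`) and newspaper3k installed; `run_blocking`, `download_and_parse` and `fetch_many` are hypothetical helper names, not part of newspaper:

```python
import asyncio
from typing import Any, Callable


async def run_blocking(func: Callable[..., Any], *args: Any) -> Any:
    """Run a blocking callable in the default thread pool so the event loop stays responsive."""
    return await asyncio.to_thread(func, *args)


def download_and_parse(url: str):
    """Blocking newspaper3k work; imported lazily so this module loads without newspaper."""
    from newspaper import Article  # assumes newspaper3k is installed

    article = Article(url)
    article.download()
    article.parse()
    return article


async def fetch_many(urls: list[str]) -> list:
    # Each download/parse runs in its own worker thread; gather() awaits them
    # concurrently and returns results in the same order as `urls`.
    return await asyncio.gather(*(run_blocking(download_and_parse, u) for u in urls))
```

Usage: `articles = asyncio.run(fetch_many(["https://example.com/a", "https://example.com/b"]))`. Threads keep newspaper's own download path unchanged; the trade-off is one thread per in-flight request, capped by the default executor's pool size.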

Date regex should not assume date of month from just first (two) digits after /

It looks like the current `urls.STRICT_DATE_REGEX` immediately takes the first (two) digit(s) after a slash as the day of the month.

```
>>> from newspaper import Article
>>> url = "https://prachatai.com/journal/2021/04/92713"
>>>...
```

bact
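To illustrate the failure mode, here is a sketch with two simplified, hypothetical patterns (not newspaper's actual `STRICT_DATE_REGEX`): the loose one reads the leading `9` of the numeric ID `92713` as a day, while the stricter one only accepts a day segment that ends at a boundary such as `/`, the end of the URL, or common URL punctuation. `url_date`, `LOOSE` and `STRICT` are illustrative names.

```python
import re
from typing import Optional, Tuple

# Hypothetical, simplified patterns for illustration only; NOT newspaper's regex.
# Loose: takes whatever 1-2 digits follow /YYYY/MM/ as the day, even when they
# are only the start of a longer numeric ID.
LOOSE = re.compile(r"/((?:19|20)\d{2})/([01]?\d)/([0-3]?\d)")

# Stricter: the day is optional, and the last date part must be followed by a
# boundary, so a segment like 92713 is not split into a day plus leftover digits.
STRICT = re.compile(r"/((?:19|20)\d{2})/([01]?\d)(?:/([0-3]?\d))?(?=$|[/?#._-])")


def url_date(url: str, pattern: re.Pattern = STRICT) -> Optional[Tuple[int, int, Optional[int]]]:
    """Return (year, month, day or None) found in the URL, or None if no date matches."""
    match = pattern.search(url)
    if match is None:
        return None
    year, month, day = match.groups()
    return int(year), int(month), int(day) if day else None
```

With the URL from the issue, `url_date(url, LOOSE)` gives `(2021, 4, 9)`, while `url_date(url)` gives `(2021, 4, None)`: the month is still recovered, but no day is invented from the article ID.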

is this package reliable?

I am new to web crawling. I looked around at the crawlers on GitHub, and I think this is the easiest-to-use package for crawling newspapers. But is it reliable enough if...

nickhuangxinyu

Not working on New York Times

As mentioned in several issues (#645, #363), newspaper doesn't work on the New York Times. I tested two versions of the New York Times, one is the English version, the...

JohnChu101

Missing some paragraphs

Hi! I'm using newspaper for the first time. I want to extract the text of [this article](http://www.infobae.com/politica/2017/03/04/el-gobierno-volvio-a-acusar-al-kirchnerismo-de-desestabilizador-por-la-marcha-de-la-cgt/), but it only extracts the 3rd paragraph. What can I do to...

SantiagoSalem
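One rough way to tell whether paragraphs are absent from the HTML newspaper receives or are being dropped during extraction is to list every `<p>` in the raw page and compare that list with the extracted text. A standard-library sketch, not part of newspaper; `ParagraphCollector`, `raw_paragraphs` and `missing_paragraphs` are hypothetical names, and the check may miss text that a site loads with JavaScript (it would not be in the raw HTML at all):

```python
from html.parser import HTMLParser
from typing import List


class ParagraphCollector(HTMLParser):
    """Collect the whitespace-normalized text of every <p> element."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True, so entities arrive decoded
        self.paragraphs: List[str] = []
        self._in_p = False
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # an unclosed <p> is implicitly closed by the next one
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:  # includes text inside nested tags such as <a> or <b>
            self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()  # keep a trailing paragraph that was never closed

    def _flush(self):
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []


def raw_paragraphs(html: str) -> List[str]:
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def missing_paragraphs(html: str, extracted_text: str) -> List[str]:
    """Paragraphs present in the raw HTML but not found in the extracted text."""
    normalized = " ".join(extracted_text.split())
    return [p for p in raw_paragraphs(html) if p not in normalized]
```

After `article.download()` and `article.parse()`, `missing_paragraphs(article.html, article.text)` lists the paragraphs the extractor left out; an empty `raw_paragraphs(article.html)` suggests the body never reached newspaper in the first place.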

Not working on "nytimes.com"

I tried a few articles from nytimes.com, but it only parses half of each article; the first half is missing. Example URLs: [url 1](https://www.nytimes.com/2017/05/04/world/europe/buckingham-palace-meeting-uk.html?hp&action=click&pgtype=Homepage&clickSource=story-heading&module=second-column-region&region=top-news&WT.nav=top-news) [url 2](https://www.nytimes.com/2017/05/04/us/politics/house-republican-health-bill.html?hp&action=click&pgtype=Homepage&clickSource=story-heading&module=a-lede-package-region&region=top-news&WT.nav=top-news)

Praveena0989

How to use a local HTML file in newspaper3k the way it works with a URL

Please help me, @yprez.

MeetH15
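newspaper3k's `Article.download()` accepts pre-fetched markup through its `input_html` argument, in which case it skips the HTTP request, so a saved page can be parsed like a downloaded one. A minimal sketch, assuming newspaper3k is installed; `read_html` and `parse_local_html` are hypothetical helper names:

```python
from pathlib import Path


def read_html(path: str) -> str:
    """Read a saved HTML file as text, replacing undecodable bytes instead of failing."""
    return Path(path).read_text(encoding="utf-8", errors="replace")


def parse_local_html(path: str, url: str):
    """Parse a saved page with newspaper3k instead of fetching it over the network.

    The newspaper import is deferred so read_html() works without newspaper installed.
    """
    from newspaper import Article  # assumes newspaper3k is installed

    article = Article(url)  # the page's original URL
    # With input_html supplied, download() uses this markup instead of making a request.
    article.download(input_html=read_html(path))
    article.parse()
    return article
```

Usage: `article = parse_local_html("saved_page.html", "https://example.com/story")`, then read `article.title` and `article.text` as usual.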

threading.Thread.setDaemon has been deprecated in favor of setting daemon attribute directly in Python 3.10

Ref: python/cpython#25174, https://github.com/codelucas/newspaper/blob/f622011177f6c2e95e48d6076561e21c016f08c3/newspaper/utils.py#L134

tirkarthi
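The non-deprecated forms are passing `daemon=True` to the `Thread` constructor or assigning the `daemon` attribute before `start()`. A sketch of the change on a hypothetical queue-consuming worker; `DaemonWorker` illustrates the pattern only and is not newspaper's `utils.py` code:

```python
import queue
import threading


class DaemonWorker(threading.Thread):
    """Hypothetical worker that runs (func, args) tasks pulled from a queue."""

    def __init__(self, tasks: queue.Queue):
        # Replaces the deprecated self.setDaemon(True) (deprecated in Python 3.10).
        super().__init__(daemon=True)
        self.tasks = tasks
        self.start()

    def run(self):
        while True:
            func, args = self.tasks.get()
            try:
                func(*args)
            finally:
                self.tasks.task_done()  # lets tasks.join() return once all work is done
```

Writing `self.daemon = True` after `super().__init__()` works equally well; either form must run before `start()`, because setting `daemon` on a started thread raises `RuntimeError`.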

SSLError

Article `download()` failed with HTTPSConnectionPool(host='www.dingdiann.com', port=443): Max retries exceeded with url: /ddk74633/5016142.html (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')],)",),)

Gandi-95
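`certificate verify failed` is raised by the HTTP/TLS layer (requests and urllib3), before newspaper parses anything. A small standard-library check can show whether the same host also fails verification outside newspaper; `check_certificate` is a hypothetical helper, and a sketch rather than a diagnosis tool:

```python
import socket
import ssl


def check_certificate(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Attempt a verified TLS handshake with host and describe the outcome.

    Certificate-verification failures are reported as a string; other errors
    (DNS failure, refused connection, timeout) propagate to the caller.
    """
    context = ssl.create_default_context()  # checks the certificate chain and hostname
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return f"ok: verified, {tls.version()}"
        except ssl.SSLCertVerificationError as exc:
            return f"certificate verify failed: {exc.verify_message}"
```

Usage: `print(check_certificate("www.dingdiann.com"))`. By default requests verifies against the certifi CA bundle, which can differ from the system store loaded by `ssl.create_default_context()`, so a different result between the two is itself a useful clue.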
