John Bumgarner comments

Results: 75 comments of John Bumgarner

Can't use NewsPaper3k on the site : https://www.newspapers.com/

How are you logging into the site?

Is it possible to use newspapper3k on files?

Yes. Please refer to the section [Extraction from offline HTML files](https://github.com/johnbumgarner/newspaper3_usage_overview#extraction-from-offline-html-files) in my Newspaper3K usage document.
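As a rough illustration of what processing a saved file can look like, here is a minimal sketch. It assumes newspaper3k is installed and that `Article.download()` accepts an `input_html` argument, which lets the HTML come from disk instead of the network. The helper names `read_html` and `parse_local_article` are made up for this example and are not part of the library or the linked document.

```python
from pathlib import Path


def read_html(path, encoding="utf-8"):
    """Return the contents of a saved HTML file as a string."""
    return Path(path).read_text(encoding=encoding)


def parse_local_article(path, language="en"):
    """Parse a saved HTML file with newspaper3k instead of fetching a URL."""
    # Imported lazily so read_html() stays usable without newspaper3k installed.
    from newspaper import Article

    # The URL is left empty because nothing is fetched over the network.
    article = Article("", language=language)
    # Passing input_html makes download() use the supplied markup as-is.
    article.download(input_html=read_html(path))
    article.parse()
    return article
```

Calling `parse_local_article("saved_page.html")` (a hypothetical file name) should return an `Article` whose `title` and `text` attributes are filled from the local markup, provided the page structure is one that Newspaper can extract.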

Date regex should not assume date of month from just first (two) digits after /

I did some research into this issue. Are the digits _92906_ the article's reference number? If this is the article's reference number then _Newspaper_ will always fail to convert this...
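To make the problem in the issue title concrete, here is a small sketch of a stricter URL date pattern. It is not Newspaper's actual regex, and the example paths are hypothetical because the original URL is not shown here. The only change it illustrates is requiring the day to be a complete path segment, so the leading digits of a longer number such as `92906` are not read as a day of the month.

```python
import re

# Illustrative pattern only: /YYYY/MM/DD where the day must end at "/" or at
# the end of the path, so it cannot be a prefix of a longer number.
STRICT_DATE = re.compile(
    r"/(?P<year>(?:19|20)\d{2})"
    r"/(?P<month>0?[1-9]|1[0-2])"
    r"/(?P<day>0?[1-9]|[12]\d|3[01])(?=/|$)"
)


def date_from_url_path(path):
    """Return (year, month, day) if the path holds a full date, else None."""
    match = STRICT_DATE.search(path)
    if match is None:
        return None
    return tuple(int(match.group(name)) for name in ("year", "month", "day"))


# Hypothetical paths: the first has a reference number after the month,
# the second has a real day segment.
print(date_from_url_path("/2021/03/92906-some-headline"))
print(date_from_url_path("/2021/03/15/some-headline"))
```

With this pattern the first path yields no date at all, which is arguably safer than guessing a day from the first digits of the reference number.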

Date regex should not assume date of month from just first (two) digits after /

> Yes, that `92906` part is the article's reference number.
>
> Thank you for the pointer, I will take a look on that.

You're welcome. Please close this issue,...

Unable to pull articles from list of article URL's

Do you see that the format of your URLs is wrong?

bad URL: `https:/www.infowars.com/posts/is-nato-a-dead-man-walking/`

good URL: `https://www.infowars.com/posts/is-nato-a-dead-man-walking/`

I haven't tried to parse this source, so I don't know what data...
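A quick way to catch this kind of typo before handing a list of URLs to Newspaper is to check each one with the standard library. The sketch below uses `urllib.parse.urlparse`; the function name `is_well_formed_http_url` is made up for this example. A URL with a single slash after the scheme parses with an empty network location, which is what the check relies on.

```python
from urllib.parse import urlparse


def is_well_formed_http_url(url):
    """Return True if the URL has an http(s) scheme and a host part."""
    parts = urlparse(url)
    # "https:/host/..." parses with scheme "https" but an empty netloc,
    # because the host is only recognised after a double slash.
    return parts.scheme in ("http", "https") and bool(parts.netloc)


urls = [
    "https:/www.infowars.com/posts/is-nato-a-dead-man-walking/",
    "https://www.infowars.com/posts/is-nato-a-dead-man-walking/",
]

# Keep only the URLs that are safe to pass on for downloading.
valid_urls = [url for url in urls if is_well_formed_http_url(url)]
print(valid_urls)
```

Filtering first keeps malformed entries out of the download step, so any remaining failures point at the site or the parser and not at the input list.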

`parse` hangs on some files

The way that you are passing the URL is incorrect. The correct way to pass this URL is this way:

```
from newspaper import Article

article = Article('https://gist.githubusercontent.com/ma-ji/2dd9689a01c48bf7323b89d4e6b927d5/raw/21f680df041c9816d9d80faa4af599aa90df90be/raw_html.html')
article.download()
article.parse()
...
```

`parse` hangs on some files

Since the HTML is in a local file, there is another way to process it. I show how to process such files in my [Newspaper usage overview document](https://github.com/johnbumgarner/newspaper3_usage_overview#extraction-from-offline-html-files)....