John Bumgarner

75 comments by John Bumgarner

Yes. Please reference the section [Extraction from offline HTML files](https://github.com/johnbumgarner/newspaper3_usage_overview#extraction-from-offline-html-files) from my Newspaper3K usage document.

I did some research into this issue. Are the digits _92906_ the article's reference number? If so, then _Newspaper_ will always fail to convert this...

> Yes, that `92906` part is the article's reference number.
>
> Thank you for the pointer, I will take a look on that.

You're welcome. Please close this issue,...

Do you see that the format of your URLs is wrong?

bad URL: `https:/www.infowars.com/posts/is-nato-a-dead-man-walking/`

good URL: `https://www.infowars.com/posts/is-nato-a-dead-man-walking/`

I haven't tried to parse this source, so I don't know what data...
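A malformed URL like the one above can be caught before it is ever handed to Newspaper. The snippet below is only a minimal sketch using Python's standard `urllib.parse`; the helper name `has_scheme_and_host` is hypothetical and is not part of Newspaper. It assumes a usable article URL needs an `http` or `https` scheme and a non-empty host.

```python
from urllib.parse import urlsplit


def has_scheme_and_host(url: str) -> bool:
    """Hypothetical helper: True when the URL has an http(s) scheme and a host."""
    parts = urlsplit(url)
    # With a single slash after the colon, urlsplit treats everything after
    # "https:" as the path, so netloc (the host) comes back empty.
    return parts.scheme in ("http", "https") and bool(parts.netloc)


bad_url = "https:/www.infowars.com/posts/is-nato-a-dead-man-walking/"
good_url = "https://www.infowars.com/posts/is-nato-a-dead-man-walking/"

print(has_scheme_and_host(bad_url))   # False
print(has_scheme_and_host(good_url))  # True
```

Running a check like this over a list of URLs first makes it easier to tell a formatting mistake apart from a genuine parsing problem in the source.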

The way that you are passing the URL is incorrect. The correct way to pass this URL is:

```
from newspaper import Article

article = Article('https://gist.githubusercontent.com/ma-ji/2dd9689a01c48bf7323b89d4e6b927d5/raw/21f680df041c9816d9d80faa4af599aa90df90be/raw_html.html')
article.download()
article.parse()
...
```

Since the HTML is in a local file, there is another way to process it. I show how to process such files in my [Newspaper usage overview document](https://github.com/johnbumgarner/newspaper3_usage_overview#extraction-from-offline-html-files)....

I downloaded your file locally. I was able to access the file with the code below with no issues.

```
with open("raw_html.html", 'r') as f:
    article = Article('', language='en')
    article.download(input_html=f.read())
    ...
```

I'm not sure why it would hang on your Linux server. What server are you using? I also noted that you're using Python , which I haven't used with Newspaper.

There could be several problems. Can you share your code?