John Bumgarner comments

Results: 75 comments of John Bumgarner

Unable to pick up BBC Dates

I looked into this date issue. I think that the most consistent way to extract the date is from a script tag. I recently started putting together a detailed Newspaper3k...
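A minimal sketch of the script-tag approach, using only the Python standard library. It assumes the page embeds its metadata as JSON-LD in a `<script type="application/ld+json">` tag and stores the date under a `datePublished` key. Both are assumptions about the page markup, not something confirmed in this thread. The names `JsonLdCollector`, `find_key`, and `extract_publish_date` are hypothetical helpers, not part of Newspaper3k.

```python
import json
from datetime import datetime
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append("".join(self._buffer))
            self._in_json_ld = False


def find_key(node, key):
    """Depth-first search for the first value stored under `key`."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def extract_publish_date(html):
    """Return the first parseable datePublished value as a datetime, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed metadata blocks
        value = find_key(data, "datePublished")
        if isinstance(value, str):
            try:
                # Older Python versions reject a trailing "Z", so normalize it.
                return datetime.fromisoformat(value.replace("Z", "+00:00"))
            except ValueError:
                continue
    return None
```

With Newspaper3k, the downloaded markup in `article.html` could be passed to `extract_publish_date` as a fallback when `article.publish_date` comes back empty. Whether a given site exposes the date this way has to be checked against its page source.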

Unable to pick up BBC Dates

@shakna-israel Did you try the information that I provided? If it worked for you, please close this issue.

> Newspaper consistently seems unable to pick up dates on BBC...

Authors and date are not correctly identified in wordpress website

Here is an [overview document](https://github.com/johnbumgarner/newspaper3_usage_overview) that I wrote on using newspaper3k. This document outlines how to extract the data elements from your page's structure. Here is some basic code to...

Not able to crawl Javascript-disabled webpages

seekingalpha.com requires a login, so you need to pass that information to the website to harvest the article text. I haven't tried to use newspaper3k for this, but it should...

Error converting html to string.

Newspaper3k cannot scrape the article [http://www.jumhuriyat.tj/index.php?art_id=44635](http://www.jumhuriyat.tj/index.php?art_id=44635) because the page's HTML does not provide a clear block of article text to extract.

Error converting html to string.

> I'm getting the same errors on multiple sites

@giggioman00 What sites are giving you issues?

Error converting html to string.

> even i am also facing the same issue, is this repository running?

@blueshirtdeveloper What sites are giving you issues?

how to use html file in newspaper3k as it work with url page

> I m trying to get exactly same result as It was using demo url: http://newspaper-demo.herokuapp.com/articles/show?url_to_clean=http%3A%2F%2Fwww.cnn.com%2F2014%2F01%2F12%2Fworld%2Fasia%2Fnorth-korea-charles-smith%2Findex.html
>
> but not getting same result help?

Are you wanting to output the...

how to use html file in newspaper3k as it work with url page

> I want to extract date, title and text from article that I passed as HTML. I have tried this
>
> article = Article("random_url") #I have tried with just...

how to use html file in newspaper3k as it work with url page

> > ```python > > from newspaper import Article > > > > your_html = """ > > index.html > > > > > > > > > > >...