Unable to pick up BBC Dates

Open shakna-israel opened this issue 3 years ago • 2 comments

Newspaper consistently seems unable to pick up dates on BBC articles.

However, it's fairly simple to grab them with BeautifulSoup:

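from bs4 import BeautifulSoup

# `article` is a newspaper Article that has already been downloaded
soup = BeautifulSoup(article.html, features="lxml")
# the date div carries the timestamp in its data-seconds attribute
mydivs = soup.find_all("div", {"class": "date"})
story_date = float(mydivs[0]['data-seconds'])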
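The `story_date` value above is a plain float. Assuming `data-seconds` holds Unix epoch seconds (an assumption, not something confirmed in this thread), a minimal sketch for turning it into a timezone-aware `datetime` could look like the following. The helper name `seconds_to_datetime` and the sample value are made up for illustration.

```python
from datetime import datetime, timezone


def seconds_to_datetime(raw_seconds):
    """Convert a data-seconds value (string or number) to an aware UTC datetime.

    Assumes the value is Unix epoch seconds.
    """
    return datetime.fromtimestamp(float(raw_seconds), tz=timezone.utc)


# Made-up sample value standing in for mydivs[0]['data-seconds']
story_date = seconds_to_datetime("1594512000")
print(story_date.isoformat())
```

Keeping the result in UTC avoids depending on the local timezone of the machine running the scraper.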

shakna-israel, Jul 12 '20 02:07

I looked into this date issue. I think that the most consistent way to extract the date is from a script tag.
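As a rough illustration of the script-tag idea (not the code from any linked document), the sketch below collects `<script type="application/ld+json">` blocks and returns the first `datePublished` value it finds. That BBC pages expose the date this way is an assumption here; the names `LdJsonCollector` and `find_date_published`, and the sample HTML, are hypothetical. It uses only the standard library so it runs without extra installs.

```python
import json
from html.parser import HTMLParser


class LdJsonCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_ld_json = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld_json = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_ld_json = False

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so append to the open block
        if self._in_ld_json:
            self.blocks[-1] += data


def find_date_published(html):
    """Return the first datePublished string found in JSON-LD, or None."""
    parser = LdJsonCollector()
    parser.feed(html)
    for block in parser.blocks:
        try:
            payload = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip script bodies that are not valid JSON
        if isinstance(payload, dict) and "datePublished" in payload:
            return payload["datePublished"]
    return None


# Made-up sample page; with newspaper you would pass article.html instead
sample_html = (
    '<html><head><script type="application/ld+json">'
    '{"@type": "NewsArticle", "datePublished": "2020-07-12T02:07:00+00:00"}'
    "</script></head><body></body></html>"
)
print(find_date_published(sample_html))
```

Returning `None` when nothing matches lets the caller fall back to another method, such as the `data-seconds` lookup from the original post.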

I recently started putting together a detailed Newspaper3k usage document that I'm publicly sharing. This document is available here: https://github.com/johnbumgarner/newspaper3_usage_overview. It contains the extraction code for BBC articles.

P.S. This document is a work in progress, so more information will be added.

johnbumgarner, Oct 12 '20 12:10

@shakna-israel Did you try the information that I provided? If it worked for you, please close this issue.

johnbumgarner, Apr 17 '21 13:04