newspaper
Unable to pick up BBC Dates
Newspaper consistently seems unable to pick up dates on BBC articles.
However, it's fairly simple to grab them with BeautifulSoup:
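from bs4 import BeautifulSoup
# `article` is a newspaper Article on which download() has already been called
soup = BeautifulSoup(article.html, features="lxml")
# BBC pages carry the publish time as Unix seconds in <div class="date" data-seconds="...">
mydivs = soup.find_all("div", {"class": "date"})
# Guard against pages that have no such div instead of raising IndexError
story_date = float(mydivs[0]["data-seconds"]) if mydivs else None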
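If you'd rather avoid the bs4/lxml dependency, here is a minimal sketch of the same lookup using only the standard library's `html.parser`. It assumes, as the snippet above does, that the date lives in a `data-seconds` attribute on a `<div>` whose classes include `date`, and that the value is a Unix timestamp in UTC. The names `BBCDateParser` and `bbc_story_date` and the sample markup are hypothetical, for illustration only.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser
from typing import Optional


class BBCDateParser(HTMLParser):
    """Records the data-seconds attribute of the first <div> with class "date"."""

    def __init__(self) -> None:
        super().__init__()
        self.seconds: Optional[float] = None

    def handle_starttag(self, tag, attrs):
        # Stop once a date has been found; only <div> elements are of interest
        if self.seconds is not None or tag != "div":
            return
        attr_map = dict(attrs)
        # "class" may hold several space-separated names, as BeautifulSoup also assumes
        classes = (attr_map.get("class") or "").split()
        if "date" in classes and attr_map.get("data-seconds"):
            self.seconds = float(attr_map["data-seconds"])


def bbc_story_date(html: str) -> Optional[datetime]:
    parser = BBCDateParser()
    parser.feed(html)
    if parser.seconds is None:
        return None
    # Assumption: data-seconds is seconds since the Unix epoch (UTC)
    return datetime.fromtimestamp(parser.seconds, tz=timezone.utc)


sample = '<div class="date" data-seconds="1577836800">1 January 2020</div>'
print(bbc_story_date(sample))  # 2020-01-01 00:00:00+00:00
```

Returning a timezone-aware `datetime` rather than a bare float makes it easier to compare with (or fall back from) newspaper's own `publish_date`.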
I looked into this date issue. I think that the most consistent way to extract the date is from a script tag.
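The comment above doesn't say which script tag holds the date. On many news sites it is a JSON-LD block (`<script type="application/ld+json">`) with a schema.org `datePublished` field. The sketch below assumes BBC pages follow that pattern (not confirmed in this thread) and uses only the standard library. It is not the extraction code from the linked document, and the names `JSONLDCollector` and `published_date` are hypothetical.

```python
import json
from datetime import datetime
from html.parser import HTMLParser
from typing import Any, Iterator, List, Optional


class JSONLDCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[str] = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so append rather than assign
        if self._in_jsonld:
            self.blocks[-1] += data


def _walk(node: Any) -> Iterator[dict]:
    # JSON-LD may be a single object, a list of objects, or an object with an "@graph" list
    if isinstance(node, list):
        for item in node:
            yield from _walk(item)
    elif isinstance(node, dict):
        yield node
        yield from _walk(node.get("@graph", []))


def published_date(html: str) -> Optional[datetime]:
    collector = JSONLDCollector()
    collector.feed(html)
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip a malformed block instead of failing the whole page
        for obj in _walk(data):
            value = obj.get("datePublished")
            if isinstance(value, str):
                try:
                    # fromisoformat() before Python 3.11 rejects a trailing "Z"
                    return datetime.fromisoformat(value.replace("Z", "+00:00"))
                except ValueError:
                    continue  # not an ISO 8601 string; keep looking
    return None


sample = ('<script type="application/ld+json">'
          '{"@type": "NewsArticle", "datePublished": "2020-01-01T08:30:00Z"}'
          '</script>')
print(published_date(sample))  # 2020-01-01 08:30:00+00:00
```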
I recently started putting together a detailed Newspaper3k usage document that I'm publicly sharing. This document is available here: https://github.com/johnbumgarner/newspaper3_usage_overview. It contains the extraction code for BBC articles.
P.S. This document is a work in progress, so more information will be added.
@shakna-israel Did you try the approach I described? If it worked for you, please close this issue.