news-please

article.date_modify returns 'None' despite the article having a modified date

Open Anacoder1 opened this issue 4 years ago • 3 comments

Mandatory

  • [x] I read the documentation (readme and wiki).
  • [x] I searched other issues (including closed issues) and could not find any to be related. If you find related issues, post them below or directly add your issue to the most related one.

Related issues:

  • add them here

Describe the bug

I have been trying to use the article.date_modify attribute to extract the date and time an article was last modified from various news websites. It returns None even though the page shows a modified date. This happens with every article URL I have tried.

To Reproduce

```python
# Ran this on Google Colab (the leading "!" runs a shell command in the notebook)
!pip3 install news-please
from newsplease import NewsPlease

url1 = 'https://www.thequint.com/news/law/supreme-court-article-370-jammu-and-kashmir-reorganisation-cases-hearing-govt-affidavit-rejoinder'
article = NewsPlease.from_url(url1)
print(article.date_modify)  # prints None
```

Expected behavior

I expected the code to return a datetime for when the article was last modified, in this case 2019-11-14 19:40:00.
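One way to confirm that a page like this exposes its modified date in machine-readable form is to scan its meta tags. The sketch below uses only Python's standard library. The function find_modified_meta and the tag names in MODIFIED_META_KEYS (Open Graph article:modified_time, og:updated_time, and schema.org itemprop="dateModified") are illustrative assumptions; this is not a description of what news-please itself checks. Fetching the HTML is left to the caller.

```python
from html.parser import HTMLParser
from typing import List, Optional

# Assumed, non-exhaustive set of meta keys (lowercased) that often carry a
# modified date. news-please may check different or additional sources.
MODIFIED_META_KEYS = {"article:modified_time", "og:updated_time", "datemodified"}


class ModifiedMetaFinder(HTMLParser):
    """Collects the content of <meta> tags whose property/name/itemprop looks like a modified date."""

    def __init__(self) -> None:
        super().__init__()
        self.values: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are also routed here by HTMLParser.
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("property") or attributes.get("name") or attributes.get("itemprop") or ""
        content = attributes.get("content")
        if key.strip().lower() in MODIFIED_META_KEYS and content:
            self.values.append(content.strip())


def find_modified_meta(html: str) -> Optional[str]:
    """Hypothetical helper: returns the first matching raw date string, or None."""
    finder = ModifiedMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.values[0] if finder.values else None
```

It returns the raw content string (or None), so the result can be compared by hand with what NewsPlease.from_url reports for the same page.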

Log

Nothing to add. I only ran the code shown in the To Reproduce section.

Versions (please complete the following information):

  • Google Colab
  • Python Version 3.6.9
  • news-please Version 1.5.3

Intent (optional; we'll use this info to prioritize upcoming tasks to work on)

  • [ ] personal
  • [ ] academic
  • [x] business
  • [ ] other
  • Some information on your project: Extracting modified date from newspaper articles

Anacoder1, Sep 26 '20 11:09

Can you confirm date extraction works for you on the following URL? https://www.rt.com/news/203203-ukraine-russia-troops-border (also refer to https://github.com/fhamborg/news-please/blob/master/newsplease/examples/sample.json)

fhamborg, Oct 23 '20 08:10

> Can you confirm date extraction works for you on the following URL? https://www.rt.com/news/203203-ukraine-russia-troops-border (also refer to https://github.com/fhamborg/news-please/blob/master/newsplease/examples/sample.json)

But sample.json doesn't contain a modified date either, does it?

IqbalLx, Dec 24 '20 09:12

Hi! I found the main/core code confusing to navigate, so my workaround was to create a new pipeline dedicated to overriding the default date_modify. It uses the same concept as DateExtractor, but looks for dateModified in the application/ld+json script tag instead.
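A minimal sketch of the lookup described above, assuming the page embeds schema.org JSON-LD: it collects every script block of type application/ld+json, searches the decoded JSON (including nested structures such as @graph) for dateModified, and parses the first value it can. This is not IqbalLx's actual pipeline and uses no news-please API; the names LdJsonCollector and extract_date_modified are hypothetical. datetime.fromisoformat requires Python 3.7 or later, so this would not run on the Python 3.6.9 listed in the report.

```python
import json
from datetime import datetime
from html.parser import HTMLParser
from typing import Iterator, List, Optional


class LdJsonCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[str] = []
        self._in_ld_json = False
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_ld_json = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_ld_json:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld_json:
            self.blocks.append("".join(self._buffer))
            self._in_ld_json = False


def _iter_values(node, key: str) -> Iterator[str]:
    """Yields every string stored under `key`, searching nested dicts and lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key and isinstance(v, str):
                yield v
            else:
                yield from _iter_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from _iter_values(item, key)


def _parse_iso(value: str) -> Optional[datetime]:
    """Parses an ISO 8601 timestamp; a trailing 'Z' is rewritten as +00:00 for older Pythons."""
    value = value.strip()
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        return None


def extract_date_modified(html: str) -> Optional[datetime]:
    """Hypothetical helper: returns the first parseable dateModified found in JSON-LD, or None."""
    collector = LdJsonCollector()
    collector.feed(html)
    collector.close()
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed JSON-LD blocks
        for value in _iter_values(data, "dateModified"):
            parsed = _parse_iso(value)
            if parsed is not None:
                return parsed
    return None
```

Wiring a function like this into a custom news-please pipeline is not shown, since that depends on news-please internals.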

IqbalLx, Dec 29 '20 16:12