news-please
article.date_modify returns 'None' despite the article having a modified date
Mandatory
- [x] I read the documentation (readme and wiki).
- [x] I searched other issues (including closed issues) and could not find any to be related. If you find related issues post them below or directly add your issue to the most related one.
Related issues:
- add them here
Describe the bug
I have been trying to use the article.date_modify attribute to extract the modified date and time from different newspaper websites. It returns None even though the site shows a modified date. This is the case for every article URL I tried it with.
To Reproduce
!pip3 install news-please #ran this on Google Colab
from newsplease import NewsPlease
url1 = 'https://www.thequint.com/news/law/supreme-court-article-370-jammu-and-kashmir-reorganisation-cases-hearing-govt-affidavit-rejoinder'
article = NewsPlease.from_url(url1)
print(article.date_modify)
# prints None
Expected behavior
I expected the code to return the date and time at which the article was last modified, in this case 2019-11-14 19:40:00
Log
Nothing to add here. I just ran the code shown in the To Reproduce section.
Versions (please complete the following information):
- Google Colab
- Python Version 3.6.9
- news-please Version 1.5.3
Intent (optional; we'll use this info to prioritize upcoming tasks to work on)
- [ ] personal
- [ ] academic
- [x] business
- [ ] other
- Some information on your project:
Extracting modified date from newspaper articles
Can you confirm date extraction works for you on the following URL? https://www.rt.com/news/203203-ukraine-russia-troops-border (also refer to https://github.com/fhamborg/news-please/blob/master/newsplease/examples/sample.json)
But the sample.json does not contain date_modified either?
Hi! I got confused while exploring the main/core code, so my solution to this problem was to create a new pipeline dedicated to overriding the default date_modify. It uses the same concept as DateExtractor, but looks for dateModified in the application/ld+json script tag.
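For anyone who wants to try the same idea, here is a minimal sketch of the JSON-LD lookup using only the standard library. It is not the pipeline code from this comment and not part of the news-please API: the names `extract_date_modified`, `_LdJsonCollector`, and `_find_key` are hypothetical, and it assumes the page exposes `dateModified` as a string inside a `<script type="application/ld+json">` block.

```python
import json
from html.parser import HTMLParser


class _LdJsonCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_ld_json = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_ld_json = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_ld_json:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld_json:
            self.blocks.append("".join(self._buffer))
            self._in_ld_json = False


def _find_key(node, key):
    """Depth-first search for the first string value stored under `key`."""
    if isinstance(node, dict):
        value = node.get(key)
        if isinstance(value, str):
            return value
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = _find_key(child, key)
        if found is not None:
            return found
    return None


def extract_date_modified(html):
    """Return the first JSON-LD dateModified string in `html`, or None."""
    collector = _LdJsonCollector()
    collector.feed(html)
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except ValueError:
            continue  # skip malformed JSON-LD blocks
        found = _find_key(data, "dateModified")
        if found is not None:
            return found
    return None
```

The function returns the raw string from the page, so converting it to a datetime is left to the caller. Sites that only publish the modified date in meta tags or in visible text would still yield None with this approach.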