Build newspaper to get recent articles
Issue by eet182561
Wed May 13 11:04:34 2020
Originally opened as https://github.com/codelucas/newspaper/issues/810
Is there a way to use newspaper's build so that it returns the most recent articles first? I don't know how the articles are sorted, but they are definitely not sorted by publishing date. I only need articles from the last couple of days. When I traverse all the articles, downloading each one and parsing its publish date, the server starts refusing connections after around 30 articles. I even added a 20-second delay whenever that exception occurs, but the web server still refuses to connect. Is there a workaround for this, or a way to build the source so that the recent articles come first? The site I am building is www.dawn.com.
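The retry loop described above uses a fixed 20-second pause. A common alternative is exponential backoff: wait longer after each consecutive failure and give up after a few attempts. The following is a minimal sketch under stated assumptions: `fetch_with_backoff` and its default delays are hypothetical, not part of newspaper, and suitable values depend on how the site throttles clients, which is not known here.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_backoff(
    fetch: Callable[[str], T],
    url: str,
    max_attempts: int = 5,
    base_delay: float = 5.0,
    max_delay: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(url), retrying with exponential backoff on network errors.

    `fetch` is any callable that downloads one URL and returns its content
    (hypothetical; supply your own). `sleep` is injectable so the helper can
    be tested without actually waiting.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except OSError:
            # ConnectionError, urllib's URLError and requests' exceptions
            # all derive from OSError.
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            # 5 s, 10 s, 20 s, ... capped at max_delay, plus up to 50% jitter.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

With newspaper, one option is to fetch the HTML yourself inside `fetch` (for example with `urllib.request.urlopen`) and pass it to `Article.download(input_html=html)` before calling `parse()`. Backoff only helps once the server has already started refusing; a steady pause between successful requests may matter more for staying under the site's rate limit in the first place.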
Comment by johnbumgarner
Sat Nov 21 13:52:33 2020
I believe newspaper's build just pulls the articles from the pages of the site's navigational structure, in the order in which it discovered them. Have you considered using BS4 (BeautifulSoup), which is installed as a dependency of newspaper, to obtain the article URLs and their publish dates? Once you have that information, you can use newspaper to download only the articles you select based on the timestamp.
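A minimal sketch of that approach, assuming the listing page marks each story as an `<article>` element containing a link and a `<time datetime="...">` tag. That markup is an assumption; dawn.com's actual HTML has not been checked and will likely need different tag or attribute names. The sketch uses the standard library's `html.parser` so it runs without extra installs; with BS4 the equivalent lookup would start from something like `soup.find_all("article")`. `ListingParser` and `recent_first` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class ListingParser(HTMLParser):
    """Collect (url, published) pairs from a section/listing page.

    ASSUMPTION: each story is an <article> holding a link and a
    <time datetime="..."> tag. Adjust after inspecting the real page.
    """

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.items: List[Tuple[str, datetime]] = []
        self._in_article = False
        self._url: Optional[str] = None
        self._published: Optional[datetime] = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "article":
            self._in_article, self._url, self._published = True, None, None
        elif self._in_article and tag == "a" and self._url is None and attrs.get("href"):
            # Resolve relative links such as "/news/123" against the page URL.
            self._url = urljoin(self.base_url, attrs["href"])
        elif self._in_article and tag == "time" and attrs.get("datetime"):
            try:
                ts = datetime.fromisoformat(attrs["datetime"])
            except ValueError:
                return  # unparseable timestamp: skip it
            if ts.tzinfo is None:
                ts = ts.replace(tzinfo=timezone.utc)  # ASSUMPTION: naive means UTC
            self._published = ts

    def handle_endtag(self, tag):
        if tag == "article" and self._in_article:
            # Keep only stories where both a link and a date were found.
            if self._url and self._published:
                self.items.append((self._url, self._published))
            self._in_article = False


def recent_first(items, days=2, now=None):
    """Keep items published within the last `days` days, newest first."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    recent = [(url, ts) for url, ts in items if ts >= cutoff]
    return sorted(recent, key=lambda item: item[1], reverse=True)
```

Only the URLs returned by `recent_first` then need to go through newspaper (`Article(url)`, `download()`, `parse()`), which keeps the number of requests, and so the chance of being throttled, small.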