newsworker
Advanced news feed extractor and finder library. Helps to automatically extract news from websites without RSS/ATOM feeds.
Add a page analysis command. It should be `feedcmd analysis ` and it should output the possible feeds on the page, their feed types, and example feed entries.
Date extraction from local files should be supported too. This is required to write proper tests.
Instead of identifying the page structure dynamically, generate a template with a number of options that should simplify data parsing afterward. It should include: - location of the container tag -...
The current rule is to use the first link by default, which doesn't work well. Example URL: http://pskenergo.ru/news/branch/. Instead of a post URL, a category URL is detected each time....
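One possible alternative to the first-link rule, sketched here only as an illustration: a category URL tends to repeat across every entry on a listing page, while a post URL appears once. The function name `pick_post_links` and the example.com URLs are hypothetical and are not part of newsworker.

```python
from collections import Counter


def pick_post_links(entries):
    """For each entry (a list of hrefs in document order), pick the first
    href that appears in only one entry; fall back to the first href."""
    # Count in how many entries each href occurs (set() avoids double
    # counting an href repeated inside a single entry).
    counts = Counter(href for links in entries for href in set(links))
    chosen = []
    for links in entries:
        # Hrefs seen in exactly one entry are more likely to be post URLs.
        unique = [href for href in links if counts[href] == 1]
        if unique:
            chosen.append(unique[0])
        else:
            # Old behaviour as a fallback: first link, or None if no links.
            chosen.append(links[0] if links else None)
    return chosen


# Hypothetical entries: the category link comes first in each one.
entries = [
    ["http://example.com/news/branch/", "http://example.com/news/branch/post-1"],
    ["http://example.com/news/branch/", "http://example.com/news/branch/post-2"],
]
selected = pick_post_links(entries)
```

This assumes the listing contains at least two entries; with a single entry every link is unique and the result is the same as the first-link rule.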
URL: https://inspire.ec.europa.eu/news. Example: `Monday, January 31, 2022`. The qddate patterns need to be updated to handle this date format.
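For reference, the date format in question can be described with standard `strptime` directives. The sketch below uses only the Python standard library, not qddate's own pattern syntax; the names `LONG_DATE_FORMAT` and `parse_long_date` are hypothetical. It assumes an English (or default C) locale, since `%A` and `%B` are locale-dependent.

```python
from datetime import datetime

# Weekday name, month name, day of month, four-digit year.
LONG_DATE_FORMAT = "%A, %B %d, %Y"


def parse_long_date(text):
    """Return a datetime for strings like 'Monday, January 31, 2022',
    or None if the text does not match the format."""
    try:
        return datetime.strptime(text.strip(), LONG_DATE_FORMAT)
    except ValueError:
        return None


parsed = parse_long_date("Monday, January 31, 2022")
```

A check like this could serve as a test case once the corresponding pattern exists in qddate.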