Advanced news feed extractor and finder library. It helps automatically extract news from websites that have no RSS/Atom feeds.

Results: 16 newsworker issues

Rename the `find` command to `scan` and improve its behavior: return status codes when nothing is found, and write JSON output to a file instead of stdout. - [x] Rename `find`...

enhancement
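
A minimal sketch of the requested `scan` behavior using the standard `argparse` and `json` modules. The exit codes (0 when something is found, 1 when nothing is found), the `--output` option name for `scan`, and the empty placeholder result list are assumptions for illustration, not newsworker's actual CLI.

```python
import argparse
import json

# Hypothetical exit codes: 0 when something was found, 1 when nothing was found.
EXIT_OK = 0
EXIT_NOTHING_FOUND = 1


def run_scan(results: list, output_path: str) -> int:
    """Write scan results as JSON to output_path and return a process exit code."""
    with open(output_path, "w", encoding="utf-8") as fh:
        json.dump(results, fh, ensure_ascii=False, indent=2)
    return EXIT_OK if results else EXIT_NOTHING_FOUND


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(prog="feedcmd")
    commands = parser.add_subparsers(dest="command", required=True)
    scan = commands.add_parser("scan", help="scan a URL for news feeds")
    scan.add_argument("url")
    # Hypothetical option name for the JSON output file.
    scan.add_argument("--output", default="feeds.json")
    args = parser.parse_args(argv)
    # Placeholder: real feed detection would run on args.url here; this sketch finds nothing.
    results: list = []
    return run_scan(results, args.output)
```

As a script entry point this would be wired up as `sys.exit(main())`, so shell callers can branch on the exit status instead of parsing stdout.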

Add structure detection and XPath reconstruction. Instead of detecting news dynamically, build pseudo-code that extracts news from the page. It should implement analysis logic that detects: - news list...

enhancement
breaking change
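
One way to approach XPath reconstruction, sketched with the standard `xml.etree.ElementTree` module: compute a positional XPath for each element that looks like a news title, then merge the paths of sibling items into one reusable pattern. `element_xpath`, `generalize`, and the merging rule are hypothetical; ElementTree only parses well-formed XHTML, so real pages would need an HTML-tolerant parser.

```python
import re
import xml.etree.ElementTree as ET


def element_xpath(root, target):
    """Build a positional XPath relative to root, e.g. './body/ul/li[2]/a'."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        same_tag = [c for c in parent if c.tag == node.tag]
        index = same_tag.index(node) + 1  # XPath positions are 1-based
        steps.append(f"{node.tag}[{index}]" if len(same_tag) > 1 else node.tag)
        node = parent
    return "./" + "/".join(reversed(steps))


def generalize(xpaths):
    """Merge positional XPaths of sibling news items into one pattern."""
    split = [p.split("/") for p in xpaths]
    if len({len(s) for s in split}) != 1:
        raise ValueError("paths have different depths")
    merged = []
    for steps in zip(*split):
        if len(set(steps)) == 1:
            merged.append(steps[0])  # identical step: keep its position
        else:
            bare = {re.sub(r"\[\d+\]$", "", s) for s in steps}
            if len(bare) != 1:
                raise ValueError(f"incompatible steps: {steps}")
            merged.append(bare.pop())  # varying position: match every sibling
    return "/".join(merged)
```

The merged pattern can then be stored and replayed with `root.findall(pattern)` instead of re-running detection on every visit.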

URL: https://unfccc.int/news Reason: the date format is not detected for dates like `12 Aug, 2022`. Solutions: - support the `12 Aug, 2022` date format in the qddate date detection library

bug
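
Until qddate supports this pattern, a fallback could try extra formats with the standard library's `datetime.strptime` after the main detector fails. This is a stand-in, not qddate's API; `FALLBACK_FORMATS` and `parse_fallback` are hypothetical names, and `%b`/`%B` assume English month names (the default C locale).

```python
from datetime import datetime
from typing import Optional

# Hypothetical fallback formats tried in order; '%d %b, %Y' covers '12 Aug, 2022'.
FALLBACK_FORMATS = ["%d %b, %Y", "%d %B, %Y"]


def parse_fallback(text: str) -> Optional[datetime]:
    """Return the first successful parse, or None if no format matches."""
    for fmt in FALLBACK_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None
```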

URL: https://public.wmo.int/en/media/news Reason: the page has invisible h2 tags, and images with URLs to files placed before the text, so the invisible h2 tag containing the image title is detected instead of the news title. Solutions: - try...

bug
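
One possible mitigation, sketched with the standard `html.parser` module: collect `<h2>` text while skipping headings hidden with the `hidden` attribute or an inline `display:none`/`visibility:hidden` style. How the WMO headings are actually hidden is an assumption; headings hidden through CSS classes or a hidden parent element would need real style computation. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

HIDDEN_STYLES = ("display:none", "visibility:hidden")


def _is_hidden(attrs):
    """True if the tag is hidden by the `hidden` attribute or an inline style."""
    attr_map = dict(attrs)
    if "hidden" in attr_map:
        return True
    style = (attr_map.get("style") or "").replace(" ", "").lower()
    return any(rule in style for rule in HIDDEN_STYLES)


class VisibleHeadingParser(HTMLParser):
    """Collect the text of visible <h2> elements."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._buffer = None  # text chunks of the visible <h2> being read, else None

    def handle_starttag(self, tag, attrs):
        if tag == "h2" and not _is_hidden(attrs):
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "h2" and self._buffer is not None:
            text = " ".join("".join(self._buffer).split())  # normalize whitespace
            if text:
                self.headings.append(text)
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def visible_h2_titles(html):
    parser = VisibleHeadingParser()
    parser.feed(html)
    parser.close()
    return parser.headings
```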

Issue: dates like 25/7/22 are not parsed correctly; the year is detected as 22 instead of 2022. URL: https://www.icao.int/Newsroom/Pages/default.aspx Solutions: - postprocessing to fix dates whose year is below a certain threshold, such as 1990 -...

bug
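
A sketch of the proposed post-processing step, assuming the parser has already produced a `datetime` with a literal two-digit year such as 22. With the 1990 threshold from the issue, two-digit years 90 to 99 map to the 1990s and 00 to 89 map to 2000 to 2089. `fix_short_year` is a hypothetical name.

```python
from datetime import datetime

# Threshold from the issue: a repaired year must not fall below 1990.
MIN_YEAR = 1990


def fix_short_year(dt: datetime, min_year: int = MIN_YEAR) -> datetime:
    """Repair a date whose two-digit year was taken literally (22 -> 2022)."""
    if dt.year >= 100:
        # Full years (or three-digit years we cannot repair safely) pass through.
        return dt
    year = 1900 + dt.year
    if year < min_year:
        year += 100  # e.g. 1922 is below the threshold, so use 2022
    return dt.replace(year=year)
```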

Add a web server that exposes most commands through a REST API, such as: - `analyze` - analyze a URL - `init` - initialize a project - `run` - run a project - `extract` - extract feed...

enhancement
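
A skeleton of such a server using only the standard library's `http.server`, mapping each command named in the issue to a `POST /<command>` endpoint that accepts and returns JSON. The handlers are hypothetical placeholders that echo their input; a real server would call newsworker's own project and extraction logic, and the truncated command list may contain more endpoints.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def _placeholder(command):
    """Build a stand-in handler that echoes its input instead of running `command`."""
    def handler(params):
        return {"command": command, "params": params}
    return handler


# Commands named in the issue; each becomes POST /<command>.
ROUTES = {f"/{name}": _placeholder(name) for name in ("analyze", "init", "run", "extract")}


def dispatch(path, params):
    """Return (HTTP status, JSON-serializable body) for a request path."""
    handler = ROUTES.get(path)
    if handler is None:
        return 404, {"error": f"unknown endpoint {path}"}
    return 200, handler(params)


class ApiHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        try:
            params = json.loads(raw or b"{}")
        except ValueError:  # covers JSONDecodeError and bad UTF-8
            params = None
        if isinstance(params, dict):
            status, body = dispatch(self.path, params)
        else:
            status, body = 400, {"error": "request body must be a JSON object"}
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(host="127.0.0.1", port=8000):
    """Run the API until interrupted (single-threaded HTTPServer)."""
    HTTPServer((host, port), ApiHandler).serve_forever()
```

Keeping routing in a plain `dispatch` function separates it from HTTP handling, so the command mapping can be tested without opening a socket.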

Consider adding configuration files for data aggregation. This includes: - [ ] Adding a `feedcmd init` command to initialize extraction with the `--url ` option. It should generate a `.newsworker.yaml` file (or TOML)...

enhancement
question

URL: https://unhabitat.org/news-and-stories Reason: unsupported date format. Examples: - August 5th, 2022 - July 22nd, 2022 Possible solutions: - add this type of date to qddate - extract the `Last-Modified` header...

bug
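
Both proposed solutions can be sketched with the standard library: strip the English ordinal suffix before parsing with `datetime.strptime`, and fall back to the HTTP `Last-Modified` header via `email.utils.parsedate_to_datetime`. The function names are hypothetical, this is not qddate's API, and `%B` assumes English month names. Note that `Last-Modified` dates the page, not individual news items.

```python
import re
from datetime import datetime
from email.utils import parsedate_to_datetime

# An English ordinal suffix right after a day number: 1st, 2nd, 3rd, 5th, 22nd ...
ORDINAL_SUFFIX = re.compile(r"\b(\d{1,2})(st|nd|rd|th)\b", re.IGNORECASE)


def parse_ordinal_date(text):
    """Parse dates like 'August 5th, 2022' by dropping the ordinal suffix first."""
    cleaned = ORDINAL_SUFFIX.sub(r"\1", text.strip())
    return datetime.strptime(cleaned, "%B %d, %Y")


def last_modified(headers):
    """Coarse fallback: page-level date from a Last-Modified header, or None."""
    value = headers.get("Last-Modified")
    return parsedate_to_datetime(value) if value else None
```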

URL: https://www.unido.org/news Reason: dates are prefixed by a city name and right-aligned. Examples: - GENEVA, 29 July 2022 - VIENNA, 9 AUGUST 2022 - Bangkok, 21-22 July 2022 Sometimes dates are...

bug
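
A sketch of pre-processing for these strings: a regular expression strips the optional city prefix and, for ranges like `21-22 July 2022`, keeps the first day (an assumption; the issue does not say which day to use). `strptime` matches month names case-insensitively, so `AUGUST` parses as well. The pattern and function name are hypothetical, and the sketch does not address the right-aligned layout.

```python
import re
from datetime import datetime

# Optional "CITY, " prefix, a day or day range (first day kept), month name, year.
CITY_DATE = re.compile(
    r"^\s*(?:(?P<city>[^\d,]+),\s*)?"
    r"(?P<day>\d{1,2})(?:\s*[-\u2013]\s*\d{1,2})?"
    r"\s+(?P<month>[A-Za-z]+)\s+(?P<year>\d{4})\s*$"
)


def parse_city_date(text):
    """Return (city, date) for strings like 'GENEVA, 29 July 2022'; None if no match."""
    match = CITY_DATE.match(text)
    if match is None:
        return None
    date = datetime.strptime(f"{match['day']} {match['month']} {match['year']}", "%d %B %Y")
    city = match["city"].strip().title() if match["city"] else None
    return city, date
```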

Add a crawl option to find possible news feeds on a website. Command: `feedcmd crawl ` Options: - `--limit ` - maximum number of pages to process - `--output `...

enhancement
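
A sketch of the crawl loop: a breadth-first traversal restricted to the start URL's host that stops after `limit` pages, mirroring the `--limit` option. The `fetch` and `is_candidate` callables are hypothetical hooks (page download and newsworker's news detection), passed in so the loop can be tested without network access; writing results to `--output` is left out.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, is_candidate, limit=50):
    """Visit at most `limit` same-host pages breadth-first; return candidate URLs.

    fetch(url) -> HTML string or None; is_candidate(url, html) -> bool.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    candidates = []
    visited = 0
    while queue and visited < limit:
        url = queue.popleft()
        html = fetch(url)
        visited += 1
        if html is None:
            continue
        if is_candidate(url, html):
            candidates.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href).split("#", 1)[0]  # absolute URL without fragment
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return candidates
```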