newsworker
Can't extract feed from UNIDO website
URL: https://www.unido.org/news
Reason: Dates are prefixed by a city name and right-aligned. Examples:
- GENEVA, 29 July 2022
- VIENNA, 9 AUGUST 2022
- Bangkok, 21-22 July 2022
Sometimes the date is missing from the text in the news list.
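A minimal sketch of how the city-prefixed formats listed above could be parsed. The function name `parse_prefixed_date` and the choice to take the first day of a range such as `21-22 July 2022` are assumptions for illustration, not existing newsworker behaviour.

```python
import re
from datetime import date

# Month names are looked up case-insensitively, since the site mixes "July" and "AUGUST".
MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Optional "CITY," prefix, a day or day range, a month name, a four-digit year.
DATE_RE = re.compile(
    r"^(?:(?P<city>[^,\d]+),\s*)?"
    r"(?P<day>\d{1,2})(?:\s*-\s*(?P<end_day>\d{1,2}))?\s+"
    r"(?P<month>[A-Za-z]+)\s+(?P<year>\d{4})$"
)


def parse_prefixed_date(text):
    """Return (city, date) for a string like 'GENEVA, 29 July 2022', or None.

    For a day range, the first day is used (an assumption).
    """
    m = DATE_RE.match(text.strip())
    if not m:
        return None
    month = MONTHS.get(m.group("month").lower())
    if month is None:
        return None
    try:
        parsed = date(int(m.group("year")), month, int(m.group("day")))
    except ValueError:  # e.g. "31 June 2022"
        return None
    city = m.group("city")
    return (city.strip() if city else None, parsed)
```

Items for which this returns `None` (including those with no date in the list text) would still need one of the fallbacks.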
Possible solutions:
- follow each URL and extract the date from the `dcterms.date` metadata key
- recognize right-aligned dates in the text
- extract the date from the `last-modified` header of associated media (example)
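A rough sketch of the two fallback ideas, using only the Python standard library. It assumes the article page exposes the date as `<meta name="dcterms.date" content="...">` and that the response headers are available as a dict-like object; the helper names `date_from_meta` and `date_from_last_modified` are hypothetical, and fetching the page or media file is left out.

```python
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser


class _MetaDateParser(HTMLParser):
    """Collects the content of the first <meta name="dcterms.date"> tag."""

    def __init__(self):
        super().__init__()
        self.value = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.value is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "dcterms.date":
            self.value = attrs.get("content")


def date_from_meta(html):
    """Return the raw dcterms.date string from an article page, or None."""
    parser = _MetaDateParser()
    parser.feed(html)
    return parser.value


def date_from_last_modified(headers):
    """Parse a Last-Modified header value into a datetime, or return None.

    A plain dict is case-sensitive, so the key must be spelled "Last-Modified" here.
    """
    raw = headers.get("Last-Modified")
    if not raw:
        return None
    try:
        return parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        return None
```

The `dcterms.date` value is returned unparsed because its exact format on the UNIDO pages is not confirmed here. `Last-Modified` reflects when the media file changed, so it is only an approximation of the publication date.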