newsworker

Can't extract feed from UNIDO website

Open: ivbeg opened this issue 1 year ago • 1 comment

URL: https://www.unido.org/news

Reason: each date is prefixed by a city name and right-aligned. Examples:

  • GENEVA, 29 July 2022
  • VIENNA, 9 AUGUST 2022
  • Bangkok, 21-22 July 2022
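
A minimal sketch of how such datelines could be parsed, assuming they always follow the "CITY, day[-day] Month year" shape seen in the three examples above. The names `parse_dateline`, `DATELINE_RE` and `MONTHS` are hypothetical and are not part of newsworker; for a day range the first day is returned.

```python
import re
from datetime import date

# English month names mapped to month numbers; lookups are case-insensitive.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}

# "CITY, 29 July 2022" or "CITY, 21-22 July 2022" (hypothetical pattern).
DATELINE_RE = re.compile(
    r"^\s*(?P<city>[^,]+),\s*"
    r"(?P<day>\d{1,2})(?:\s*-\s*(?P<end_day>\d{1,2}))?\s+"
    r"(?P<month>[A-Za-z]+)\s+(?P<year>\d{4})\s*$"
)


def parse_dateline(text):
    """Return (city, date) for a city-prefixed dateline, or None if it does not match."""
    match = DATELINE_RE.match(text)
    if match is None:
        return None
    month = MONTHS.get(match.group("month").lower())
    if month is None:
        return None
    try:
        # For a range such as "21-22 July 2022" the first day is used.
        parsed = date(int(match.group("year")), month, int(match.group("day")))
    except ValueError:
        return None
    return match.group("city").strip(), parsed
```

Text that does not match, including list items with no date at all, yields `None`, so the caller can fall back to another strategy.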

Sometimes dates are missing from the text in the news list.

ivbeg, Aug 15 '22 10:08

Possible solutions:

  • follow each URL and extract the date from the dcterms.date metadata key
  • recognize right-aligned dates in the text
  • extract the date from the Last-Modified header of associated media - example
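
A rough sketch of the first and third options, using only the Python standard library. It assumes the page exposes the date as `<meta name="dcterms.date" content="...">` with an ISO 8601 value, which has not been verified against the UNIDO pages. Fetching the HTML and the media headers is left out, and the names `date_from_meta` and `date_from_last_modified` are hypothetical.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser


class _MetaDateParser(HTMLParser):
    """Remembers the content of the first <meta name="dcterms.date"> tag."""

    def __init__(self):
        super().__init__()
        self.value = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.value is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "dcterms.date":
            self.value = attributes.get("content")


def date_from_meta(html):
    """Return the dcterms.date value as a datetime, or None if absent or unparsable."""
    parser = _MetaDateParser()
    parser.feed(html)
    if not parser.value:
        return None
    try:
        # Assumption: the value is ISO 8601, e.g. "2022-07-29".
        return datetime.fromisoformat(parser.value.strip())
    except ValueError:
        return None


def date_from_last_modified(header_value):
    """Parse an HTTP Last-Modified header value, or return None if it is malformed."""
    try:
        return parsedate_to_datetime(header_value)
    except (TypeError, ValueError):
        return None
```

Last-Modified reflects when the media file was last changed on the server, which may differ from the publication date, so it is probably best treated as a last-resort fallback.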

ivbeg, Aug 15 '22 10:08