newsworker

Add structure detection of news objects

Open ivbeg opened this issue 1 year ago • 0 comments

Add structure detection and XPath reconstruction. Instead of detecting news dynamically, build pseudo-code that extracts news from the page.

The analysis logic should detect:

  • news list block container
  • the type of news list: sub-blocks or mixed list
  • headline tag
  • text tag/tag-block
  • date tag, if it exists
  • links, if they exist
  • images, if they exist
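
As a rough illustration of what the detected structure could look like once it is reconstructed, here is a minimal sketch. Everything in it is an assumption made for illustration and not existing newsworker code: the `NewsStructure` class, its field names, and the `extract_news` function are hypothetical. It uses the limited XPath subset of the standard library's `xml.etree.ElementTree`, so it needs well-formed markup, and it only covers the "sub-blocks" list type.

```python
from dataclasses import dataclass
from typing import Optional
import xml.etree.ElementTree as ET


@dataclass
class NewsStructure:
    """Hypothetical output of structure detection: one path per detected part."""
    container: str                 # news list block container (path from the root)
    item: str                      # one news sub-block, relative to the container
    headline: str                  # headline tag, relative to the item
    text: Optional[str] = None     # text tag or tag block
    date: Optional[str] = None     # date tag, if it exists
    link: Optional[str] = None     # link element, if it exists
    image: Optional[str] = None    # image element, if it exists
    list_type: str = "sub-blocks"  # "sub-blocks" or "mixed"; only the former is handled here


def _text(node, path):
    """Return the stripped text of the first match, or None if the path is unset or absent."""
    found = node.find(path) if path else None
    return None if found is None else "".join(found.itertext()).strip()


def _attr(node, path, name):
    """Return an attribute of the first match, or None if the path is unset or absent."""
    found = node.find(path) if path else None
    return None if found is None else found.get(name)


def extract_news(root, structure):
    """Apply a detected structure to a parsed page and return a list of news dicts."""
    container = root.find(structure.container)
    if container is None:
        return []
    return [
        {
            "headline": _text(item, structure.headline),
            "text": _text(item, structure.text),
            "date": _text(item, structure.date),
            "link": _attr(item, structure.link, "href"),
            "image": _attr(item, structure.image, "src"),
        }
        for item in container.findall(structure.item)
    ]


# Usage with a made-up, well-formed page fragment.
SAMPLE = """<html><body><div class="news">
<div class="item"><h2><a href="/a">First</a></h2>
<span class="date">2022-08-16</span><p>Body A</p><img src="/a.png"/></div>
<div class="item"><h2><a href="/b">Second</a></h2><p>Body B</p></div>
</div></body></html>"""

structure = NewsStructure(
    container=".//div[@class='news']",
    item="div[@class='item']",
    headline="h2",
    text="p",
    date="span[@class='date']",
    link="h2/a",
    image="img",
)
news = extract_news(ET.fromstring(SAMPLE), structure)
```

Keeping the optional parts (`date`, `link`, `image`) as nullable paths mirrors the "if it exists" items in the list above. A "mixed" list, where headlines, dates, and text are siblings rather than grouped into sub-blocks, would need a different traversal and is not covered by this sketch.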

Is something missing?

ivbeg, Aug 16 '22 08:08