newsworker
Add structure detection of news objects
Add structure detection and XPath reconstruction. Instead of dynamic news detection, build pseudo-code that extracts news from the page.
The analysis logic should detect:
- news list block container
- the type of news list: sub-blocks or mixed list
- headline tag
- text tag or tag block
- date tag, if it exists
- links, if they exist
- images, if they exist
Is something missing?
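
To make the list above concrete, here is a minimal sketch of what such an analysis pass could look like. It is not newsworker's existing API: `detect_structure`, `build_paths`, `first_match`, and `SAMPLE` are hypothetical names, the heuristics (largest group of same-signature siblings, `time` tag or a class containing `date`) are assumptions, and it uses the standard library's `xml.etree.ElementTree`, so it only handles well-formed markup. Real pages would likely need a tolerant HTML parser.

```python
import xml.etree.ElementTree as ET
from collections import Counter

HEADLINE_TAGS = ("h1", "h2", "h3", "h4", "h5", "h6")

def signature(el):
    """Tag plus class attribute, used to spot repeated sibling blocks."""
    return (el.tag, el.get("class", ""))

def build_paths(root):
    """Map every element to a simple positional XPath from the root."""
    paths = {root: "/" + root.tag}
    for parent in root.iter():  # document order, so parents come first
        seen = Counter()
        for child in parent:
            seen[child.tag] += 1
            paths[child] = f"{paths[parent]}/{child.tag}[{seen[child.tag]}]"
    return paths

def first_match(elements, predicate):
    """Return the first node under any of the elements that satisfies predicate."""
    for el in elements:
        for node in el.iter():
            if predicate(node):
                return node
    return None

def detect_structure(markup):
    """Guess the news list container and the tags used for each field."""
    root = ET.fromstring(markup)
    paths = build_paths(root)
    # Container heuristic (assumption): the element with the largest
    # group of children sharing the same tag and class.
    best, best_sig, best_count = None, None, 1
    for el in root.iter():
        counts = Counter(signature(c) for c in el)
        if counts:
            sig, n = counts.most_common(1)[0]
            if n > best_count:
                best, best_sig, best_count = el, sig, n
    if best is None:
        return None
    items = [c for c in best if signature(c) == best_sig]
    # "mixed": headlines, dates and text are flat siblings in the container;
    # "sub-blocks": every news item is wrapped in its own element.
    mixed = best_sig[0] in HEADLINE_TAGS or all(len(c) == 0 for c in items)
    scope = [best] if mixed else items
    headline = first_match(scope, lambda n: n.tag in HEADLINE_TAGS)
    date = first_match(scope, lambda n: n.tag == "time" or "date" in n.get("class", ""))
    text = first_match(scope, lambda n: n.tag == "p")
    link = first_match(scope, lambda n: n.tag == "a" and n.get("href"))
    image = first_match(scope, lambda n: n.tag == "img")
    return {
        "container": paths[best],
        "list_type": "mixed" if mixed else "sub-blocks",
        "item_count": len(items),
        "headline": headline.tag if headline is not None else None,
        "text": text.tag if text is not None else None,
        "date": date.tag if date is not None else None,
        "has_links": link is not None,
        "has_images": image is not None,
    }

# Hypothetical sample page used only for illustration.
SAMPLE = """<html><body>
<div id="nav"><a href="/">Home</a></div>
<div class="news">
  <div class="item"><h3><a href="/n/1">First</a></h3><time>2020-01-01</time><p>Text one</p><img src="1.png"/></div>
  <div class="item"><h3><a href="/n/2">Second</a></h3><time>2020-01-02</time><p>Text two</p></div>
  <div class="item"><h3><a href="/n/3">Third</a></h3><time>2020-01-03</time><p>Text three</p></div>
</div>
</body></html>"""

if __name__ == "__main__":
    print(detect_structure(SAMPLE))
```

The returned dictionary is meant as a stand-in for the "pseudo-code" mentioned above: once the container XPath and field tags are known, extraction can reuse them instead of re-detecting the layout on each run.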