trafilatura
Teaser with link in article flow
Articles often contain short text snippets that suggest further reading. Not all of them are handled properly by the extractors.
Is there an algorithmic way to discard such inserts?
Example: "Mehr zum Thema" ("More on this topic", in bold font) followed by a link, on https://de.rt.com/inland/116842-tag-arbeit-deutschland-bleibt-niedriglohnland/
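
One possible heuristic, sketched here only to make the question concrete: treat a paragraph as a teaser when it opens with a bold label and most of its remaining text sits inside links. This is not how trafilatura works internally; the function names, the sample markup, and the 0.5 threshold are all assumptions for illustration, and the sketch uses only the standard library on a well-formed snippet.

```python
import xml.etree.ElementTree as ET

# Hypothetical cutoff: share of paragraph text that sits inside <a> elements.
LINK_DENSITY_THRESHOLD = 0.5


def text_len(elem):
    """Length of all text inside an element, surrounding whitespace stripped."""
    return len("".join(elem.itertext()).strip())


def is_teaser(p):
    """Guess whether a <p> is a 'bold label + link' teaser."""
    total = text_len(p)
    if total == 0 or len(p) == 0:
        return False
    # The paragraph must begin directly with a bold element (no leading text).
    starts_with_bold = p[0].tag in ("b", "strong") and not (p.text or "").strip()
    link_chars = sum(text_len(a) for a in p.iter("a"))
    return starts_with_bold and link_chars / total >= LINK_DENSITY_THRESHOLD


def drop_teasers(root):
    """Remove teaser paragraphs in place and return the tree root."""
    # Materialize the traversal first so removals do not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "p" and is_teaser(child):
                parent.remove(child)  # note: also drops the element's tail text
    return root


# Made-up sample markup, loosely modeled on the pattern described above.
SAMPLE = """<div>
<p>First paragraph of the article body.</p>
<p><strong>Mehr zum Thema</strong> - <a href="https://example.org/a">Headline of a related article</a></p>
<p>Body text that mentions <a href="https://example.org/b">a source</a> within a longer sentence.</p>
</div>"""

root = drop_teasers(ET.fromstring(SAMPLE))
remaining = ["".join(p.itertext()) for p in root.iter("p")]
```

A structural rule like this avoids maintaining a list of language-specific phrases such as "Mehr zum Thema", but it would likely need tuning against real pages, since legitimate paragraphs can also start with bold text and contain links.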