backend Source 651063 not parsing content out of HTML correctly

Open rahulbot opened this issue 4 years ago • 0 comments

The "Africa Newsroom (Arabic)" source content isn't parsing correctly. Aashka's SDG topic found this one via spidering.

For example, story #1307690431 (original URL) should have a bunch of content about SDGs in Braille, but instead it has a bunch of weird headlines that must have been parsed out of a different, non-content part of the page (see the cached raw text).

Not critical to resolve right now, but flagging and recording it in case it indicates a larger content-parsing problem.

Sep 03 '19 17:09 rahulbot
