
Source 651063 not parsing content out of HTML correctly

Open rahulbot opened this issue 4 years ago • 0 comments

The "Africa Newsroom (Arabic)" source content isn't parsing correctly. Aashka's SDG topic found this one via spidering.

For example, story #1307690431 (original URL) should have a bunch of text about SDGs in Braille, but instead it has a bunch of weird headlines that must have been parsed out of a different, non-content part of the page (see the cached raw text).

Not critical to resolve right now, but flagging and recording in case it is indicative of a larger content parsing problem.

rahulbot • Sep 03 '19 17:09