Ignore invalid jsonld elements on the page source.

Open naveen17797 opened this issue 3 years ago • 3 comments

This PR changes the behaviour so that, if the page source contains invalid jsonld elements alongside valid ones, only the valid jsonld elements are returned.
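To illustrate the intended behaviour, here is a minimal sketch using only the standard library. It is not the actual extruct code: the function name `parse_valid_jsonld` and the sample blocks are made up for illustration, and the script contents are passed in as plain strings rather than taken from a parsed HTML tree.

```python
import json


def parse_valid_jsonld(script_texts):
    """Parse each jsonld block; keep the valid ones, skip the invalid ones.

    Hypothetical helper for illustration only, not part of extruct.
    """
    results = []
    for text in script_texts:
        try:
            results.append(json.loads(text))
        except json.JSONDecodeError:
            # Invalid block: skip it instead of failing the whole page.
            continue
    return results


# Sample input (made up): the second block is truncated, so it is invalid JSON.
blocks = [
    '{"@type": "Person", "name": "Ann"}',
    '{"@type": "Thing", ',
    '{"@type": "Organization"}',
]
valid = parse_valid_jsonld(blocks)
```

With this input, `valid` holds the first and third blocks, and the broken one in the middle is dropped.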

naveen17797 avatar Jan 09 '22 13:01 naveen17797

Previously, if we had invalid JSON, we'd crash or log it (depending on the error setting). Now it would be silently ignored. What do you think about still having some logging if our last attempt at parsing the JSON fails, similar to this?

I have added the log statement:

logging.exception('Invalid jsonld element detected %s', script)

naveen17797 avatar Jan 11 '22 00:01 naveen17797

I understand that merging this PR would remove the ability to stop parsing a page that has invalid jsonld elements. Previously, extruct would raise an exception and fail for jsonld; I am not sure how I could retain this behaviour.

Maybe I can pass the errors argument from the extract() function to the jsonld extractor and use it to decide whether to raise an exception or just return all valid elements, @lopuhin?
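Roughly what I have in mind, as a standalone sketch rather than the real extractor code. The helper name `parse_jsonld_blocks` is hypothetical, and I am assuming three modes for the errors argument ("strict", "log", "ignore") purely for illustration; the script contents are plain strings here.

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_jsonld_blocks(script_texts, errors="strict"):
    """Parse jsonld blocks, handling invalid ones according to `errors`.

    Hypothetical helper; the mode names are assumptions for this sketch:
      "strict" - re-raise the parsing error (old behaviour: fail on bad jsonld)
      "log"    - log the invalid block and continue with the rest
      "ignore" - silently skip the invalid block
    """
    if errors not in ("strict", "log", "ignore"):
        raise ValueError("errors must be 'strict', 'log' or 'ignore'")
    results = []
    for text in script_texts:
        try:
            results.append(json.loads(text))
        except json.JSONDecodeError:
            if errors == "strict":
                raise
            if errors == "log":
                logger.exception("Invalid jsonld element detected %s", text)
            # "log" and "ignore" both fall through and keep the valid blocks.
    return results
```

This way callers who want the page to fail on bad jsonld keep that option with `errors="strict"`, while `errors="log"` or `errors="ignore"` return only the valid elements.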

naveen17797 avatar Jan 11 '22 00:01 naveen17797

Is there a timeline on this issue? It would be good to get a fix for it.

argaurav avatar Aug 03 '23 08:08 argaurav