crawling-framework
Log actual LD+JSON on parsing error
The error should also log the offending JSON, so that we can learn how to pre-process it and avoid such errors.
WARN l.t.c.p.u.JsonLdParser - Failed to parse ld+json
com.fasterxml.jackson.core.JsonParseException: Document contains more content after json-ld element - (possible mismatched {}?)
at [Source: java.io.StringReader@72eeb417; line: 31, column: 10]
at com.github.jsonldjava.utils.JsonUtils.fromJsonParser(JsonUtils.java:167) ~[crawler-standalone.jar:?]
at com.github.jsonldjava.utils.JsonUtils.fromReader(JsonUtils.java:122) ~[crawler-standalone.jar:?]
at com.github.jsonldjava.utils.JsonUtils.fromString(JsonUtils.java:190) ~[crawler-standalone.jar:?]
at lt.tokenmill.crawling.parser.utils.JsonLdParser.parse(JsonLdParser.java:37) [crawler-standalone.jar:?]
at lt.tokenmill.crawling.parser.ArticleExtractor.extractArticleWithDetails(ArticleExtractor.java:35) [crawler-standalone.jar:?]
at lt.tokenmill.crawling.parser.ArticleExtractor.extractArticle(ArticleExtractor.java:22) [crawler-standalone.jar:?]
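One possible shape for this, as a minimal sketch only: the body of `JsonLdParser.parse` is not shown in this issue, so everything below is an assumption. `RawJsonParser`, `parseOrLog`, `truncate`, `lineAt` and `MAX_LOGGED_CHARS` are hypothetical names. `RawJsonParser` stands in for a call such as `JsonUtils.fromString`. The idea is to wrap that call and, on failure, log the raw ld+json (capped in length) together with the exception. `lineAt` pulls out the line that Jackson reports in its location (here `line: 31, column: 10`). To stay self-contained, the sketch uses `java.util.logging`. The project's logger looks SLF4J/Log4j-style, and there the equivalent call would be `LOG.warn("Failed to parse ld+json: {}", snippet, e)`, because SLF4J treats a trailing `Throwable` argument as the exception.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class JsonLdErrorLogging {

    /** Hypothetical stand-in for a parser call such as JsonUtils.fromString, which may throw. */
    @FunctionalInterface
    public interface RawJsonParser<T> {
        T parse(String json) throws Exception;
    }

    private static final Logger LOG = Logger.getLogger(JsonLdErrorLogging.class.getName());

    /** Hypothetical cap so that very large ld+json scripts do not flood the log. */
    public static final int MAX_LOGGED_CHARS = 2000;

    /** Returns the input cut to at most max characters, with a marker saying how much was dropped. */
    public static String truncate(String json, int max) {
        if (json == null) {
            return "null";
        }
        if (json.length() <= max) {
            return json;
        }
        return json.substring(0, max) + "...[" + (json.length() - max) + " more chars]";
    }

    /** Returns the 1-based line of the input, e.g. the line from Jackson's "line: 31, column: 10". */
    public static String lineAt(String json, int lineNumber) {
        String[] lines = json.split("\r?\n", -1);
        if (lineNumber < 1 || lineNumber > lines.length) {
            return "";
        }
        return lines[lineNumber - 1];
    }

    /** Parses the input; on failure logs the raw (truncated) ld+json plus the exception and returns null. */
    public static <T> T parseOrLog(String json, RawJsonParser<T> parser) {
        try {
            return parser.parse(json);
        } catch (Exception e) {
            LOG.log(Level.WARNING, "Failed to parse ld+json:\n" + truncate(json, MAX_LOGGED_CHARS), e);
            return null;
        }
    }

    public static void main(String[] args) {
        // Example input: two top-level objects in one script block, i.e. extra
        // content after the first JSON element, as the error message describes.
        String raw = "{\"@type\": \"NewsArticle\"}\n{\"@type\": \"Person\"}";
        // The lambda simulates a failing parser; a real caller would delegate to the JSON-LD library.
        Object result = parseOrLog(raw, s -> {
            throw new IllegalArgumentException("simulated parse failure");
        });
        System.out.println(result == null);
        System.out.println(lineAt(raw, 2));
    }
}
```

Capping the logged input keeps one malformed page from producing a multi-megabyte log line. The full text could still be written to a separate debug-level entry if the complete document is needed to design the pre-processing.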