Invalid XML chars in long abstracts

Open robert-david opened this issue 11 years ago • 5 comments

Not sure if this is actually an error. When I use an XML representation of the long abstracts, I get parse errors when accessing some abstracts because they contain characters that the XML specification does not allow. For example, processing the English abstract for http://dbpedia.org/resource/Olive using openrdf results in:

org.openrdf.rio.RDFParseException An invalid XML character (Unicode: 0x1) was found in the element content of the document.

Maybe this is intentional, given that the other serialization formats do not have a problem with this, but it prevents XML processing of the data. This happens with version 3.9.
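
As a possible client-side workaround, the offending characters can be located or removed before the text is handed to an XML parser. The snippet below is a minimal sketch in Python, not part of the extraction framework; the helper names `find_invalid_xml_chars` and `strip_invalid_xml_chars` are hypothetical, and the character class is assumed to mirror the `Char` production of XML 1.0.

```python
import re

# Anything outside the XML 1.0 `Char` production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return (offset, code point as hex string) for every disallowed character."""
    return [(m.start(), hex(ord(m.group()))) for m in _INVALID_XML_CHARS.finditer(text)]


def strip_invalid_xml_chars(text):
    """Return the text with every disallowed character removed."""
    return _INVALID_XML_CHARS.sub("", text)


# Hypothetical abstract fragment containing the 0x1 character from the report.
sample = "Olive\x01 tree"
print(find_invalid_xml_chars(sample))   # [(5, '0x1')]
print(strip_invalid_xml_chars(sample))  # Olive tree
```

Dropping the characters silently changes the data, so reporting them with `find_invalid_xml_chars` first may be the safer choice when the goal is to fix the extraction itself.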

robert-david avatar Feb 27 '14 10:02 robert-david

+1

Hronom avatar Apr 25 '15 15:04 Hronom

@Hronom is this still valid?

jimkont avatar Oct 06 '15 13:10 jimkont

Oh, this happened a long time ago... I'll try to check soon.

Hronom avatar Oct 06 '15 14:10 Hronom

Trying to close old issues. When you do, feel free to close directly or re-comment.

cheers!

jimkont avatar Oct 06 '15 14:10 jimkont

ping @Hronom still valid? See https://databus.dbpedia.org/dbpedia/text/long-abstracts/

@Vehnem can we write a test for this?

m1ci avatar May 15 '20 15:05 m1ci
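
One way such a test could look, as a rough sketch only: wrap each abstract literal in an XML element and check that a standard parser accepts it. This uses Python's `xml.etree.ElementTree` rather than the project's own test tooling, and the function name `is_xml_safe` and the sample strings are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_xml_safe(literal):
    """Return True if the literal parses as XML element content.

    Markup characters (&, <, >) are escaped first, so a failure points
    at characters that XML 1.0 does not allow at all.
    """
    try:
        ET.fromstring("<abstract>" + escape(literal) + "</abstract>")
        return True
    except ET.ParseError:
        return False


# Hypothetical literals; a real test would iterate over the abstracts dump.
assert is_xml_safe("The olive is a small tree & fruit.")
assert not is_xml_safe("The olive\x01 is a small tree.")
```

Run over a full long-abstracts file, a check like this would flag every literal that breaks RDF/XML consumers, regardless of which control character causes the failure.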