extraction-framework Invalid XML chars in long abstracts

Not sure if this is actually an error. When I use an XML representation of the long abstracts, I get parse errors when accessing some abstracts because they contain characters that are invalid in XML (not allowed by the specification). For example, processing the English abstract for http://dbpedia.org/resource/Olive using openrdf results in:

org.openrdf.rio.RDFParseException An invalid XML character (Unicode: 0x1) was found in the element content of the document.

Maybe this is intentional, given that the other serialization formats do not have a problem with this, but it prevents XML processing of the data. This happens with version 3.9.
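
For anyone hitting the same problem, here is a minimal sketch of how the offending characters could be detected or stripped before the text is handed to an XML parser. It is not part of the extraction framework; the function names are hypothetical. It assumes the XML 1.0 "Char" production (tab, LF, CR, 0x20-0xD7FF, 0xE000-0xFFFD, 0x10000-0x10FFFF), under which 0x1 is not allowed.

```python
import re

# Matches any character outside the XML 1.0 "Char" production.
# Assumption: the consumer parses XML 1.0, as in the error above.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return (index, code point) pairs for characters not allowed in XML 1.0."""
    return [(m.start(), ord(m.group())) for m in _INVALID_XML_CHARS.finditer(text)]


def strip_invalid_xml_chars(text):
    """Return text with all characters not allowed in XML 1.0 removed."""
    return _INVALID_XML_CHARS.sub("", text)


if __name__ == "__main__":
    # Hypothetical abstract fragment containing the 0x1 control character.
    sample = "Olive\x01 tree"
    print(find_invalid_xml_chars(sample))
    print(strip_invalid_xml_chars(sample))
```

Whether stripping is acceptable depends on the use case; reporting the positions first at least shows which abstracts are affected.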

Feb 27 '14 10:02 robert-david

+1

Apr 25 '15 15:04 Hronom

@Hronom is this still valid?

Oct 06 '15 13:10 jimkont

Oh, this happened a long time ago... I'll try to check soon.

Oct 06 '15 14:10 Hronom

Trying to close old issues. When you do check, feel free to close directly or re-comment.

cheers!

Oct 06 '15 14:10 jimkont

Ping @Hronom, is this still valid? See https://databus.dbpedia.org/dbpedia/text/long-abstracts/

@Vehnem can we write a test for this?

May 15 '20 15:05 m1ci