xml-rs
Fails to parse/ignore DTD
I'm trying to parse an autosar-dcf file which starts like this:
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<!DOCTYPE DCF [
<!ELEMENT DCF ((NAME,ATTRDEF?,PROFILESETTINGS?,FILEREF*)?)>
<!ATTLIST DCF
ARSCHEMA (21XSDREV0017 | 30XSDREV0003) "21XSDREV0017">
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT ATTRDEF (#PCDATA)>
<!ELEMENT FILEREF (ARXML, DCB?, ECUC?, GENATTR?)>
<!ELEMENT ARXML (#PCDATA)>
<!ATTLIST ARXML TYPE CDATA ""
ROOTITEM (CONSTANT|DATATYPE|PORTINTERFACE|SIGNAL|COMPONENTTYPE|ECUPROJECT|VEHICLEPROJECT) #REQUIRED >
<!ELEMENT DCB (#PCDATA)>
<!ELEMENT ECUC (#PCDATA)>
<!ELEMENT GENATTR (#PCDATA)>
<!ELEMENT PROFILESETTINGS (#PCDATA)>
]>
I get Error { pos: 4:2, kind: Syntax("Unexpected token \'<!\' before \'E\'") } as a result.
I don't really care about the DTD, but it would be nice to be able to parse the actual file data that follows it.
Yes, unfortunately DTD parsing, and even DTD skipping, is incomplete right now.
I think it is possible to make skipping a bit more intelligent, so that it skips the entire embedded DTD correctly. If you want, you can try to fix this. However, another implementation of the parser is currently being worked on, and it would handle this problem as well.
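As a rough illustration of what "more intelligent" skipping could look like, here is a minimal sketch that finds the end of a `<!DOCTYPE ...>` declaration, including an embedded `[ ... ]` internal subset. This is not xml-rs code or its API: `doctype_end` and `strip_doctype` are hypothetical helper names, and the sketch assumes the document is already a UTF-8 `&str`. It only tracks quoted strings, comments inside the subset, and the subset brackets; it does not handle processing instructions inside the subset, and it would be fooled by the text `<!DOCTYPE` appearing inside an earlier comment.

```rust
/// Hypothetical helper (not part of xml-rs): returns the byte offset just
/// past the first complete `<!DOCTYPE ...>` declaration, or `None` if the
/// input has no DOCTYPE or the declaration is never closed.
fn doctype_end(input: &str) -> Option<usize> {
    let start = input.find("<!DOCTYPE")?;
    let bytes = input.as_bytes();
    let mut i = start + "<!DOCTYPE".len();
    let mut in_subset = false; // true while inside the `[ ... ]` internal subset
    let mut quote: Option<u8> = None; // the quote character currently open, if any

    while i < bytes.len() {
        let b = bytes[i];
        if let Some(q) = quote {
            // Inside a quoted literal: only the matching quote ends it.
            if b == q {
                quote = None;
            }
        } else if in_subset && bytes[i..].starts_with(b"<!--") {
            // Skip a comment inside the subset so that quotes or `]>` in
            // its text cannot confuse the scan.
            let close = input[i + 4..].find("-->")?;
            i += 4 + close + 3;
            continue;
        } else {
            match b {
                b'"' | b'\'' => quote = Some(b),
                b'[' => in_subset = true,
                b']' => in_subset = false,
                // A `>` inside the subset only closes a markup declaration
                // such as `<!ELEMENT ...>`; outside it, it closes DOCTYPE.
                b'>' if !in_subset => return Some(i + 1),
                _ => {}
            }
        }
        i += 1;
    }
    None
}

/// Hypothetical helper: returns a copy of `input` with the first DOCTYPE
/// declaration removed. Input without a complete DOCTYPE is returned as is.
fn strip_doctype(input: &str) -> String {
    match (input.find("<!DOCTYPE"), doctype_end(input)) {
        (Some(start), Some(end)) => format!("{}{}", &input[..start], &input[end..]),
        _ => input.to_string(),
    }
}
```

Until the parser skips the DTD itself, a workaround along these lines would be to read the file into a string, pass it through something like `strip_doctype`, and hand the result to the parser. Note that any entities or attribute defaults declared in the DTD are lost this way.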
This still happens with the latest sources. Are there any plans to fix it and/or merge the pull request?
Fixed