dkpro-wsd
SAXReader-based XML should try to find and process the DTD
Originally reported on Google Code with ID 43
When SAXReader-based XML readers try to read an XML file which specifies a DTD, they
fail because they can't find the DTD.
The problem can possibly be solved in *some* cases by using an EntityResolver which
looks for the DTD in the same place as the XML file. However, I am pretty sure this
won't always work if the XML file is being read from the classpath, since it might
be inside a JAR.
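For illustration, here is a rough sketch of what such a resolver could look like. It uses only the JDK's own parser classes rather than SAXReader, and the class name `SiblingDtdResolver` is hypothetical, not something that exists in dkpro-wsd. It takes the last path segment of the DTD's system ID and tries to open a file of that name next to the XML document. If that fails, it returns null so the parser falls back to its default lookup. I have not checked how well this copes with every classpath or JAR layout.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper: looks for the DTD in the same location as the XML file.
final class SiblingDtdResolver implements EntityResolver {
    private final URL xmlLocation;

    SiblingDtdResolver(URL xmlLocation) {
        this.xmlLocation = xmlLocation;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws IOException {
        if (systemId == null) {
            return null; // nothing to go on; let the parser decide
        }
        // Keep only the file name, whatever base the parser resolved it against.
        String fileName = systemId.substring(systemId.lastIndexOf('/') + 1);
        URL candidate = new URL(xmlLocation, fileName);
        try {
            InputStream in = candidate.openStream();
            InputSource source = new InputSource(in);
            source.setSystemId(candidate.toExternalForm());
            return source;
        } catch (IOException e) {
            return null; // not found next to the XML file: use default resolution
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: SiblingDtdResolver <file.xml>");
            return;
        }
        Path xml = Paths.get(args[0]);
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(new SiblingDtdResolver(xml.toUri().toURL()));
        // Parsing from a bare stream gives the parser no base URI for the DTD,
        // so the lookup has to go through the resolver above.
        try (InputStream in = Files.newInputStream(xml)) {
            Document doc = builder.parse(in);
            System.out.println(doc.getDocumentElement().getTagName());
        }
    }
}
```

The same `EntityResolver` instance could be handed to a SAXReader via its `setEntityResolver` method.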
The readers should therefore probably offer a configuration parameter for ignoring the
DTD. This could be implemented with an EntityResolver that returns an empty InputSource.
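A minimal sketch of the "ignore the DTD" idea, again using only JDK classes. The class name `IgnoreDtdExample` and the `ignoreDtd` flag are assumptions made for illustration; the flag stands in for the proposed configuration parameter. The resolver answers every external entity request, including the DTD, with an empty InputSource, so the parser never tries to fetch the file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of dkpro-wsd.
final class IgnoreDtdExample {
    // Resolves every external entity (including the DTD) to empty content.
    static final EntityResolver IGNORE_DTD =
        (publicId, systemId) -> new InputSource(new StringReader(""));

    // ignoreDtd plays the role of the proposed reader configuration parameter.
    static Document parse(String xml, boolean ignoreDtd) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        if (ignoreDtd) {
            builder.setEntityResolver(IGNORE_DTD);
        }
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // The DTD named here does not need to exist when ignoreDtd is true.
        String xml = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE corpus SYSTEM \"missing.dtd\">\n"
            + "<corpus><text id=\"d001\"/></corpus>";
        Document doc = parse(xml, true);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

One consequence of this approach is that anything the DTD would have supplied, such as entity declarations or default attribute values, is silently missing from the parsed document.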
Reported by [email protected] on 2013-11-06 13:59:36
I implemented a null EntityResolver and made all the XML readers use it. This isn't
an ideal solution as the XML readers will no longer issue a diagnostic for bad XML
files.
The XML reader in de.tudarmstadt.ukp.dkpro.core.io.xml seems to use some other SAX-based
method of handling XML which might not have this DTD problem. We should study it and
see if we can adapt the technique.
Reported by [email protected] on 2013-11-06 14:51:01
- Labels added: Priority-Medium
- Labels removed: Priority-High