dkpro-wsd

SAXReader-based XML readers should try to find and process the DTD

Open • logological opened this issue 10 years ago • 3 comments

Originally reported on Google Code with ID 43

When SAXReader-based XML readers try to read an XML file which specifies a DTD, they
fail because they can't find the DTD.

The problem can possibly be solved in *some* cases by using an EntityResolver which
looks for the DTD in the same place as the XML file.  However, I am pretty sure this
won't always work if the XML file is being read from the classpath, since it might
be inside a JAR.
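
A minimal sketch of the "look next to the XML file" idea, assuming the XML file lives on the ordinary file system. The class name `SiblingDtdResolver` is hypothetical and not part of dkpro-wsd. It uses only the JDK's `org.xml.sax` API and, as noted above, does not cover XML files read from inside a JAR.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper: resolves a DTD by file name in the directory of the XML file.
final class SiblingDtdResolver implements EntityResolver {
    private final Path xmlDirectory;

    SiblingDtdResolver(Path xmlFile) {
        // Assumption: xmlFile names a file, so its absolute path has a parent directory.
        this.xmlDirectory = xmlFile.toAbsolutePath().getParent();
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws IOException {
        if (systemId == null) {
            return null; // nothing to go on; let the parser use its default behaviour
        }
        // Keep only the last path segment of the system ID, e.g. "corpus.dtd".
        String fileName = systemId.substring(systemId.lastIndexOf('/') + 1);
        Path candidate = xmlDirectory.resolve(fileName);
        if (!Files.isRegularFile(candidate)) {
            return null; // no sibling DTD; fall back to the parser's default lookup
        }
        InputSource source = new InputSource(Files.newInputStream(candidate));
        source.setSystemId(candidate.toUri().toString());
        return source;
    }
}
```

Returning `null` from `resolveEntity` tells the parser to fall back to its normal lookup, so a missing sibling DTD still fails in the same way as before.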

Readers should probably therefore include a configuration parameter for ignoring the
DTD.  This would be implemented by an EntityResolver returning an empty InputSource.
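
A sketch of such a "null" EntityResolver, shown here with the JDK's `DocumentBuilder` so the example is self-contained. The class and method names (`NullEntityResolverDemo`, `rootName`) are made up for illustration. The same resolver object could presumably be passed to dom4j's `SAXReader.setEntityResolver`, but that wiring is an assumption about how the readers would use it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of dkpro-wsd.
final class NullEntityResolverDemo {
    // Answers every external entity request (including the DTD) with empty content,
    // so the parser never tries to open the file named in the DOCTYPE.
    static final EntityResolver IGNORE_DTD =
            (publicId, systemId) -> new InputSource(new StringReader(""));

    // Parses the given XML text while ignoring any external DTD; returns the root element name.
    static String rootName(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(IGNORE_DTD);
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE corpus SYSTEM \"no-such-file.dtd\">"
                + "<corpus/>";
        System.out.println(rootName(xml));
    }
}
```

Because the DTD is never read, anything it would have supplied (entity declarations, default attribute values) is also unavailable to the parser.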

Reported by [email protected] on 2013-11-06 13:59:36

logological • Jun 24 '15 15:06

Reported by [email protected] on 2013-11-06 14:00:03

  • Blocking: #13

logological • Jun 24 '15 15:06

I implemented a null EntityResolver and made all the XML readers use it.  This isn't
an ideal solution as the XML readers will no longer issue a diagnostic for bad XML
files.

The XML reader in de.tudarmstadt.ukp.dkpro.core.io.xml seems to use some other SAX-based
method of handling XML which might not have this DTD problem.  We should study it and
see if we can adapt the technique.

Reported by [email protected] on 2013-11-06 14:51:01

  • Labels added: Priority-Medium
  • Labels removed: Priority-High

logological • Jun 24 '15 15:06

Reported by [email protected] on 2013-11-06 14:53:56

  • No longer blocking: #13

logological • Jun 24 '15 15:06