tagsoup

resolveEntity being passed incorrect systemid

Status: Open · opened by faceless2 8 years ago · 0 comments

To my surprise, I've found an actual issue!

We have an HTML file with this DOCTYPE:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">

In org.ccil.cowan.tagsoup.Parser.decl, this DOCTYPE has a public ID but no system ID, so the "systemid" variable is set to an empty string (Parser.java line 851), which is fine. However, if an EntityResolver is set, the lines that follow:

if (theScanner instanceof Locator) {    // Must resolve systemid
    theDoctypeSystemId  = ((Locator)theScanner).getSystemId();
    try {
        theDoctypeSystemId = new URL(new URL(theDoctypeSystemId), systemid).toString();
    } catch (Exception e) {}
}

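result in resolveEntity being called with the Locator's system ID (the document's own system ID) as the system ID of the entity, because resolving an empty string against a base URL just gives back the base URL. This is clearly incorrect.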
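To see why the Locator's system ID leaks through, here is a small standalone Java sketch (not TagSoup code) that performs the same java.net.URL resolution as the snippet above. The class name, the resolve helper and the example URLs are made up for illustration; the document URL stands in for whatever the Locator returns.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class EmptySystemIdDemo {

    // Resolve a (possibly relative) system ID against a base URL using
    // java.net.URL, as the quoted Parser.decl snippet does.
    public static String resolve(String base, String systemId) {
        try {
            return new URL(new URL(base), systemId).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("cannot resolve " + systemId, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical document URL, standing in for the Locator's system ID.
        String documentUrl = "http://example.com/docs/page.html";

        // DOCTYPE with only a public ID: systemid is "", and resolving it
        // simply returns the document URL itself.
        System.out.println(resolve(documentUrl, ""));
        // prints http://example.com/docs/page.html

        // DOCTYPE with a relative system ID: resolved next to the document.
        System.out.println(resolve(documentUrl, "strict.dtd"));
        // prints http://example.com/docs/strict.dtd
    }
}
```

The empty case is what turns an unspecified DOCTYPE system ID into the document's own URL before it reaches resolveEntity.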

The fix is to change the if statement on line 864 to:

if (systemid != null && systemid.length() > 0 && theScanner instanceof Locator) { 

i.e. don't try to resolve the entity's systemid against the Locator's systemid unless the DOCTYPE actually specifies one.
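A minimal sketch of the guarded resolution, pulled out into a standalone helper so it can be run without TagSoup. resolveDoctypeSystemId is a hypothetical name, not part of TagSoup's API, and returning the raw system ID when it is blank or cannot be resolved is an assumption made for this example; in the proposed patch the guard simply goes on the existing if statement.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class DoctypeSystemIdSketch {

    // Hypothetical helper (not part of TagSoup): applies the guard proposed
    // in the issue before resolving the DOCTYPE's system ID against the
    // document's system ID obtained from the Locator.
    public static String resolveDoctypeSystemId(String locatorSystemId, String systemid) {
        // The guard: only resolve when a system ID was actually specified
        // (plus a null check on the base so new URL(null) is never called).
        if (systemid == null || systemid.length() == 0 || locatorSystemId == null) {
            // Assumption: pass an unspecified system ID through unchanged
            // instead of substituting the document's own URL.
            return systemid;
        }
        try {
            return new URL(new URL(locatorSystemId), systemid).toString();
        } catch (MalformedURLException e) {
            // Assumption: fall back to the raw system ID if resolution fails.
            return systemid;
        }
    }

    public static void main(String[] args) {
        String documentUrl = "http://example.com/docs/page.html";

        // <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"> has no system ID.
        System.out.println("[" + resolveDoctypeSystemId(documentUrl, "") + "]");
        // prints []

        // An absolute system ID comes back unchanged from URL resolution.
        System.out.println(resolveDoctypeSystemId(documentUrl,
                "http://www.w3.org/TR/html4/strict.dtd"));
        // prints http://www.w3.org/TR/html4/strict.dtd
    }
}
```

With the guard in place, an entity resolver would see the empty (unspecified) system ID rather than the URL of the document being parsed.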

(cross-posting to both the "orbeon" and "jukka" forks of tagsoup on GitHub, to try to keep things in sync)

faceless2, Nov 15 '17 14:11