XMLWriter should reject illegal control characters

Open reisners opened this issue 7 years ago • 1 comments

According to the XML specification, only three control characters (0x09, 0x0A, and 0x0D) may occur in XML content. XMLWriter should reject any other control character, such as 0x0C (\f), but it doesn't. To reproduce:

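import java.io.IOException;
import java.io.StringWriter;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;
import org.junit.Test;

public class Dom4jTest {
    // Expected to throw because "\f" (0x0C) is not a legal XML 1.0 character;
    // currently fails because XMLWriter writes the document without complaint.
    @Test(expected = Throwable.class)
    public void shouldRejectInvalidCharacters() throws IOException {
        Document doc = DocumentHelper.createDocument();
        Element text = DocumentHelper.createElement("TEXT");
        text.setText("\f");
        doc.add(text);
        StringWriter xml = new StringWriter();
        XMLWriter writer = new XMLWriter(xml, OutputFormat.createPrettyPrint());
        writer.write(doc);
    }
}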

reisners, Aug 01 '18 13:08

Hi,

I'm facing the same issue with char 26 (SUB). XMLWriter replaces it with a numeric character reference, but no XML 1.0 parser accepts that. I believe XMLWriter.escapeElementEntities (line 1671) should not escape every char below 32, but should instead skip those chars completely. The ones in that range that are valid are already handled in the previous case block.

As a reminder, the W3C specification defines the valid characters as: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
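
For illustration, the range check above could be sketched in plain Java roughly as follows. This is only a sketch under assumptions: the class XmlChars and its methods isValid and stripInvalid are hypothetical names, not part of dom4j, and it does not show how XMLWriter itself would have to be changed.

```java
// Hypothetical helper, not part of dom4j: checks code points against
// the XML 1.0 Char production quoted above.
final class XmlChars {
    private XmlChars() {}

    /** Returns true if the code point is allowed by the XML 1.0 Char production. */
    static boolean isValid(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the text with every code point outside the Char production dropped. */
    static String stripInvalid(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // codePointAt joins valid surrogate pairs; a lone surrogate
            // comes back as-is and is rejected by isValid.
            int cp = text.codePointAt(i);
            if (isValid(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Whether to drop such characters silently (as suggested here) or to throw (as the original report asks) is a design choice; the same isValid check would serve either behaviour.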

BR.

yrey10, Nov 10 '20 16:11