TwelveMonkeys Use Document in SVGImageReader's setInput

(I thought it was better to create a new topic for this, see #442 for background)

The change you proposed to let setInput receive a Document object does in fact work (with ~~a few limitations that I will enumerate later~~ one limitation - more on that later).

I do think it would be a good thing to implement in TwelveMonkeys, since it is quite easy and makes the reader more friendly. So the changes that need to be done in TwelveMonkeys are:

In SVGImageReader.java exactly as you said:

    @Override
    public void setInput(Object pInput, boolean seekForwardOnly, boolean ignoreMetadata) {
        super.setInput(pInput, seekForwardOnly, ignoreMetadata);

        if (imageInput != null) {
            TranscoderInput input = new TranscoderInput(IIOUtil.createStreamAdapter(imageInput));
            rasterizer.setInput(input);
        } else if (pInput instanceof Document) {
            Document document = (Document) pInput;
            TranscoderInput input = new TranscoderInput(document);
            input.setURI(document.getBaseURI());
            rasterizer.setInput(input);
        }
    }

In SVGImageReaderSpi.java, change the canDecode to something like (a more specific type of Document may be needed, later on limitations):

    public boolean canDecodeInput(final Object pSource) throws IOException {
        return (pSource instanceof Document) || (pSource instanceof ImageInputStream && canDecode((ImageInputStream) pSource));
    }

And finally, in SVGProviderInfo.java, this must be added (a more specific type of Document may be needed, later on limitations):

    private final Class[] inputTypes = new Class[] {ImageInputStream.class, Document.class};
 
    @Override
    public Class[] inputTypes() {
        return inputTypes;
    }

And that's it! Works (almost) perfectly. You may want to add some test cases to check the new feature. I'll share the code on how to create a Document object from a file next, but first the ~~limitations~~ limitation:

~~1) The CSS problem still exists when using a Document as input. CSS will not be used if the xml-stylesheet tag is before the SVG tag.~~

Using the standard org.w3c.dom.Document does not work. It gives ClassCastExceptions, here is an example:

java.lang.ClassCastException: org.apache.batik.dom.GenericElement cannot be cast to org.w3c.dom.svg.SVGSVGElement
	at org.apache.batik.anim.dom.SVGOMDocument.getRootElement(SVGOMDocument.java:234)
	at org.apache.batik.bridge.BridgeContext.getBridgeExtensions(BridgeContext.java:2054)
	at org.apache.batik.bridge.BridgeContext.registerSVGBridges(BridgeContext.java:2044)
	at org.apache.batik.bridge.BridgeContext.setDocument(BridgeContext.java:445)
	at org.apache.batik.bridge.GVTBuilder.build(GVTBuilder.java:54)
	at com.twelvemonkeys.imageio.plugins.svg.SVGImageReader$Rasterizer.transcode(SVGImageReader.java:322)
	at org.apache.batik.transcoder.XMLAbstractTranscoder.transcode(XMLAbstractTranscoder.java:142)
	at org.apache.batik.transcoder.SVGAbstractTranscoder.transcode(SVGAbstractTranscoder.java:156)
	at com.twelvemonkeys.imageio.plugins.svg.SVGImageReader$Rasterizer.init(SVGImageReader.java:561)
	at com.twelvemonkeys.imageio.plugins.svg.SVGImageReader$Rasterizer.getDefaultWidth(SVGImageReader.java:574)
	at com.twelvemonkeys.imageio.plugins.svg.SVGImageReader.getWidth(SVGImageReader.java:240)
	at com.twelvemonkeys.imageio.plugins.svg.SVGImageReader.paramsToHints(SVGImageReader.java:180)
	at com.twelvemonkeys.imageio.plugins.svg.SVGImageReader.read(SVGImageReader.java:129)
	...

There is another person with the same problem: https://stackoverflow.com/questions/44029119/using-vanilla-w3c-document-and-apache-batik-svg-rasterizer

So, instead of creating a Document like this:

try {
    ImageReader reader = ...
    File svgFile = ...
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    // Probably good idea to set some configs, specially to disable loading DTD that causes a 15s wait.
    dbf.setNamespaceAware(true);
    dbf.setValidating(false);
    dbf.setExpandEntityReferences(false);
    dbf.setCoalescing(true);
    dbf.setIgnoringComments(false);
    dbf.setFeature("http://xml.org/sax/features/namespaces", false);
    dbf.setFeature("http://xml.org/sax/features/validation", false);
    dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
    dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
    Document doc = dbf.newDocumentBuilder().parse(svgFile);
    // ? doc.setDocumentURI(svgFile.getParentFile().toURI().toString());
    reader.setInput(doc);
    ...
} catch (IOException ex) {
    ...
}

one mut use the Batik way:

try {
    ImageReader reader = ...
    File svgFile = ...
    String parser = XMLResourceDescriptor.getXMLParserClassName();
    SAXSVGDocumentFactory df = new SAXSVGDocumentFactory(parser);
    Document doc = df.createDocument(svgFile.toURI().toString());
    // ? doc.setDocumentURI(svgFile.getParentFile().toURI().toString());
    reader.setInput(doc);
    ...
} catch (IOException ex) {
    ...
}

For this reason, it may be a good idea to force Document input to be of type SVGOMDocument (or maybe the more generic SVGDocument).

While it is very saddening that one cannot use a plain org.w3c.dom.Document (thus making the code plugin-aware / Batik-libraries-aware) ~~and that it does not fix the CSS problem that I was already experiencing when converting the Document to a byte array stream~~, I think it is a nice feature to add in TweleveMonkeys as it wouldn't hurt anyone.

Oct 22 '18 09:10 j-p-sequeira

The CSS problem is my fault after all, argh! When I deep clone the Document, somehow the CSS line gets lost. So there is no issue at all with the CSS, you can forget that.

Oct 22 '18 11:10 j-p-sequeira

Excellent!

If we can only reliably support SVGDocument or SVGOMDocument, we should change the supported input type to that. The plugin does depend on Batik anyway, so I don't think that is much of a problem. We do need to make sure that the Spi does not crash, but silently de-register itself if Batik isn't available, though.

Can you add a test case too, and perhaps create a pull request for this? I think this would be a useful contribution. :-)

Best regards,

-- Harald K

Oct 23 '18 07:10 haraldk

Well, a work around to also support the vanilla Document would be to get an ImageInputStream over the in-memory object. Like this:

    /**
     * Gets an ImageInputStream to the SVG XML Document.
     *
     * @param svgXML - Document to which a stream will be returned.
     * @return An ImageInputStream to the SVG XML Document.
     * @throws java.io.IOException
     * @throws javax.xml.transform.TransformerConfigurationException
     * @throws javax.xml.transform.TransformerException
     */
    public static ImageInputStream getDocumentInputStream(Document svgXML) throws IOException, TransformerConfigurationException, TransformerException {
        try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
            Result outputTarget = new StreamResult(outputStream);
            Transformer transformer = TransformerFactory.newInstance().newTransformer();

            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty(OutputKeys.VERSION, svgXML.getXmlVersion());
            transformer.setOutputProperty(OutputKeys.STANDALONE, svgXML.getXmlStandalone() ? "yes" : "no");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
            //transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            //transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
            //transformer.setOutputProperty("{http://xml.apache.org/xalan}indent-amount", "2");

            transformer.transform(new DOMSource(svgXML), outputTarget);
            //System.out.println(new String(outputStream.toByteArray(), transformer.getOutputProperty(OutputKeys.ENCODING)));

            return ImageIO.createImageInputStream(new ByteArrayInputStream(outputStream.toByteArray()));
        }
    }

This is how I'm doing it in my image viewer, since I need to use a vanilla Document to do some XML tweaking.

Would this be interesting? I know it is not that elegant, I mean, converting an already parsed document to bytes to let batik re-parse it again later, but I'm using it and it is quite fast. I mean, I don't know what happens if the SVG file size is in the order of MB...

Oct 23 '18 07:10 j-p-sequeira

Okay, I just created the pull request including the hack for accepting vanilla Documents. You can easily remove it if you feel it is not appropriated ; )

About the test cases, I don't really understand how/where to do it. But maybe this method will be handy to you:

    public Document loadXMLDocument(File file, boolean vanilla) throws IOException, ParserConfigurationException, SAXException{
        Document doc;
        
        if(vanilla){
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // Probably good idea to set some configs, specially to disable loading DTD that causes a 15s wait.
            dbf.setNamespaceAware(true);
            dbf.setValidating(false);
            dbf.setExpandEntityReferences(false);
            dbf.setCoalescing(true);
            dbf.setIgnoringComments(false);
            dbf.setFeature("http://xml.org/sax/features/namespaces", false);
            dbf.setFeature("http://xml.org/sax/features/validation", false);
            dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
            dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            doc = dbf.newDocumentBuilder().parse(file);
        } else {
            String parser = XMLResourceDescriptor.getXMLParserClassName();
            SAXSVGDocumentFactory df = new SAXSVGDocumentFactory(parser);
            doc = df.createDocument(file.toURI().toString());
        }
        
        if(doc != null) {
            doc.setDocumentURI(file.toURI().toString());
        }
        
        return doc;
    }

If you use vanilla==true it will return a simple org.w3c.dom.Document, otherwise it will return a Batik SVGDocument.

Oct 23 '18 10:10 j-p-sequeira

Thanks!

Okay, I won't merge directly without a test case, but I might look into it at some point.

I any case, I don't think I like the serialize/deserialize-hack for handling plain Documents. In my mind, it kind of defeats the purpose of the feature, which is to make it faster to render a document if you already have it parsed in memory.

If we need to serialize and re-parse from an ImageInputStream, you could already do this without this feature, externally.

-- Harald K

Oct 23 '18 10:10 haraldk

Sure, I totally understand that. Feel free to remove the hack, I had it already implemented in my code anyway : )

Oct 23 '18 11:10 j-p-sequeira

TwelveMonkeys TwelveMonkeys copied to clipboard

Use Document in SVGImageReader's setInput

TwelveMonkeys
TwelveMonkeys copied to clipboard