jackson-dataformat-xml

Root name missing when root element has no attributes

Open Sam-Kruglov opened this issue 4 years ago • 6 comments

Version: 2.12.3

Hi, I need to get the XML root name when deserializing, but if the root element has no attributes, the parser has already advanced past it by the time my deserializer is called:

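// Imports needed by the snippet below (Jackson XML plus Lombok)
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.JsonDeserializer;
import com.fasterxml.jackson.databind.annotation.JsonDeserialize;
import com.fasterxml.jackson.dataformat.xml.XmlMapper;
import com.fasterxml.jackson.dataformat.xml.deser.FromXmlParser;
import lombok.SneakyThrows;
import lombok.Value;
import lombok.val;

@Value
@JsonDeserialize(using = XmlDeserializer.class)
static class XmlWrapper {
    String xmlRootName;
}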

static class XmlDeserializer extends JsonDeserializer<XmlWrapper> {
    @Override
    public XmlWrapper deserialize(JsonParser p, DeserializationContext ctxt) {
        return new XmlWrapper(((FromXmlParser) p).getStaxReader().getLocalName());
    }
}

@SneakyThrows
public static void main(String[] args) {
    val mapper = XmlMapper.builder().findAndAddModules().build();
    System.out.println("with attribute: " + mapper.readValue(
            "<root foo='bar'><field>value</field></root>", XmlWrapper.class
    ).getXmlRootName());
    System.out.println("without attribute: " + mapper.readValue(
            "<root><field>value</field></root>", XmlWrapper.class
    ).getXmlRootName());
}

Output:

with attribute: root
without attribute: field

Here's what I found during debugging:

com.fasterxml.jackson.dataformat.xml.XmlFactory#_createParser(java.io.Reader, com.fasterxml.jackson.core.io.IOContext)

_initializeXmlReader(sr) moves the reader forward a little, but getLocalName still returns root. new FromXmlParser(ctxt, _parserFeatures, _xmlParserFeatures, _objectCodec, sr) then moves the reader forward again, and this time getLocalName returns field.
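The effect described above can be reproduced without Jackson. The following is a minimal sketch using only the JDK's StAX API; the class and method names (`StaxCursorDemo`, `namesBeforeAndAfterAdvance`) are made up for illustration and are not part of Jackson. It only shows that `getLocalName()` reports whatever element the cursor currently sits on, so any code that advances the reader before the deserializer runs changes the answer. It does not model Jackson's own token handling, which is what decides how far the cursor has moved in the with-attribute and without-attribute cases.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxCursorDemo {

    /**
     * Returns two names: the local name at the first START_ELEMENT, and the
     * local name reported after the cursor has been advanced to the next tag.
     */
    static String[] namesBeforeAndAfterAdvance(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
            try {
                // Move from START_DOCUMENT (and any prolog) to the root element.
                while (reader.getEventType() != XMLStreamConstants.START_ELEMENT) {
                    reader.next();
                }
                String atRoot = reader.getLocalName();

                // Advancing the cursor is all it takes to "lose" the root name.
                reader.nextTag();
                String afterAdvance = reader.getLocalName();

                return new String[] {atRoot, afterAdvance};
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        String[] names = namesBeforeAndAfterAdvance("<root><field>value</field></root>");
        System.out.println("before advance: " + names[0]); // root
        System.out.println("after advance: " + names[1]);  // field
    }
}
```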

#484 might be related here, not sure


My use case: I need to get the XML root element of arbitrary user-supplied XML, do something, and then return an object that has the same root element. My logic is abstract, and the vendor API that supplies me with an object expects a similar object in return. I can only know the root element at runtime.
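One way to approach this use case without relying on the parser's internal state is to read the root name in a separate pass before handing the document to the mapper. The sketch below is not what the issue author did and does not use Jackson at all: it swaps in the JDK's StAX API to peek at the root name and to write a reply with the same root. The names `RootNameEcho`, `peekRootName`, and `wrapInRoot`, and the `order`/`status` sample elements, are hypothetical. It assumes the XML is available as a String (so it can be read twice) and disables DTD support because the input is user-supplied.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

class RootNameEcho {

    /** Returns the local name of the first element in the document. */
    static String peekRootName(String xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newFactory();
            // Input is untrusted, so do not process DTDs.
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getLocalName();
                    }
                }
                throw new IllegalArgumentException("No root element found");
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    /** Writes a one-child document whose root element is named at runtime. */
    static String wrapInRoot(String rootName, String childName, String text) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter writer = XMLOutputFactory.newFactory().createXMLStreamWriter(out);
            writer.writeStartElement(rootName);
            writer.writeStartElement(childName);
            writer.writeCharacters(text);
            writer.writeEndElement(); // closes the child
            writer.writeEndElement(); // closes the root
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
    }

    public static void main(String[] args) {
        String request = "<order><item>book</item></order>";
        String root = peekRootName(request);
        System.out.println("root element: " + root);
        System.out.println(wrapInRoot(root, "status", "ok"));
    }
}
```

The cost of this design is a second pass over the start of the document; the benefit is that it does not depend on how far Jackson has advanced its own reader.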

Sam-Kruglov, Oct 06 '21 18:10

Please note that 2.12.3 is not the latest 2.12 patch; you may want to check out 2.12.5 first. There are some fixes since 2.12.3. Further, 2.13.0 was just released, also with a few fixes.

Beyond this, I think the problem is likely with the custom deserializer. Use of the underlying Stax parser is not really supported, so if it works, fine; if not, it is not something that I can necessarily help with: only access through JsonParser (it is actually a special type, FromXmlParser) is supported. So I would try to figure out that approach first; otherwise you will have to know exactly what kinds of transformations Jackson itself does to the token stream.

cowtowncoder, Oct 09 '21 00:10

Thanks, I tried to use the parser but it seems to only work with “fields”, and there is no way to distinguish an XML element from an XML attribute. I guess I could try again with a new version.

Sam-Kruglov, Oct 09 '21 05:10

Maybe there could be a method on the parser to get the root element name?

Sam-Kruglov, Oct 09 '21 05:10

The parser works in streaming mode and does not build any tree representation. There is, however, something like getParsingContext() which does have logical parent property name information -- this is JSON-style content, not raw XML information. It might be useful. There is similarly no way to distinguish element/attribute information, as that is converted by a lower-level helper class (XmlTokenStream): databinding is format-agnostic and has no knowledge of deviations.

The element/attribute distinction is not used by Jackson at all on the deserialization side: it is only used for serialization, to produce attributes when expected.

cowtowncoder, Oct 09 '21 18:10

Unfortunately for me, the context does not contain anything right at the start of the deserializer.

But actually I was able to find the root element over here (the field is private):

((FromXmlParser) parserOriginal)._xmlTokens.getLocalName()

As I mentioned earlier, the reader moves past the root element inside the FromXmlParser constructor. Specifically, it does so when calling _xmlTokens.initialize(), which saves the localName as a field before moving the reader, so the name is still there.

I think it would make sense for this information to be available through parser.parsingContext.parent.currentName. I can also see that the XmlTokenStream class is public, so exposing a parser.getXmlTokenStream() getter may be an option too.

Anyway, my workaround is to take _xmlTokens.localName via reflection when I detect that the XML reader has moved too far. Otherwise, I use reader.localName.
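The reflection part of that workaround might look roughly like the sketch below. It is self-contained and does not touch Jackson: `FakeParser` and `FakeTokenStream` are hypothetical stand-ins for FromXmlParser and XmlTokenStream, and the inner field name `_localName` is an assumption (the thread only names `_xmlTokens`). The helper `readField` is also made up for illustration. Against the real classes, the field names are internal details that may change between versions, which is why this kind of access is fragile.

```java
import java.lang.reflect.Field;

class PrivateFieldReader {

    /**
     * Reads a (possibly private) field by name, searching the target's class
     * and then its superclasses.
     */
    static Object readField(Object target, String fieldName) {
        for (Class<?> c = target.getClass(); c != null; c = c.getSuperclass()) {
            try {
                Field field = c.getDeclaredField(fieldName);
                field.setAccessible(true);
                return field.get(target);
            } catch (NoSuchFieldException e) {
                // Not declared on this class; keep looking in the superclass.
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + fieldName, e);
            }
        }
        throw new IllegalArgumentException("No field named " + fieldName);
    }

    // Hypothetical stand-in for the token stream that remembers the root name.
    static class FakeTokenStream {
        private final String _localName;

        FakeTokenStream(String localName) {
            this._localName = localName;
        }
    }

    // Hypothetical stand-in for the parser that holds the token stream privately.
    static class FakeParser {
        private final FakeTokenStream _xmlTokens;

        FakeParser(String rootName) {
            this._xmlTokens = new FakeTokenStream(rootName);
        }
    }

    public static void main(String[] args) {
        FakeParser parser = new FakeParser("root");
        // Two hops: parser -> token stream -> saved local name.
        Object tokens = readField(parser, "_xmlTokens");
        Object rootName = readField(tokens, "_localName");
        System.out.println(rootName); // root
    }
}
```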

Sam-Kruglov, Oct 09 '21 18:10

I think that it might be possible to expose additional XML-specific information via a sub-class of JsonStreamContext (specifically, XmlReadContext), but not to change what is exposed by the base class. There are a few challenges with this, including the fact that when buffering content (via TokenBuffer) the context object will not be of the XML-specific type. But in general I am open to the possibility.

I would discourage attempts to access XmlTokenStream, since such use will be fragile and unsupported: it is not part of the API supported for external use. But with that said, you can decide to use that if it still makes sense.

cowtowncoder, Oct 10 '21 00:10