XsdParser icon indicating copy to clipboard operation
XsdParser copied to clipboard

Element ref not resolved correctly

Open SimonCockx opened this issue 1 year ago • 4 comments

Describe the bug A forward reference to an element results in an unresolved reference for the type of that element.

Consider the following schema:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
	<xs:complexType name="Document">
	    <xs:sequence>
	        <xs:element ref="myElem"/>
	    </xs:sequence>
	</xs:complexType>
	
	<xs:element name="myElem" type="Foo" />
	
	<xs:complexType name="Foo">
	    <xs:attribute name="bar" type="xs:string" />
	</xs:complexType>
</xs:schema>

This correctly results in a complex type Document with a single element myElem, but the type of the element - Foo - is not resolved.

There are two references which should be resolved here: ref="myElem" and type="Foo".

I did some debugging and I think the reason why this is currently not working is as follows:

  1. Before references are resolved, both are represented as an UnsolvedReference.
  2. While resolving ref="myElem", a copy of the target element is made. See XSDParserCore#replaceUnsolvedReference line 560:
    XsdNamedElements substitutionElement = (XsdNamedElements) concreteElement.getElement().clone(oldElementAttributes, concreteElement.getElement().getParent());
    
    Note that this also copies the unsolved reference type="Foo".
  3. For some reason, this new copy is never resolved.

Expected behavior I would expect no unresolved references.

Library Version 1.2.15

SimonCockx avatar Oct 28 '24 17:10 SimonCockx

Digging further. This is probably related to working on Windows. I've noticed the unsolvedElements map changes from using slashes to using backslashes during reference resolution.

SimonCockx avatar Oct 28 '24 17:10 SimonCockx

This method is the culprit.

    public void addUnsolvedReference(UnsolvedReference unsolvedReference) {
        XsdSchema schema;

        try {
            schema = XsdAbstractElement.getXsdSchema(unsolvedReference.getElement(), new ArrayList<>());
        } catch (ParentAvailableException e) {
            schema = null;
        }

        String localCurrentFile = currentFile;

        if (schema != null) {
            String schemaFilePath = schema.getFilePath();

            if (!localCurrentFile.equals(schemaFilePath)) {
                localCurrentFile = schemaFilePath;
            }
        }

        List<UnsolvedReference> unsolved = unsolvedElements.computeIfAbsent(localCurrentFile, k -> new ArrayList<>());

        unsolved.add(unsolvedReference);
    }

Since localCurrentFile has slashes, and schemaFilePath has backslashes, this fails.

SimonCockx avatar Oct 28 '24 17:10 SimonCockx

This is not the first time I see issues in this library related to a poor handling of file paths. (slash versus backslash, relative versus absolute, ...) I wonder if it's doable to properly clean that up, for example (but not limited to) by using normalized Java Paths instead. @lcduarte any advice?

SimonCockx avatar Oct 28 '24 17:10 SimonCockx

Hello Simon,

Thanks for the issue and for using the library. Also thanks for the investigation, while I agree that the file paths are a mess in the library I'm not entirely sure that's the source of your issue.

The issue is that the cloned XsdElement myElem doesn't yet the type resolved, which means that while eventually the instance I used to generate the clone will have the type resolved in the next iteration, the cloned instance never will.

This might be solved by solving the type references first and only then other references.

I've added the following line in XSDParserCore#resolveInnerRefs at 464:

Collections.sort(unsolvedReferenceList, new Comparator<UnsolvedReference>() {
    @Override
    public int compare(UnsolvedReference abc1, UnsolvedReference abc2) {
        return Boolean.compare(abc2.isTypeRef(), abc1.isTypeRef());
    }
});

Here:

imagem

Please check on your side if that fixes your issue.

lcduarte avatar Oct 28 '24 20:10 lcduarte

I can do you better: I can verify that normalizing the schemaFilePath in addUnsolvedReference solves my issue. While your solution might also solve it, I feel it looks more like a workaround for the actual underlying issue (i.e., unnormalized paths). Resolving cloned references works perfectly fine, but not if the schema path contains backslashes, since it will wrongly be treated as an unsolved reference of a different file.

My current workaround:

    public void addUnsolvedReference(UnsolvedReference unsolvedReference) {
        XsdSchema schema;

        try {
            schema = XsdAbstractElement.getXsdSchema(unsolvedReference.getElement(), new ArrayList<>());
        } catch (ParentAvailableException e) {
            schema = null;
        }

        String localCurrentFile = currentFile;

        if (schema != null) {
            String schemaFilePath = schema.getFilePath().replace("\\", "/"); // SEE FIX HERE

            if (!localCurrentFile.equals(schemaFilePath)) {
                localCurrentFile = schemaFilePath;
            }
        }

        List<UnsolvedReference> unsolved = unsolvedElements.computeIfAbsent(localCurrentFile, k -> new ArrayList<>());

        unsolved.add(unsolvedReference);
    }

SimonCockx avatar Oct 29 '24 11:10 SimonCockx

I might be missing something but that replace alone doesn't seem to solve the problem with your sample xsd. I'm on windows if that makes any difference. I've created a file issue_72.xsd with your example and created the following test:

imagem

Test code:

@Test
public void testIssue72() {
    XsdParser xsdParser = new XsdParser( getFilePath("issue_72.xsd"));
    XsdComplexType documentComplexType = xsdParser.getResultXsdSchemas().findFirst().get().getChildrenComplexTypes().filter(e -> e.getName().equals("Document")).collect(Collectors.toList()).stream().findFirst().get();

    XsdElement sequenceElement = documentComplexType.getChildAsSequence().getChildrenElements().collect(Collectors.toList()).stream().findFirst().get();
}

With my solution this doesn't happen. Resolving the cloned references wasn't really right, because if I first resolve references that clone elements that themselves have unsolved references in the end it won't be correctly resolved because I don't solve the unsolvedReferences of the clones.

And like you said, the solution is already pretty hardcoded with backslashes and slashes, I'd rather not have more if I can avoid them.

lcduarte avatar Oct 29 '24 13:10 lcduarte

Hmm, perhaps I minimized the example too much.

The actual schema I was working with is this one. Note the ref="animal":

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
	<!-- XML root element -->
	<xs:element name="document" type="Document" />
	<xs:complexType name="Document">
	  <xs:sequence>
	    <xs:element ref="animal"/> <!-- Note that the name of this element is determined by the actual type of animal -->
	  </xs:sequence>
	</xs:complexType>
	
	<!-- Element names: `Cow`s should be serialised as an element named `cow`, `Goat`s should be serialised as an element named `goat`. -->
	<xs:element name="animal" type="Animal" abstract="true" />
	<xs:element name="cow" type="Cow" substitutionGroup="animal" />
	<xs:element name="goat" type="Goat" substitutionGroup="animal" />
	
	<!-- Types: `Cow` and `Goat` extend from `Animal` -->
	<xs:complexType name="Animal">
	  <xs:attribute name="name" type="xs:string" />
	</xs:complexType>
	<xs:complexType name="Cow">
	  <xs:complexContent>
	    <xs:extension base="Animal" />
	  </xs:complexContent>
	</xs:complexType>
	<xs:complexType name="Goat">
	  <xs:complexContent>
	    <xs:extension base="Animal" />
	  </xs:complexContent>
	</xs:complexType>
</xs:schema>

Perhaps there are two issues then. My own issue was resolved with replace("\\", "/").

Resolving the cloned references wasn't really right, because if I first resolve references that clone elements that themselves have unsolved references in the end it won't be correctly resolved because I don't solve the unsolvedReferences of the clones.

This is not true though - I have stepped through the code. Upon cloning an UnsolvedReference, the clone is added to the unsolvedReferences, after which the next iteration in resolveInnerRefs will pick it up. See XsdParserCore#resolveInnerRefs line 468. My issue was caused by not adding the clone to the right file inside unsolvedReferences.

SimonCockx avatar Oct 29 '24 15:10 SimonCockx

This is not true though - I have stepped through the code. Upon cloning an UnsolvedReference, the clone is added to the unsolvedReferences, after which the next iteration in resolveInnerRefs will pick it up. See XsdParserCore#resolveInnerRefs line 468. My issue was caused by not adding the clone to the right file inside unsolvedReferences.

You are right, but due to the stop condition of having the same number of UnsolvedReferences after iterating the list the resolveInnerRefs will stop resolving, at least with the first example you've provided. Anyway adding the sort will always be an improvement, by ordering I'll avoid generating unnecessary UnsolvedReferences in clones which will make it faster to parse.

I can't really replicate your issue, but then again I'm on Windows. Since your suggested solution doesn't seem to break any of the existing tests I'll add it, together with the improvement I mentioned above. Would that work for you?

lcduarte avatar Oct 29 '24 15:10 lcduarte

I'm on Windows too, actually, so I'm surprised it's not possible to replicate.

But yes, I agree it might make things more efficient, and yes, if you can add that replacement in that would be great! Cheers.

SimonCockx avatar Oct 29 '24 16:10 SimonCockx

Great!

The new version is available, 1.2.16. If you find any other issues let me know.

lcduarte avatar Oct 29 '24 17:10 lcduarte