schema-automator

Support XML Schema Definition Language (XSD) import

Open multimeric opened this issue 1 year ago • 8 comments

Adds an XsdImportEngine, with tests.

XSD doesn't map cleanly to LinkML, because:

  • XML has both attributes and child elements. I treat both as slots, but tag each with a keyword that indicates which one it is
  • The root element is a pseudo class, because it may enforce a specific structure. I resolve this by adding a RootElement class where necessary
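The attribute/child-element point can be illustrated with a minimal sketch. This is not the PR's XsdImportEngine: the function names, the sample schema, and the keyword strings `xsd:attribute` and `xsd:element` are assumptions made for illustration, and the RootElement handling is left out entirely.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical sample schema: one element with a child element and an attribute.
SAMPLE_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def slots_for_complex_type(complex_type):
    """Return {slot_name: {"keywords": [...]}} for one xs:complexType.

    Nested anonymous types are flattened here, which a real importer
    would need to handle properly.
    """
    slots = {}
    # Child elements become slots tagged as elements.
    for el in complex_type.iter(XS + "element"):
        slots[el.get("name")] = {"keywords": ["xsd:element"]}
    # Attributes also become slots, but carry a different tag.
    for attr in complex_type.iter(XS + "attribute"):
        slots[attr.get("name")] = {"keywords": ["xsd:attribute"]}
    return slots


def import_xsd(xsd_text):
    """Map each top-level xs:element with a complex type to a class dict."""
    root = ET.fromstring(xsd_text)
    classes = {}
    for top in root.findall(XS + "element"):
        complex_type = top.find(XS + "complexType")
        if complex_type is not None:
            classes[top.get("name")] = {"slots": slots_for_complex_type(complex_type)}
    return classes


schema = import_xsd(SAMPLE_XSD)
```

Here `schema["person"]["slots"]` contains both `name` and `id`, distinguished only by their tag, so the attribute/element distinction survives the flattening into slots.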

multimeric avatar Jan 03 '25 01:01 multimeric

Thank you @multimeric! I think there is a lot of interest in an XML -> LinkML importer.

sierra-moxon avatar Jan 15 '25 00:01 sierra-moxon

Thank you!!

Re lack of isomorphism. See also https://stackoverflow.com/questions/191536/converting-xml-to-json-using-python which mentions a "standard" (for instance-level). If there is nothing more up to date, I suggest being consistent with this, such that xml --[xmltodict]--> json validates via [xml schema]-->linkml

Or at least an option to do this - for now just marking this in the docs is sufficient

cmungall avatar Jan 15 '25 19:01 cmungall

Hmm that's an interesting suggestion. However the proposed solution is to prepend @ to attribute names such that:

<p id="1">text</p>

Becomes

{
  "p": {
    "@id": 1,
    "$": "text"
  }
}

I think this is a bit ugly, and also perhaps easily confused with JSON-LD, but I will implement it that way if you prefer.
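For reference, the `@`-prefix convention described above can be sketched with the standard library alone. The function name `element_to_dict` is hypothetical, and this is only a rough approximation of the convention, not a particular library's behavior. One difference from the hand-written JSON: ElementTree returns attribute values as strings, so `id` comes out as `"1"` rather than `1`.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an element to a dict, prefixing attribute names with "@"
    and storing text content under "$"."""
    node = {"@" + key: value for key, value in elem.attrib.items()}
    text = (elem.text or "").strip()
    if text:
        node["$"] = text
    # Repeated child tags are collected into lists.
    for child in elem:
        node.setdefault(child.tag, []).append(element_to_dict(child)[child.tag])
    return {elem.tag: node}


converted = element_to_dict(ET.fromstring('<p id="1">text</p>'))
print(json.dumps(converted))
```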

I consider this scenario a bit different from standard XML to JSON conversion because here we have a schema that can describe which slots are attributes directly. So I'm wondering if there's a good field in the LinkML SlotDefinition that would capture this (currently I'm using keywords).

multimeric avatar Jan 15 '25 21:01 multimeric

Now that I reflect on this, you are right: it would make things quite complicated. Having the @ at the schema level isn't permitted in linkml, so there would need to be a mapping preserved... all quite ugly.

So scratch that, let's keep it in mind for future extensions.

Formally the way to do this in linkml would be to use conforms_to or instantiates

slots:
  id:
    conforms_to: xsd:attribute
  description:
    conforms_to: xsd:attribute

or

slots:
  id:
    instantiates: [xsd:attribute]
  description:
    instantiates: [xsd:attribute]

The former is stronger and could be used for validation if we later implement metaclasses for xml schema (see https://linkml.io/linkml/schemas/annotations.html#validation-of-annotations).

(a to be defined xml schema metamodel):

classes:
  XmlAttribute:
    is_a: slot
    class_uri: xsd:attribute
    description: If a slot instantiates this then it should be mapped to an attribute when serializing to XML
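A rough sketch of how a serializer might consume this marker, assuming slot definitions are available as plain dicts (the `SLOTS` table and the `to_xml` helper are hypothetical, not part of LinkML or schema-automator):

```python
import xml.etree.ElementTree as ET

# Hypothetical slot definitions mirroring the `instantiates` form above.
SLOTS = {
    "id": {"instantiates": ["xsd:attribute"]},
    "description": {"instantiates": ["xsd:attribute"]},
    "name": {},
}


def to_xml(tag, instance, slots):
    """Serialize a flat dict, writing slots that instantiate xsd:attribute
    as XML attributes and all other slots as child elements."""
    elem = ET.Element(tag)
    for slot_name, value in instance.items():
        markers = slots.get(slot_name, {}).get("instantiates", [])
        if "xsd:attribute" in markers:
            elem.set(slot_name, str(value))
        else:
            ET.SubElement(elem, slot_name).text = str(value)
    return elem


person = to_xml("person", {"id": 1, "description": "example", "name": "Ada"}, SLOTS)
xml_text = ET.tostring(person, encoding="unicode")
```

With these definitions, `id` and `description` end up as attributes on `<person>`, while `name` becomes a child element.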

However, if you want to keep your method flexible such that you can pass in a profile on the command line (e.g. using keywords) and satisfy your use case first that's valid!

cmungall avatar Jan 16 '25 02:01 cmungall

Okay, I've gone with your second suggestion, as I like the idea that instantiates can support multiple different related schemas. Added tests to ensure that it's added correctly.

multimeric avatar Jan 17 '25 05:01 multimeric

These test failures seem unrelated to the main content of your PR. My guess is that updating the lock file pulled in an upstream library change that modifies behavior.

cmungall avatar Jan 28 '25 01:01 cmungall

Ah no that's my fault, I used some type syntax that isn't compatible with Python 3.9. It should hopefully be fixed now.
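The thread doesn't say which syntax was involved. As an illustration only, one common cause is the `X | Y` union form (PEP 604), which needs Python 3.10 when annotations are evaluated at definition time. A 3.9-compatible spelling uses `typing.Optional`; the function below is a made-up example, not code from the PR.

```python
from typing import Optional

# On Python 3.9, writing the return annotation as `str | None` raises a
# TypeError when the function is defined; Optional[str] is equivalent
# and works on 3.9.


def lookup_type(name: str, known: dict) -> Optional[str]:
    """Return the mapped type name, or None if the name is unknown."""
    return known.get(name)


found = lookup_type("id", {"id": "string"})
missing = lookup_type("other", {"id": "string"})
```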

multimeric avatar Jan 28 '25 02:01 multimeric

@multimeric I am starting a review of this PR. Do you have time to merge from linkml/schema-automator main and resolve the poetry.lock conflict?

I was able to do the merge on my local machine, so let me know if you'd prefer me to push that or to assist w/ poetry.lock. So far the tests seem to pass (there are a lot of errors in the logs that appear unrelated to your PR). Also I was able to run the import on a sample xsd schema.

Thanks for adding this feature.

tfliss avatar Nov 20 '25 18:11 tfliss