Support XML Schema Definition Language (XSD) import
Adds an XsdImportEngine, with tests.
XSD doesn't map cleanly to LinkML, because:
- XML has both attributes and child elements. I treat both as slots, but tag each with a keyword that indicates which one it is
- The root element is a pseudo class, because it may enforce a specific structure. I resolve this by adding a RootElement class where necessary
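To make the attribute/element distinction concrete, here is a minimal sketch of the idea using only the standard library. It is not the actual XsdImportEngine code: the sample schema, the `slots_for_complex_type` helper, and the keyword strings `"attribute"` and `"element"` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace prefix that ElementTree puts on every XSD tag.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema: one complex type with a child element and an attribute.
SAMPLE_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Paragraph">
    <xs:sequence>
      <xs:element name="text" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:integer"/>
  </xs:complexType>
</xs:schema>
"""


def slots_for_complex_type(complex_type):
    """Map each named attribute or child element to a slot-like dict.

    Both kinds become slots; a keyword records which one it was.
    The keyword values are illustrative, not the importer's real ones.
    """
    slots = {}
    for node in complex_type.iter():
        name = node.get("name")
        if name is None:
            continue
        if node.tag == XS + "attribute":
            slots[name] = {"keywords": ["attribute"]}
        elif node.tag == XS + "element":
            slots[name] = {"keywords": ["element"]}
    return slots


root = ET.fromstring(SAMPLE_XSD)
paragraph = root.find(XS + "complexType")
slots = slots_for_complex_type(paragraph)
print(slots)
```

The complex type itself also has a `name`, but it is skipped because its tag is neither `xs:attribute` nor `xs:element`.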
Thank you @multimeric! I think there is a lot of interest in an XML -> LinkML importer.
Thank you!!
Re lack of isomorphism. See also https://stackoverflow.com/questions/191536/converting-xml-to-json-using-python which mentions a "standard" (for instance-level). If there is nothing more up to date, I suggest being consistent with this, such that xml --[xmltodict]--> json validates via [xml schema]-->linkml
Or at least an option to do this - for now just marking this in the docs is sufficient
Hmm that's an interesting suggestion. However the proposed solution is to prepend @ to attribute names such that:
<p id="1">text</p>
Becomes
{
  "p": {
    "@id": 1,
    "$": "text"
  }
}
I think this is a bit ugly, and also perhaps easily confused with JSON-LD, but I will implement it that way if you prefer.
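For reference, the convention described above can be sketched in a few lines of standard-library Python. This is only an illustration of the `@`/`$` mapping, not code from this PR or from xmltodict; the `element_to_dict` helper is a hypothetical name.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Convert an element to a dict: attributes get an "@" prefix, text goes under "$"."""
    body = {"@" + key: value for key, value in element.attrib.items()}
    text = (element.text or "").strip()
    if text:
        body["$"] = text
    for child in element:
        value = element_to_dict(child)[child.tag]
        if child.tag in body:
            # Repeated child tags are collected into a list.
            if not isinstance(body[child.tag], list):
                body[child.tag] = [body[child.tag]]
            body[child.tag].append(value)
        else:
            body[child.tag] = value
    return {element.tag: body}


converted = element_to_dict(ET.fromstring('<p id="1">text</p>'))
print(json.dumps(converted, indent=2))
```

Without a schema the attribute value stays a string (`"1"`), whereas the example above shows the integer `1`. Recovering the type is exactly the kind of information the XSD can supply.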
I consider this scenario a bit different from standard XML to JSON conversion, because here we have a schema that can directly describe which slots are attributes. So I'm wondering if there's a good field in the LinkML SlotDefinition that would capture this (currently I'm using keywords).
Now that I reflect on this, you are right: this would make things quite complicated. Having the @ at the schema level isn't permitted in LinkML, so a mapping would need to be preserved... all quite ugly.
So scratch that, let's keep it in mind for future extensions.
Formally the way to do this in linkml would be to use conforms_to or instantiates
slots:
  id:
    conforms_to: xsd:attribute
  description:
    conforms_to: xsd:attribute
or
slots:
  id:
    instantiates: [xsd:attribute]
  description:
    instantiates: [xsd:attribute]
The former is stronger and could be used for validation if we later implement metaclasses for xml schema (see https://linkml.io/linkml/schemas/annotations.html#validation-of-annotations).
(a to-be-defined XML Schema metamodel):
classes:
  XmlAttribute:
    is_a: slot
    class_uri: xsd:attribute
    description: If a slot instantiates this then it should be mapped to an attribute when serializing to XML
However, if you want to keep your method flexible such that you can pass in a profile on the command line (e.g. using keywords) and satisfy your use case first that's valid!
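As a rough illustration of how a consumer might use the `instantiates` marker, the sketch below picks out the slots that should be serialized as XML attributes. It works on a plain dict mirroring the YAML above rather than on the LinkML runtime classes; the `text` slot and the `xml_attribute_slots` helper are hypothetical additions for the example.

```python
XSD_ATTRIBUTE = "xsd:attribute"

# Plain-dict stand-in for the YAML schema above.
# The "text" slot is an assumed extra, included to show a non-attribute slot.
schema = {
    "slots": {
        "id": {"instantiates": ["xsd:attribute"]},
        "description": {"instantiates": ["xsd:attribute"]},
        "text": {},
    }
}


def xml_attribute_slots(schema):
    """Return the sorted names of slots that instantiate xsd:attribute."""
    return sorted(
        name
        for name, slot in schema.get("slots", {}).items()
        if XSD_ATTRIBUTE in (slot or {}).get("instantiates", [])
    )


attribute_slots = xml_attribute_slots(schema)
print(attribute_slots)
```

Because `instantiates` is a list, the same slot could carry markers from other schemas alongside `xsd:attribute` without affecting this check.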
Okay, I've gone with your second suggestion, as I like the idea that instantiates can support multiple different related schemas. Added tests to ensure that it's added correctly.
These test failures seem unrelated to the main content of your PR. My guess is that updating the lock file pulled in an upstream library change that modifies behavior.
Ah no that's my fault, I used some type syntax that isn't compatible with Python 3.9. It should hopefully be fixed now.
@multimeric I am starting a review of this PR. Do you have time to merge from linkml/schema-automator main and resolve the poetry.lock conflict?
I was able to do the merge on my local machine, so let me know if you'd prefer me to push that or to assist w/ poetry.lock. So far the tests seem to pass (there are a lot of errors in the logs that appear unrelated to your PR). Also I was able to run the import on a sample xsd schema.
Thanks for adding this feature.