specify how the LinkML `multivalued` metaslot should be used with MIxS terms
At the TWG hackathon today, @jfy133, others, and I chatted about how to provide examples for multivalued slots.
Working backwards, here is the canonical way to represent two MIxS HumanGut records in a YAML data file (which most MIxS users would never do),
including an illustration of how to use the multivalued `special_diet` term:
human_gut_data:
  - samp_name: sample1
    project_name: project1
    special_diet:
      - low carb
      - vegetarian
  - samp_name: sample2
    project_name: project1
    special_diet:
      - low carb
      - reduced calorie
The result when flattening to CSV/TSV with `linkml convert`:
https://github.com/GenomicsStandardsConsortium/mixs/pull/943#discussion_r2085336856
linkml convert \
  --schema src/mixs/schema/mixs.yaml \
  --target-class MixsCompliantData \
  --index-slot human_gut_data \
  --output MixsCompliantData-HumanGut-special_diet-1.csv MixsCompliantData-HumanGut-special_diet-1.yaml
(This takes a few minutes.)
| samp_name | project_name | special_diet |
|---|---|---|
| sample1 | project1 | [low carb\|vegetarian] |
| sample2 | project1 | [low carb\|reduced calorie] |
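
To make the convention in the table concrete, here is a rough Python sketch of the round trip between a list and a bracketed, delimited cell. It only mimics what the table shows and is not LinkML's own code; `flatten_multivalued` and `unflatten_multivalued` are hypothetical helper names, and the delimiter is a parameter because the thread suggests it may change.

```python
def flatten_multivalued(values, delimiter="|"):
    """Join a list of strings into a single bracketed cell, e.g. [a|b]."""
    for v in values:
        # Assumption: values containing the delimiter or brackets are rejected,
        # because this sketch has no escaping rule for them.
        if delimiter in v or "[" in v or "]" in v:
            raise ValueError(f"value {v!r} contains a reserved character")
    return "[" + delimiter.join(values) + "]"


def unflatten_multivalued(cell, delimiter="|"):
    """Split a bracketed cell back into a list; a bare cell is one value."""
    cell = cell.strip()
    if cell.startswith("[") and cell.endswith("]"):
        inner = cell[1:-1]
        return inner.split(delimiter) if inner else []
    return [cell]


records = [
    {"samp_name": "sample1", "project_name": "project1",
     "special_diet": ["low carb", "vegetarian"]},
    {"samp_name": "sample2", "project_name": "project1",
     "special_diet": ["low carb", "reduced calorie"]},
]

for record in records:
    cell = flatten_multivalued(record["special_diet"])
    # The flattened cell should parse back to the original list.
    assert unflatten_multivalued(cell) == record["special_diet"]
    print(record["samp_name"], cell)
```

Passing `delimiter=";"` gives the semicolon variant without touching the data, which is one reason not to bake a single serialization into the schema examples.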
If we wanted to make our examples look exactly like what LinkML currently expects from the CSV/TSV serialization:
examples:
  - value: '[low carb|vegetarian]'
  - value: '[low carb|reduced calorie]'
Feedback/LinkML news from @cmungall
Don't lock a single serialization into the examples, especially when LinkML's serialization/de-serialization rules for multiple values in CSV/TSV could change soon.
Instead, switch from using the `value` metaslot to the `value_object` metaslot (https://linkml.io/linkml-model/latest/docs/value_object/), which can take a list abstraction.
I think that would look like:
examples:
  - value_object:
      - low carb
      - vegetarian
  - value_object:
      - low carb
      - reduced calorie
To clarify, if we say that multivalued MIxS slots must have both single-value and multi-value examples, it would look like:
examples:
  - value: low carb
  - value_object:
      - low carb
      - vegetarian
  - value_object:
      - low carb
      - reduced calorie
So the first example is a single value (even though multiple values are allowed), and the second and third show what to do when you have multiple values for the slot.
I guess the question then is how this gets rendered so that a user knows how to fill in that term, e.g. when submitting data to the ENA.
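
If that rule were adopted, it could be checked mechanically. Below is a minimal sketch, assuming a slot definition has already been loaded into a plain Python dict; `example_kinds` and `has_required_examples` are hypothetical names, and the `value_object` key simply follows the YAML in this thread (the key LinkML actually accepts in schema YAML may differ).

```python
def example_kinds(slot):
    """Return the set of example kinds ('single', 'multi') found on a slot."""
    kinds = set()
    for example in slot.get("examples", []):
        if "value" in example:
            kinds.add("single")
        # Assumption: a multi-value example is a list with more than one item
        # under the 'value_object' key used in this thread.
        obj = example.get("value_object")
        if isinstance(obj, list) and len(obj) > 1:
            kinds.add("multi")
    return kinds


def has_required_examples(slot):
    """True if a multivalued slot has both kinds; other slots always pass."""
    if not slot.get("multivalued", False):
        return True
    return example_kinds(slot) == {"single", "multi"}


special_diet = {
    "multivalued": True,
    "examples": [
        {"value": "low carb"},
        {"value_object": ["low carb", "vegetarian"]},
        {"value_object": ["low carb", "reduced calorie"]},
    ],
}

print(has_required_examples(special_diet))
```

A check like this would flag multivalued slots that only carry a single-value example, or only list examples.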
Following on from Montana's and my MISIP-MIMS review discussions:
- We need to standardise the use of "|" and ";". Montana's suggestion of using "|" as an OR and ";" as the separator between multiple values makes sense to me.
- In this case, the example written the legacy way would be:
  examples:
    - value: '[low carb;vegetarian]'
    - value: '[low carb;reduced calorie]'
- IMHO the list examples from Mark, as enhanced by James, are even clearer and less error-prone.
- From a biased ENA perspective, we prefer to essentially duplicate the field_name if there are multiple applicable values e.g. project_name in some checklists.
- (we don't allow multiple values for any field, unless users/brokers request it)
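
For comparison, the "duplicate the field name" idea can be sketched in a few lines of Python. This only illustrates the shape of the data and is not ENA's submission format; `expand_repeated_fields` is a hypothetical helper.

```python
def expand_repeated_fields(record):
    """Turn list values into repeated (field_name, value) pairs."""
    pairs = []
    for field_name, value in record.items():
        # A scalar is treated as a one-item list so both cases share one path.
        values = value if isinstance(value, list) else [value]
        for item in values:
            pairs.append((field_name, item))
    return pairs


record = {
    "samp_name": "sample1",
    "project_name": "project1",
    "special_diet": ["low carb", "vegetarian"],
}

for field_name, value in expand_repeated_fields(record):
    print(f"{field_name}\t{value}")
```

Here `special_diet` appears twice, once per value, so no delimiter convention is needed at all.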
Related: linkml/linkml#2581 ("Make it possible to configure inlined multivalued strings syntax") - this is the upstream LinkML issue for configuring delimiters like | vs ; when serializing multivalued slots to TSV/CSV.
Also related: #465 (delimiter conventions in Value syntax patterns)