specify how the LinkML `multivalued` metaslot should be used with MIxS terms

Open turbomam opened this issue 8 months ago • 8 comments

In the TWG hackathon today, @jfy133, others, and I discussed how to provide examples for multivalued slots.

turbomam avatar May 12 '25 20:05 turbomam

Working backwards, here is the canonical way to represent two MIxS HumanGut records in a YAML data file (something most MIxS users would never do), including an illustration of how to use the multivalued special_diet term:

human_gut_data:
  - samp_name: sample1
    project_name: project1
    special_diet:
      - low carb
      - vegetarian
  - samp_name: sample2
    project_name: project1
    special_diet:
      - low carb
      - reduced calorie

turbomam avatar May 12 '25 20:05 turbomam

The result when flattening to a TSV with linkml convert

https://github.com/GenomicsStandardsConsortium/mixs/pull/943#discussion_r2085336856

jfy133 avatar May 12 '25 20:05 jfy133

linkml convert \
    --schema src/mixs/schema/mixs.yaml \
    --target-class MixsCompliantData \
    --index-slot human_gut_data \
    --output MixsCompliantData-HumanGut-special_diet-1.csv MixsCompliantData-HumanGut-special_diet-1.yaml 

turbomam avatar May 12 '25 20:05 turbomam

takes a few minutes

turbomam avatar May 12 '25 20:05 turbomam

samp_name  project_name  special_diet
sample1    project1      [low carb|vegetarian]
sample2    project1      [low carb|reduced calorie]

If we wanted to make our examples look exactly like what LinkML currently expects from the CSV/TSV serialization, they would be:

examples:
    - value: '[low carb|vegetarian]'
    - value: '[low carb|reduced calorie]'
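A minimal sketch of the round trip implied by the flattened rows above, assuming multivalued cells are written as `[a|b]`. The helper names `flatten_cell` and `parse_cell` are hypothetical, and this is an illustration of the observed format only, not LinkML's own serializer.

```python
import csv
import io

# Assumption: a multivalued cell is rendered as "[a|b]", matching the
# flattened rows shown in this thread. Not LinkML's implementation.


def flatten_cell(value, delimiter="|"):
    """Render a list as "[a|b]"; pass scalar values through unchanged."""
    if isinstance(value, list):
        return "[" + delimiter.join(value) + "]"
    return value


def parse_cell(text, delimiter="|"):
    """Turn "[a|b]" back into a list; leave any other string unchanged."""
    if text.startswith("[") and text.endswith("]"):
        inner = text[1:-1]
        return inner.split(delimiter) if inner else []
    return text


records = [
    {"samp_name": "sample1", "project_name": "project1",
     "special_diet": ["low carb", "vegetarian"]},
    {"samp_name": "sample2", "project_name": "project1",
     "special_diet": ["low carb", "reduced calorie"]},
]

# Write the records as TSV, flattening list-valued cells.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(records[0]),
                        delimiter="\t", lineterminator="\n")
writer.writeheader()
for record in records:
    writer.writerow({key: flatten_cell(val) for key, val in record.items()})
tsv_text = buffer.getvalue()

# Read the TSV back and restore the lists.
reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
round_tripped = [{key: parse_cell(val) for key, val in row.items()}
                 for row in reader]
```

Note that this simple scheme breaks as soon as a value itself contains the delimiter or starts with a bracket, which is one reason a fixed serialization is fragile.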

turbomam avatar May 12 '25 20:05 turbomam

Feedback/LinkML news from @cmungall

Don't lock a single serialization into the examples, especially when LinkML's serialization/de-serialization rules for multiple values in CSV/TSV could change soon

Instead, switch from using the value metaslot to the value_object metaslot (https://linkml.io/linkml-model/latest/docs/value_object/), which can take a list abstraction.

I think that would look like

examples:
    - value_object:
        - low carb
        - vegetarian
    - value_object:
        - low carb
        - reduced calorie

turbomam avatar May 12 '25 21:05 turbomam

To clarify, let's say multivalued MIxS slots must have examples of both single and multiple values. It would look like:

examples:
    - value: low carb
    - value_object:
        - low carb
        - vegetarian
    - value_object:
        - low carb
        - reduced calorie

So the first example is a single value (even if multiple values are allowed), and the second and third show multiple values for the slot.

I guess the question then is how this gets rendered so that a user knows how to fill in that term, e.g. when submitting data to the ENA?
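One possible answer, as a sketch only: keep the examples in the abstract list form and render them to a delimited string at display time, with the delimiter chosen by the target system. The function `render_example` is hypothetical, and the key names `value` and `value_object` simply follow the examples proposed in this thread; they have not been checked against the LinkML metamodel.

```python
# Hypothetical helper. The keys "value" and "value_object" mirror the
# examples proposed in this thread, not a verified LinkML structure.


def render_example(example, delimiter="|"):
    """Render one example entry as the string a user would type."""
    if "value_object" in example:
        return delimiter.join(example["value_object"])
    return example["value"]


examples = [
    {"value": "low carb"},
    {"value_object": ["low carb", "vegetarian"]},
    {"value_object": ["low carb", "reduced calorie"]},
]

# The same abstract examples rendered for two delimiter conventions.
pipe_rendering = [render_example(e) for e in examples]
semicolon_rendering = [render_example(e, delimiter=";") for e in examples]
```

Because the delimiter is applied only at rendering time, the schema examples would not need to change if the serialization rules do.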

jfy133 avatar May 13 '25 07:05 jfy133

Following on from Montana's and my MISIP-MIMS review discussions:

  • Need to standardise use of "|" and ";". Montana's thoughts about using "|" as an OR and ";" make sense to me.
  • In this case then the example of the legacy way would be:
    • examples:
      • value: '[low carb;vegetarian]'
      • value: '[low carb;reduced calorie]'
  • IMHO the list examples from Mark, then enhanced by James, are even clearer and less error-prone.
    • From a biased ENA perspective, we prefer to essentially duplicate the field_name if there are multiple applicable values e.g. project_name in some checklists.
    • (we don't allow multiple values for any field, unless users/brokers request it)
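The "duplicate the field_name" approach mentioned above can be sketched as follows. This is an assumption-laden illustration of the idea only: the function `expand_repeated_fields` is hypothetical and does not reflect ENA's actual submission format.

```python
# Hypothetical illustration of repeating a field name once per value.
# This is not ENA's submission format, only the general idea.


def expand_repeated_fields(record):
    """Return (field_name, value) pairs, repeating the name per list item."""
    pairs = []
    for field_name, value in record.items():
        values = value if isinstance(value, list) else [value]
        pairs.extend((field_name, item) for item in values)
    return pairs


record = {
    "samp_name": "sample1",
    "project_name": "project1",
    "special_diet": ["low carb", "vegetarian"],
}

pairs = expand_repeated_fields(record)
```

With this layout no delimiter is needed inside a cell, so the "|" versus ";" question does not arise for the submitted data.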

Woolly-at-EBI avatar Sep 30 '25 14:09 Woolly-at-EBI

Related: linkml/linkml#2581 ("Make it possible to configure inlined multivalued strings syntax") - this is the upstream LinkML issue for configuring delimiters like | vs ; when serializing multivalued slots to TSV/CSV.

Also related: #465 (delimiter conventions in Value syntax patterns)

turbomam avatar Dec 09 '25 14:12 turbomam