Make it possible to configure inlined multivalued strings syntax
What is your feature request?
I would like to be able to create schemas for TSV files that represent inlined multivalued fields in a variety of ways. AFAIK, only a somewhat unusual JSON-list-style syntax is currently supported, like ["a", "b"].
However, in order for us to be able to reflect much more common separators like:
- a|b (pipe separated)
- a;b (semicolon separated)
- a,b (csv separated)
- a, b (csv+space separated)

etc., we need some way to configure / document the "delimiter" or "list-syntax" or however you like to call it.
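To make the request concrete, here is a minimal Python sketch of what a configurable list syntax could look like at parse time. The `ListSyntax` class and both helper functions are hypothetical names invented for illustration; they are not part of LinkML or any existing tool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ListSyntax:
    """Hypothetical per-slot setting; not an existing LinkML construct."""

    delimiter: str = "|"
    strip_whitespace: bool = True


def parse_inlined(cell: str, syntax: ListSyntax) -> list[str]:
    """Split one TSV cell into its values according to `syntax`."""
    if cell == "":
        return []  # an empty cell is treated as no values, not one empty value
    parts = cell.split(syntax.delimiter)
    if syntax.strip_whitespace:
        parts = [p.strip() for p in parts]
    return parts


def format_inlined(values: list[str], syntax: ListSyntax) -> str:
    """Join values back into a single cell using the same delimiter."""
    return syntax.delimiter.join(values)
```

With this sketch, `parse_inlined("a|b", ListSyntax("|"))` and `parse_inlined("a, b", ListSyntax(","))` both yield `["a", "b"]`, so the pipe and "csv+space" forms from the list above map to the same values.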
How important is this feature to you? Select from the options below:
• Important - it's a blocker and can't do work without it
Additional context
This is also related to @tfliss's work on Pandera generators and, more generally, to better support for dataframe validation.
@matentzn Yes, as you say, I am working out how multivalued, inline, and list work within tables in the Pandera context, and whether additional configuration is needed. One example is consistently distinguishing between the two table forms below. Comparing the MIxS CSV and YAML forms looks like a good practical case. There is an interaction with range classes, which become nested structs when inlined in a dataframe. Similar serialization questions have come up with boolean and date casting. Is this most practical to implement as model annotations, (de)serializer configuration, or a transform (maybe different sides of the same coin)?
c1,c2
A,2;3;4
B,5;6
and
c1,c2
A,2
A,3
A,4
B,5
B,6
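To illustrate how the two table forms relate, the following Python sketch converts the delimited form into the one-row-per-value form and back, using only the standard library. The function names and the `;` default are assumptions made for this example; whether the two forms should be treated as equivalent is exactly the open question here.

```python
from __future__ import annotations

import csv
import io


def explode_rows(text: str, column: str, delimiter: str = ";") -> list[dict[str, str]]:
    """Turn each row holding an inlined list into one row per value."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for value in row[column].split(delimiter):
            rows.append({**row, column: value.strip()})
    return rows


def implode_rows(
    rows: list[dict[str, str]], key: str, column: str, delimiter: str = ";"
) -> list[dict[str, str]]:
    """Group rows by `key` and join the `column` values (other columns are dropped)."""
    grouped: dict[str, list[str]] = {}
    for row in rows:
        grouped.setdefault(row[key], []).append(row[column])
    return [{key: k, column: delimiter.join(v)} for k, v in grouped.items()]


wide = "c1,c2\nA,2;3;4\nB,5;6\n"
long_rows = explode_rows(wide, "c2")      # five rows: A/2, A/3, A/4, B/5, B/6
round_trip = implode_rows(long_rows, "c1", "c2")
```

Here `round_trip` has two rows again, with `c2` values `2;3;4` and `5;6`. The round trip only works because the delimiter is known up front, which is the configuration this issue asks for.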
@tfliss I am not sure. I would like @sierra-moxon and @cmungall to give their intuition here.
see also
- https://github.com/orgs/linkml/discussions/1996
Presumably this problem is experienced in linkml-validate or linkml-convert, but is somewhat relevant to schemasheets too.
MIxS is a key use case for this feature. Related issues in the MIxS repo:
- GenomicsStandardsConsortium/mixs#952 ("specify how the LinkML multivalued metaslot should be used with MIxS terms")
- GenomicsStandardsConsortium/mixs#465 ("allow whitespace between delimiters in Value syntax patterns?")
MIxS has historically used inconsistent delimiters (|, ;, ,) in spreadsheet-based specifications without validation. The community is trying to standardize on conventions that align with LinkML's serialization behavior.
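As a rough illustration of what standardizing legacy values could involve, the Python sketch below rewrites cells that use any of the three delimiters, with optional surrounding whitespace, to a single canonical one. The function name and the choice of `|` as the canonical delimiter are assumptions for this example, not a MIxS or LinkML convention. It would also corrupt values that legitimately contain a comma or semicolon, so it is only a starting point.

```python
import re

# Any one of the three legacy delimiters, with optional whitespace around it.
LEGACY_DELIMITERS = re.compile(r"\s*[|;,]\s*")


def normalize_delimiters(cell: str, canonical: str = "|") -> str:
    """Rewrite a cell so that every legacy delimiter becomes `canonical`."""
    parts = LEGACY_DELIMITERS.split(cell.strip())
    # Drop empty segments produced by doubled or trailing delimiters.
    return canonical.join(p for p in parts if p != "")
```

For example, `normalize_delimiters("a; b")` and `normalize_delimiters("a,b")` both return `a|b`.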