
Make it possible to configure inlined multivalued strings syntax

Open • matentzn opened this issue 9 months ago • 5 comments




What is your feature request?

I would like to be able to create schemas for TSV files that use a variety of different ways to represent inlined multivalued values. AFAIK, right now only a weird JSON-list-style syntax is supported, like ["a", "b"].

However, to support much more common separators, such as:

  • a|b (pipe separated)
  • a;b (semicolon separated)
  • a,b (comma separated)
  • a, b (comma+space separated)

etc., we need some way to configure/document the "delimiter" or "list syntax", or whatever you'd like to call it.
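A minimal sketch of what a configurable list syntax could look like on the parsing side, covering the separators listed above. `ListSyntax`, `parse_cell`, and `format_cell` are hypothetical names invented for illustration; they are not part of LinkML or linkml-runtime. A `delimiter` of `None` stands for the current JSON-style `["a", "b"]` form, and `strip_whitespace` lets `a, b` be read with the same `,` delimiter as `a,b`.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ListSyntax:
    """HYPOTHETICAL per-slot list syntax; not an existing LinkML construct."""

    delimiter: Optional[str] = None  # None = JSON-style list, e.g. ["a", "b"]
    strip_whitespace: bool = True  # read "a, b" the same as "a,b"


def parse_cell(cell: str, syntax: ListSyntax) -> List[str]:
    """Split one TSV cell into its values according to `syntax`."""
    if cell.strip() == "":
        return []
    if syntax.delimiter is None:
        return [str(v) for v in json.loads(cell)]
    parts = cell.split(syntax.delimiter)
    if syntax.strip_whitespace:
        parts = [p.strip() for p in parts]
    return parts


def format_cell(values: List[str], syntax: ListSyntax) -> str:
    """Render a list of values back into a single cell."""
    if syntax.delimiter is None:
        return json.dumps(values)
    return syntax.delimiter.join(values)


for raw, syntax in [
    ("a|b", ListSyntax(delimiter="|")),
    ("a;b", ListSyntax(delimiter=";")),
    ("a,b", ListSyntax(delimiter=",")),
    ("a, b", ListSyntax(delimiter=",")),
    ('["a", "b"]', ListSyntax()),
]:
    print(repr(raw), "->", parse_cell(raw, syntax))  # each parses to ['a', 'b']
```

One design consequence: a plain delimiter cannot represent a value that itself contains the delimiter (for example a `|` inside free text) unless an escaping or quoting convention is also specified, which the JSON-style form avoids. Round-tripping the `a, b` style would also need an output separator (`", "`) distinct from the parsing delimiter (`","`).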

How important is this feature to you? Select from the options below:

• Important - it's a blocker and can't do work without it

Additional context

This is also related to @tfliss's work on Pandera generators and, more generally, to better support for dataframe validation.

matentzn, Mar 24 '25 17:03

@matentzn Yes, as you say, I am working out how multivalued, inline, and list work within tables in the Pandera context, and whether additional configuration is needed. One example is consistently distinguishing between the two table forms below. Comparing the MIxS CSV and YAML forms looks like a good practical case. There is also an interaction with range classes, which become nested structs when inlined in a dataframe. Similar serialization questions have come up with boolean and date casting. Is this most practical to implement as model annotations, (de)serializer configuration or as a transform (maybe different sides of a coin?).

c1,c2
A,2;3;4
B,5;6

and

c1,c2
A,2
A,3
A,4
B,5
B,6

tfliss, Mar 26 '25 18:03
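The two table forms in @tfliss's comment (one row per entity with a `;`-packed cell, versus one row per value) can be converted into each other mechanically. A minimal standard-library sketch, assuming the `;` delimiter from that example; `explode_rows` and `pack_rows` are hypothetical helper names:

```python
import csv
import io
from typing import Dict, List


def explode_rows(rows: List[Dict[str, str]], column: str, delimiter: str) -> List[Dict[str, str]]:
    """Packed form (A,2;3;4) -> long form (one row per value of `column`)."""
    out: List[Dict[str, str]] = []
    for row in rows:
        for value in row[column].split(delimiter):
            new_row = dict(row)  # copy so the other columns are repeated
            new_row[column] = value.strip()
            out.append(new_row)
    return out


def pack_rows(rows: List[Dict[str, str]], key: str, column: str, delimiter: str) -> List[Dict[str, str]]:
    """Long form -> packed form, grouping by `key` in first-seen order.

    Only `key` and `column` are kept; any other columns would need an
    aggregation rule of their own.
    """
    grouped: Dict[str, List[str]] = {}
    for row in rows:
        grouped.setdefault(row[key], []).append(row[column])
    return [{key: k, column: delimiter.join(vs)} for k, vs in grouped.items()]


packed = list(csv.DictReader(io.StringIO("c1,c2\nA,2;3;4\nB,5;6\n")))
long_form = explode_rows(packed, "c2", ";")
print([(r["c1"], r["c2"]) for r in long_form])
# [('A', '2'), ('A', '3'), ('A', '4'), ('B', '5'), ('B', '6')]
```

In pandas, the unpacking direction is roughly `df.assign(c2=df["c2"].str.split(";")).explode("c2")`. The packing direction is lossy unless every other column has an aggregation rule, which is one reason the two forms are not interchangeable without extra configuration.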

"Is this most practical to implement as model annotations, (de)serializer configuration or as a transform (maybe different sides of a coin?)."

@tfliss I am not sure. I would like @sierra-moxon and @cmungall to give their intuition here.

matentzn, Mar 26 '25 18:03
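To make the "model annotations" option from @tfliss's question concrete, here is a sketch in which the delimiter lives on the slot definition. `range`, `multivalued`, and `annotations` are real LinkML slot metaslots; the `list_delimiter` annotation tag, the `|` fallback, the tiny cast table, and the helper functions are all hypothetical. The schema is shown as the plain dict that loading the YAML would produce, so the example does not depend on linkml-runtime.

```python
from typing import Any, Callable, Dict, Optional

# Schema fragment as yaml.safe_load would return it. `range`, `multivalued`
# and `annotations` are LinkML metaslots; `list_delimiter` is a HYPOTHETICAL
# annotation tag used only for this sketch.
SCHEMA: Dict[str, Any] = {
    "slots": {
        "c1": {"range": "string"},
        "c2": {
            "range": "integer",
            "multivalued": True,
            "annotations": {"list_delimiter": ";"},
        },
    }
}

# Minimal range -> Python cast table; a real implementation would also
# need booleans, dates, enums, and so on.
CASTS: Dict[str, Callable[[str], Any]] = {"string": str, "integer": int}


def delimiter_for(slot: Dict[str, Any], default: str = "|") -> Optional[str]:
    """Return the delimiter for a multivalued slot, or None if single-valued."""
    if not slot.get("multivalued", False):
        return None
    return slot.get("annotations", {}).get("list_delimiter", default)


def deserialize_row(row: Dict[str, str], schema: Dict[str, Any]) -> Dict[str, Any]:
    """Turn one flat table row into typed values, splitting multivalued cells."""
    out: Dict[str, Any] = {}
    for name, raw in row.items():
        slot = schema["slots"][name]
        cast = CASTS.get(slot.get("range", "string"), str)
        delimiter = delimiter_for(slot)
        if delimiter is None:
            out[name] = cast(raw)
        else:
            # Empty cell -> empty list; whitespace around each item is stripped.
            out[name] = [cast(v.strip()) for v in raw.split(delimiter) if v.strip()]
    return out


print(deserialize_row({"c1": "A", "c2": "2;3;4"}, SCHEMA))
# {'c1': 'A', 'c2': [2, 3, 4]}
```

The same information could instead be passed to a (de)serializer as configuration. Putting it in the schema means the delimiter travels with the model, so other tooling (for example a Pandera generator) could read it as well.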

See also:

  • https://github.com/orgs/linkml/discussions/1996

turbomam, Mar 28 '25 20:03

Presumably this problem is encountered in linkml-validate or linkml-convert, but it is also somewhat relevant to schemasheets.

turbomam, Jul 18 '25 20:07

MIxS is a key use case for this feature. Related issues in the MIxS repo:

  • GenomicsStandardsConsortium/mixs#952 ("specify how the LinkML multivalued metaslot should be used with MIxS terms")
  • GenomicsStandardsConsortium/mixs#465 ("allow whitespace between delimiters in Value syntax patterns?")

MIxS has historically used inconsistent delimiters (|, ;, ,) in spreadsheet-based specifications without validation. The community is trying to standardize on conventions that align with LinkML's serialization behavior.

turbomam, Dec 09 '25 14:12
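Relating to GenomicsStandardsConsortium/mixs#465, one way to express "a list of X separated by a delimiter, optionally with whitespace around the delimiter" is to build the regex from an item pattern. This is a sketch: `list_pattern` and `delimiters_used` are hypothetical helpers, and whether MIxS should allow that whitespace is exactly the open question in the issue. `delimiters_used` is a crude audit for cells that mix `|`, `;`, and `,`.

```python
import re
from typing import Set

# Delimiters historically seen in MIxS spreadsheets, per the comment above.
CANDIDATE_DELIMITERS = ("|", ";", ",")


def list_pattern(item: str, delimiter: str, allow_space: bool = True) -> "re.Pattern[str]":
    """Compile a regex matching one or more `item`s joined by `delimiter`.

    With allow_space=True, "1 | 2" is accepted as well as "1|2".
    Use .fullmatch() so the whole cell must conform.
    """
    sep = re.escape(delimiter)
    if allow_space:
        sep = r"\s*" + sep + r"\s*"
    return re.compile(rf"(?:{item})(?:{sep}(?:{item}))*")


def delimiters_used(cell: str) -> Set[str]:
    """Return which candidate delimiters occur in a cell (crude consistency check)."""
    return {d for d in CANDIDATE_DELIMITERS if d in cell}


numbers = list_pattern(r"\d+", "|")
print(bool(numbers.fullmatch("1|2|3")))  # True
print(bool(numbers.fullmatch("1 | 2")))  # True
print(bool(numbers.fullmatch("1;2")))    # False
```

This only works cleanly when the item pattern cannot itself match the delimiter or the surrounding whitespace; with a free-text item pattern such as `.+`, the boundaries between values become ambiguous.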