define "slot_name" also known as "structured_comment_name"

Open only1chunts opened this issue 8 months ago • 9 comments

The slot name is the LinkML attribute corresponding to the GSC MIxS attribute called "Structured comment name". The structured comment name is the name of a checklist item as it will appear in GenBank structured comments.

only1chunts, Apr 28 '25 19:04

All LinkML slots have a name, even if it isn't explicitly asserted. For example, in this minimal schema:

id: http://example.com/minimal # schema URI
name: minimal
default_prefix: minimal
prefixes:
  minimal: http://example.com/minimal/
slots:
  age:
    required: true
    description: the amount of time since something was created, born, etc.

is inferred to mean this:

slots:
  age:
    name: age
    description: the amount of time since something was created, born, etc.
    from_schema: http://example.com/minimal
    required: true
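The inference can be pictured with a small Python sketch. It assumes the schema has already been parsed into plain dicts (for example by a YAML loader); the helper name infer_slot_names is hypothetical, and this is not how linkml-runtime actually implements it.

```python
# Hypothetical sketch: fill in each slot's implicit "name" from its YAML key,
# and "from_schema" from the schema id. NOT linkml-runtime code; it works on
# plain dicts shaped like a parsed schema (YAML parsing is left out).
from copy import deepcopy


def infer_slot_names(schema: dict) -> dict:
    """Return a copy of `schema` where every slot has explicit name/from_schema."""
    result = deepcopy(schema)
    for key, slot in result.get("slots", {}).items():
        slot = slot or {}  # a bare `age:` key with no body parses to None
        slot.setdefault("name", key)
        slot.setdefault("from_schema", result["id"])
        result["slots"][key] = slot  # reassigning an existing key is safe here
    return result


minimal = {
    "id": "http://example.com/minimal",
    "name": "minimal",
    "default_prefix": "minimal",
    "prefixes": {"minimal": "http://example.com/minimal/"},
    "slots": {
        "age": {
            "required": True,
            "description": "the amount of time since something was created, born, etc.",
        }
    },
}

expanded = infer_slot_names(minimal)
print(expanded["slots"]["age"]["name"])         # age
print(expanded["slots"]["age"]["from_schema"])  # http://example.com/minimal
```

The key of the slot dictionary is the only place the name appears in the source YAML, which is why the choice of key matters so much for everything derived from it.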

turbomam, May 12 '25 17:05

I never found a good way to describe that without getting into YAML jargon.

turbomam, May 12 '25 17:05

Our task is to determine which LinkML slot naming practices we are going to follow, and whether we will adopt any stricter constraints, either as a matter of GSC/MIxS policy or to interoperate better with our partner systems.

The LinkML documentation (https://linkml.io/linkml-model/latest/docs/name/) says that the range of the name metaslot is string, so in principle a name could contain any number of characters of any kind. The YAML specification (https://yaml.org/spec) requires that many non-alphanumeric characters be quoted if they are used in key names.

turbomam, May 12 '25 17:05

One consideration for naming is that LinkML supports conversion of the schema and data to many different formats and serializations, and a poor choice of names can block us from using one or more of those formats, or create a serialization in which the name is represented differently from the YAML source of truth.

If a LinkML YAML file has a slot named "age of thing" (with spaces) and an attempt is made to convert it to OWL with

linkml generate owl minimal.yaml 

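then the slot name is silently repaired: the spaces become underscores in the property's identifier, and the original string survives only as the rdfs:label.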

minimal:age_of_thing a owl:ObjectProperty ;
    rdfs:label "age of thing" ;
    skos:definition "the amount of time since something was created, born, etc." ;
    skos:inScheme <http://example.com/minimal> .
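The mismatch between the YAML name and the OWL identifier can be reproduced with a minimal Python sketch. It only handles spaces, which is the case shown above; the function owl_property_terms is hypothetical and is not the LinkML OWL generator's code, which may transform other characters differently.

```python
# Hypothetical illustration of the "repair" seen in the OWL output above:
# spaces in a slot name become underscores in the identifier, while the
# original string is kept as the label.


def owl_property_terms(prefix: str, slot_name: str) -> dict:
    local_name = slot_name.replace(" ", "_")
    return {
        "curie": f"{prefix}:{local_name}",
        "label": slot_name,
        "changed": local_name != slot_name,  # True when the name was silently altered
    }


print(owl_property_terms("minimal", "age of thing"))
# {'curie': 'minimal:age_of_thing', 'label': 'age of thing', 'changed': True}
```

A "changed" flag like this is one way a pipeline could surface the disconnect instead of letting it pass silently.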

turbomam, May 12 '25 17:05

The LinkML linter is good at finding violations of safe naming practices, and the https://linkml.io/linkml/schemas/linter.html#standard-naming documentation page states that the standard rule for slot names is snake_case.

turbomam, May 12 '25 17:05

It doesn't explicitly say that digits and punctuation shouldn't be used as the initial character, but those are definitely examples of things that would cause disconnects between the YAML source of truth and derived artifacts... possibly even in the documentation pages!
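A quick way to screen candidate names against these concerns is a regular expression plus a few targeted checks. The pattern below is an assumption: it encodes snake_case together with the "no leading digit or punctuation" constraint suggested here, and it is not the LinkML linter's own implementation. The helper naming_problems is hypothetical.

```python
import re

# Assumed rule (stricter than the documented "snake_case"): lowercase ASCII
# letters and digits in underscore-separated tokens, starting with a letter,
# with no leading, trailing or doubled underscores.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def naming_problems(name: str) -> list[str]:
    """Return reasons why `name` looks unsafe as a slot name (empty list if OK)."""
    if not name:
        return ["empty name"]
    problems = []
    if name[0].isdigit():
        problems.append("starts with a digit")
    elif not name[0].isalpha():
        problems.append("starts with punctuation or whitespace")
    if " " in name:
        problems.append("contains spaces")
    if name != name.lower():
        problems.append("contains uppercase letters")
    # Catch-all for anything else the pattern rejects (e.g. "age__thing", "café").
    if not problems and not SNAKE_CASE.fullmatch(name):
        problems.append("not snake_case")
    return problems


for candidate in ["age_of_thing", "age of thing", "16s_recover", "Age"]:
    print(candidate, naming_problems(candidate))
```

Returning reasons rather than a bare boolean makes it easier to decide, case by case, whether a legacy MIxS name should be renamed or granted an exception.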

turbomam, May 12 '25 17:05

I have heard it said that INSDC attribute names must be 20 characters or shorter, but the following documentation

https://www.ncbi.nlm.nih.gov/biosample/docs/attributes/

is full of much longer attribute names, such as "biospecimen_repository_sample_id" at 32 characters.
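If a length limit were adopted as policy, checking a list of names against it is simple. In this Python sketch the 20-character limit is the unconfirmed figure mentioned above, so it is a parameter rather than a constant; the helper too_long is hypothetical.

```python
# Hypothetical check for a maximum attribute-name length. The 20-character
# default reflects the rumored INSDC limit, which is not confirmed.


def too_long(names, max_len=20):
    """Return (name, length) pairs for names longer than max_len."""
    return [(n, len(n)) for n in names if len(n) > max_len]


print(too_long(["biospecimen_repository_sample_id", "collection_date"]))
# [('biospecimen_repository_sample_id', 32)]
```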

turbomam, May 12 '25 17:05

We should also talk about policies around standardizing the underscore-separated tokens that make up a MIxS term name.

@mslarae13 has pointed out that there seems to be a lot of overlap between the "regm" and "treat" slots.

turbomam, May 12 '25 18:05

  • chem_administration
  • agrochem_addition
  • chem_treatment
  • pesticide_treatment
  • antibiotic_treatment
  • food_treat_proc
  • antibiotic_regm
  • fungicide_regm
  • radiation_regm
  • rainfall_regm
  • herbicide_regm
  • pesticide_regm
  • fertilizer_regm
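One way to make the overlap concrete is to split each name on underscores and find leading tokens that occur with both suffixes. This minimal Python sketch assumes the last token names the kind of slot ("regm" vs "treatment"); the helper subjects_by_suffix is hypothetical. Names such as food_treat_proc do not fit that pattern and are skipped, which is itself an argument for standardizing tokens.

```python
from collections import defaultdict

# Slot names copied from the list above.
SLOTS = [
    "chem_administration", "agrochem_addition", "chem_treatment",
    "pesticide_treatment", "antibiotic_treatment", "food_treat_proc",
    "antibiotic_regm", "fungicide_regm", "radiation_regm", "rainfall_regm",
    "herbicide_regm", "pesticide_regm", "fertilizer_regm",
]


def subjects_by_suffix(names, suffixes):
    """Map each name's leading tokens to the set of suffixes it appears with,
    considering only names whose last underscore-separated token is in `suffixes`."""
    seen = defaultdict(set)
    for name in names:
        head, _, last = name.rpartition("_")
        if last in suffixes:
            seen[head].add(last)
    return dict(seen)


groups = subjects_by_suffix(SLOTS, {"regm", "treatment"})
overlap = sorted(head for head, sfx in groups.items() if len(sfx) > 1)
print(overlap)  # ['antibiotic', 'pesticide']
```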

turbomam, May 12 '25 18:05