
We are misusing the string serialization slot

Status: Open. turbomam opened this issue 3 years ago • 8 comments

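A string serialization of '{float} {unit}' implies that there are float and unit slots to fill in, because LinkML reads each {name} in a string serialization as a slot of the containing object.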

See also LinkML issue https://github.com/linkml/linkml/issues/674

Switch to LinkML structured patterns

See also

  • https://linkml.io/linkml-model/docs/string_serialization/
  • https://linkml.io/linkml-model/docs/pattern/ (for native regular expressions)
  • Structured patterns are the closest analogue to MIxS value syntaxes and are expressed with PatternExpressions: https://linkml.io/linkml-model/docs/PatternExpression/
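To make the contrast concrete: with a structured pattern, '{float} {unit}' is treated as a template whose tokens are replaced by named regular expressions (in LinkML, as we understand the docs, these come from schema-level `settings` when `interpolated` is true). The Python sketch below only imitates that interpolation. The regexes for `float` and `unit` are illustrative assumptions, and `expand_syntax` is a hypothetical helper, not part of LinkML.

```python
import re

# Illustrative regex fragments for each token (assumptions, not official MIxS/LinkML settings).
SETTINGS = {
    "float": r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?",
    "unit": r"\S+",
}

# A token is a brace-wrapped word, e.g. {float}.
TOKEN = re.compile(r"\{(\w+)\}")


def expand_syntax(syntax: str, settings: dict) -> str:
    """Hypothetical helper: turn a '{float} {unit}'-style syntax into an anchored regex.

    Literal text between tokens is escaped; each {name} is replaced by settings[name]
    (a KeyError means the token has no definition).
    """
    parts = []
    pos = 0
    for m in TOKEN.finditer(syntax):
        parts.append(re.escape(syntax[pos:m.start()]))  # literal text before the token
        parts.append(f"(?:{settings[m.group(1)]})")      # regex for the token itself
        pos = m.end()
    parts.append(re.escape(syntax[pos:]))               # trailing literal text
    # Anchor both ends so the whole value must match, not just a substring.
    return "^" + "".join(parts) + "$"


pattern = expand_syntax("{float} {unit}", SETTINGS)
print(bool(re.match(pattern, "12.5 m")))    # True
print(bool(re.match(pattern, "twelve m")))  # False
```

The point of the design is that validation logic lives in one named definition per token, so every slot that uses `{float}` shares the same regex.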

turbomam, Apr 1, 2022, 21:04

Semi-related

Some string serializations are really just lists of allowed values and could be reimplemented as enumerations:

slot string_serialization
aero_struc [plane|glider]
built_struc_set [urban|rural]
ceil_struc [wood frame|concrete]
contam_screen_input [reads| contigs]
detec_type [independent sequence (UViG)|provirus (UpViG)]
fireplace_type [gas burning|wood burning]
heat_sys_deliv_meth [conductive|radiant]
host_dependence [facultative|obligate]
seq_quality_check [none|manually edited]
shading_device_loc [exterior|interior]
space_typ_state [typically occupied|typically unoccupied]
sym_life_cycle_type [complex life cycle | simple life cycle]
urine_collect_meth [clean catch|catheter]
wga_amp_appr [pcr based|mda based]
window_status [closed|open]
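A sketch of what that conversion could look like, assuming a value counts as a pure list when it is a single bracketed, `|`-separated group containing no `{...}` tokens. `to_enum` and the `<slot>_enum` naming are hypothetical; the output only mimics the shape of a LinkML enum with `permissible_values`. Stripping whitespace also absorbs inconsistencies such as `[reads| contigs]`.

```python
import re
from typing import Optional

# A few (slot, string_serialization) pairs copied from the list above.
SERIALIZATIONS = {
    "aero_struc": "[plane|glider]",
    "contam_screen_input": "[reads| contigs]",
    "sym_life_cycle_type": "[complex life cycle | simple life cycle]",
}

# One bracketed group of '|'-separated values with no nested brackets or {tokens}.
BRACKETED_LIST = re.compile(r"^\[(?P<body>[^\[\]{}]+)\]$")


def to_enum(slot: str, serialization: str) -> Optional[dict]:
    """Return a LinkML-style enum dict, or None if the value is not a plain list."""
    m = BRACKETED_LIST.match(serialization)
    if m is None:
        return None  # e.g. contains {integer} or {text}; needs a structured pattern instead
    values = [v.strip() for v in m.group("body").split("|")]  # strip fixes "| contigs"
    return {f"{slot}_enum": {"permissible_values": {v: {} for v in values}}}


for slot, ser in SERIALIZATIONS.items():
    print(to_enum(slot, ser))
```

A mixed value such as `[1st floor|2nd floor|{integer} floor|basement|lobby]` is rejected, since it still needs a pattern rather than a closed list.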

turbomam, Apr 4, 2022, 16:04

Counts of structured pattern elements (brace-delimited tokens) in MIxS string serializations

string_ser counts
{text} 239
{float} 90
{unit} 86
{[termID]} 75
{termLabel} 74
{URL} 35
{PMID} 34
{DOI} 34
{integer} 30
{Rn/start_time/end_time/duration} 26
{boolean} 15
{version} 13
{software} 11
{duration} 11
{parameters} 9
{term} 8
{timestamp} 7
{dna} 7
{PMID|DOI|URL} 3
{period} 2
{term label} 2
{NCBI taxid} 2
{rank name} 2
{database} 2
{clustering method} 1
{AF cutoff} 1
{ANI cutoff} 1
{PID} 1
{{text} 1
{day} 1
{term ID} 1
{measurement value} 1
{percentage} 1
{reference} 1
{interval} 1
{has numeric value} 1
{has unit} 1
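We don't know exactly how this tally was produced. One way to compute such counts is a naive brace-to-brace regex; a side effect is that a nested value like the sieving pattern yields a `{{text}` token, like the one in the table. A minimal Python sketch over a few sample serializations (the sample list is illustrative, not the full MIxS set):

```python
import re
from collections import Counter

# Naive token regex: an opening brace up to the next closing brace.
# On nested text like "{{text}|...}" it captures "{{text}".
TOKEN = re.compile(r"\{[^}]*\}")


def count_tokens(serializations):
    """Tally every brace-delimited token across a list of string serializations."""
    counts = Counter()
    for s in serializations:
        counts.update(TOKEN.findall(s))
    return counts


# Illustrative sample, not the full MIxS set.
examples = [
    "{float} {unit}",
    "{termLabel} {[termID]}",
    "{{text}|{float} {unit}};{float} {unit}",
    "{PMID}|{DOI}|{URL}",
]
for token, n in count_tokens(examples).most_common():
    print(token, n)
```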

turbomam, Apr 4, 2022, 16:04

Non-alphabetic characters in the tokens above, excluding whitespace

count char notes
2 _ separates words in a token's name
1 [ literal used with term IDs, like mountain [ENVO:12345678]
1 ] literal used with term IDs, like mountain [ENVO:12345678]
38 { wraps token; the extra { (38 vs. 37 }) comes from {{text}, see sieving below
37 } wraps token
3 / delimits sub-tokens in {Rn/start_time/end_time/duration}
2 | delimits alternative tokens for literature references
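Counting every character that is neither alphanumeric nor whitespace, summed over the 37 distinct tokens in the counts table, gives the same numbers as the table above (for example 38 `{`, because `{{text}` contributes two). A Python sketch:

```python
from collections import Counter

# The 37 distinct tokens from the counts table above.
TOKENS = [
    "{text}", "{float}", "{unit}", "{[termID]}", "{termLabel}", "{URL}", "{PMID}",
    "{DOI}", "{integer}", "{Rn/start_time/end_time/duration}", "{boolean}",
    "{version}", "{software}", "{duration}", "{parameters}", "{term}",
    "{timestamp}", "{dna}", "{PMID|DOI|URL}", "{period}", "{term label}",
    "{NCBI taxid}", "{rank name}", "{database}", "{clustering method}",
    "{AF cutoff}", "{ANI cutoff}", "{PID}", "{{text}", "{day}", "{term ID}",
    "{measurement value}", "{percentage}", "{reference}", "{interval}",
    "{has numeric value}", "{has unit}",
]


def non_alnum_counts(tokens):
    """Count every character that is neither alphanumeric nor whitespace."""
    return Counter(c for t in tokens for c in t if not c.isalnum() and not c.isspace())


print(non_alnum_counts(TOKENS))
```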

sieving

{{text}|{float} {unit}};{float} {unit}

literature references

{PMID|DOI|URL}
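`{PMID|DOI|URL}` puts the alternation inside a single token, whereas `{PMID}|{DOI}|{URL}` spells out three tokens. One way to validate either form is a regex alternation. The sketch assumes simplified formats (`PMID:<digits>`, `doi:10.<registrant>/<suffix>`, `http(s)://...`); these are not the official MIxS definitions, and `reference_kind` is a hypothetical helper.

```python
import re
from typing import Optional

# Simplified formats for each reference kind (assumptions, not MIxS definitions).
REFERENCE_PATTERNS = {
    "PMID": r"PMID:\d+",
    "DOI": r"doi:10\.\d{4,9}/\S+",
    "URL": r"https?://\S+",
}

# {PMID|DOI|URL}: a single value that may be any one of the three kinds.
# Each alternative is a named group so we can tell which one matched.
LITERATURE_REFERENCE = re.compile(
    "^(?:" + "|".join(f"(?P<{k}>{v})" for k, v in REFERENCE_PATTERNS.items()) + ")$"
)


def reference_kind(value: str) -> Optional[str]:
    """Return 'PMID', 'DOI' or 'URL' for a matching value, else None."""
    m = LITERATURE_REFERENCE.match(value)
    return m.lastgroup if m else None


print(reference_kind("PMID:12345"))
print(reference_kind("doi:10.1038/nbt.1823"))
print(reference_kind("see paper"))
```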

turbomam, Apr 4, 2022, 17:04

We should clarify the differences between the canonical term tokens and their variants:

Canonical

  • {[termID]}
  • {termLabel}

Variants

  • {term ID}: plant_part_maturity only
  • {term}: geo_loc_name, pos_cont_type, host_of_host_pheno
  • {term label}: microb_start, plant_part_maturity
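The canonical pair `{termLabel} {[termID]}` yields values like `mountain [ENVO:12345678]`. The sketch below parses that form and rewrites the two unambiguous variant tokens to their canonical spelling. The CURIE regex is a simplifying assumption, `parse_term` and `normalize_tokens` are hypothetical helpers, and `{term}` is deliberately left alone because it is not clear whether it means a label, an ID, or both.

```python
import re
from typing import Optional, Tuple

# Assumed shape of a canonical "{termLabel} {[termID]}" value, e.g. "mountain [ENVO:12345678]".
# The CURIE rule (prefix:local_id, letters/digits/underscores) is a simplifying assumption.
TERM = re.compile(
    r"^(?P<label>\S(?:.*\S)?) \[(?P<curie>[A-Za-z][A-Za-z0-9_]*:[A-Za-z0-9_]+)\]$"
)

# Hypothetical variant -> canonical mapping; {term} is deliberately not mapped.
VARIANT_TO_CANONICAL = {"{term ID}": "{[termID]}", "{term label}": "{termLabel}"}


def parse_term(value: str) -> Optional[Tuple[str, str]]:
    """Split 'label [CURIE]' into (label, CURIE), or return None if it doesn't fit."""
    m = TERM.match(value)
    return (m.group("label"), m.group("curie")) if m else None


def normalize_tokens(serialization: str) -> str:
    """Rewrite the unambiguous variant tokens to their canonical spelling."""
    for variant, canonical in VARIANT_TO_CANONICAL.items():
        serialization = serialization.replace(variant, canonical)
    return serialization


print(parse_term("mountain [ENVO:12345678]"))      # ('mountain', 'ENVO:12345678')
print(normalize_tokens("{term label} {term ID}"))  # {termLabel} {[termID]}
```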

turbomam, Apr 4, 2022, 17:04

See also https://github.com/microbiomedata/mixs/pull/37

turbomam, Jul 5, 2022, 17:07

Is this all standardized now, or is there outstanding work vis-à-vis MIxS or LinkML?

ddooley, Dec 12, 2023, 21:12

Good question, @ddooley. @turbomam, could you provide an update?

ramonawalls, Dec 29, 2023, 22:12

@sujaypatil96 and I are going to work on these this week and bring them for discussion next week.

@sujaypatil96

  • FWD:{dna};REV:{dna}
  • [1st floor|2nd floor|{integer} floor|basement|lobby]
  • [DNA|dsDNA|ssDNA|RNA|dsRNA|ssRNA|ssRNA (+)|ssRNA (-)|mixed|uncharacterized]
  • {ANI cutoff};{AF cutoff};{clustering method}
  • {PMID}|{DOI}|{URL}
  • {boolean};[adverse event|non-compliance|lost to follow up|other-specify]
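For `FWD:{dna};REV:{dna}`, the literal prefixes and the `;` separator are fixed by the syntax, so only `{dna}` needs a definition. The sketch assumes `{dna}` means one or more upper-case IUPAC nucleotide codes; that is our assumption, not a MIxS rule, and `parse_primers` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Assumption: {dna} means one or more upper-case IUPAC nucleotide codes.
DNA = r"[ACGTRYSWKMBDHVN]+"
# The literals "FWD:" and ";REV:" are fixed by the syntax; only {dna} needs a definition.
PRIMERS = re.compile(rf"^FWD:(?P<fwd>{DNA});REV:(?P<rev>{DNA})$")


def parse_primers(value: str) -> Optional[Tuple[str, str]]:
    """Return (forward, reverse) sequences, or None if the value doesn't fit the syntax."""
    m = PRIMERS.match(value)
    return (m.group("fwd"), m.group("rev")) if m else None


print(parse_primers("FWD:GTGCCAGCMGCCGCGGTAA;REV:GGACTACHVGGGTWTCTAAT"))
print(parse_primers("FWD:ACGT"))  # None
```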

@turbomam

  • [Water Injection|Dump Flood|Gas Injection|Wag Immiscible Injection|Polymer Addition|Surfactant Addition|Not Applicable|other];{timestamp}
  • [active surveillance in response to an outbreak|active surveillance not initiated by an outbreak|clinical trial|cluster investigation|environmental assessment|farm sample|field trial|for cause|industry internal investigation|market sample|passive surveillance|population based studies|research|research and development] or {text}
  • [attic|bathroom|closet|conference room|elevator|examining room|hallway|kitchen|mail room|private office|open office|stairwell|,restroom|lobby|vestibule|mechanical or electrical room|data center|laboratory_wet|laboratory_dry|gymnasium|natatorium|auditorium|lockers|cafe|warehouse]
  • {boolean};{boolean}
  • {boolean};{float} {unit}
  • {boolean};{text}
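Several of these are `;`-joined compounds such as `{boolean};{float} {unit}`. A hypothetical way to build a validator is to split the syntax on `;`, look up a regex for each piece, and rejoin the pieces with a literal `;`. The sub-pattern regexes (including `true|false` for `{boolean}`) are illustrative assumptions, and a plain split like this would not handle nested groups such as the sieving pattern.

```python
import re

# Illustrative sub-pattern regexes (assumptions, not official MIxS definitions).
PARTS = {
    "{boolean}": r"(?:true|false)",
    "{float} {unit}": r"[-+]?\d*\.?\d+ \S+",
    "{text}": r"[^;]+",
}


def compile_compound(syntax: str) -> re.Pattern:
    """Hypothetical: build an anchored regex for a ';'-joined syntax like '{boolean};{text}'.

    Each piece must be a key of PARTS (otherwise KeyError).
    """
    pieces = [PARTS[piece] for piece in syntax.split(";")]
    return re.compile("^" + ";".join(pieces) + "$")


p = compile_compound("{boolean};{float} {unit}")
print(bool(p.match("true;3.5 mg")))  # True
print(bool(p.match("3.5 mg;true")))  # False
```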

turbomam, Feb 24, 2025, 18:02