We are misusing the string serialization slot
A string serialization of '{float} {unit}' implies that there are float and unit classes.
See also LinkML issue https://github.com/linkml/linkml/issues/674
Switch to LinkML structured patterns
See also
- https://linkml.io/linkml-model/docs/string_serialization/
- https://linkml.io/linkml-model/docs/pattern/ (for native regular expressions)
- Structured patterns are the closest match to MIxS' value syntaxes and are used with PatternExpressions: https://linkml.io/linkml-model/docs/PatternExpression/
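
To make the difference concrete, here is a minimal Python sketch of the interpolation idea behind structured patterns: each `{name}` in a syntax string is replaced by a named regular expression, and the result is used to validate values. This is not LinkML's implementation. The `SETTINGS` dictionary, the `interpolate` helper, and both regular expressions are illustrative assumptions, not patterns taken from MIxS or LinkML.

```python
import re

# Hypothetical token definitions. The regular expressions are illustrative
# assumptions, not the patterns MIxS or LinkML actually use.
SETTINGS = {
    "float": r"[-+]?[0-9]*\.?[0-9]+",
    "unit": r"[A-Za-z%/ ]+",
}


def interpolate(syntax: str, settings: dict[str, str]) -> str:
    """Replace each {name} in syntax with the regex registered under name."""

    def substitute(match: re.Match) -> str:
        # Wrap in a non-capturing group so alternations stay contained.
        return f"(?:{settings[match.group(1)]})"

    # Text outside the braces (here, a single space) is kept as-is.
    return re.sub(r"\{(\w+)\}", substitute, syntax)


pattern = re.compile(interpolate("{float} {unit}", SETTINGS))

print(bool(pattern.fullmatch("2.5 mg/L")))
print(bool(pattern.fullmatch("deep")))
```

With this reading, `{float}` and `{unit}` are names of reusable patterns rather than references to classes or slots, which is the interpretation the MIxS value syntaxes seem to intend.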
Semi-related
Some string serializations are really just lists and could be re-implemented as enumerations
| slot | string_serialization |
|---|---|
| aero_struc | [plane\|glider] |
| built_struc_set | [urban\|rural] |
| ceil_struc | [wood frame\|concrete] |
| contam_screen_input | [reads\| contigs] |
| detec_type | [independent sequence (UViG)\|provirus (UpViG)] |
| fireplace_type | [gas burning\|wood burning] |
| heat_sys_deliv_meth | [conductive\|radiant] |
| host_dependence | [facultative\|obligate] |
| seq_quality_check | [none\|manually edited] |
| shading_device_loc | [exterior\|interior] |
| space_typ_state | [typically occupied\|typically unoccupied] |
| sym_life_cycle_type | [complex life cycle \| simple life cycle] |
| urine_collect_meth | [clean catch\|catheter] |
| wga_amp_appr | [pcr based\|mda based] |
| window_status | [closed\|open] |
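
As a rough sketch of how these bracketed lists could be converted, the Python below splits a serialization such as `[reads| contigs]` into its choices and builds an enum-shaped dictionary. The helper names `parse_choices` and `to_enum` and the enum name `host_dependence_enum` are hypothetical. The `permissible_values` key follows LinkML's enum layout, but the exact output shape should be checked against the schema we actually generate.

```python
def parse_choices(serialization: str) -> list[str]:
    """Split '[a|b|c]' into ['a', 'b', 'c'], trimming stray whitespace."""
    text = serialization.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"not a bracketed list: {serialization!r}")
    return [choice.strip() for choice in text[1:-1].split("|")]


def to_enum(name: str, serialization: str) -> dict:
    """Build an enum-shaped dict (hypothetical layout) from a bracketed list."""
    values = {choice: {} for choice in parse_choices(serialization)}
    return {name: {"permissible_values": values}}


# 'host_dependence_enum' is a made-up name used only for illustration.
print(to_enum("host_dependence_enum", "[facultative|obligate]"))
print(parse_choices("[complex life cycle | simple life cycle]"))
```

Trimming whitespace matters here because a few of the lists above have spaces around the delimiter, for example `contam_screen_input` and `sym_life_cycle_type`.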
Counts of structured pattern elements in MIxS
| string_ser | counts |
|---|---|
| {text} | 239 |
| {float} | 90 |
| {unit} | 86 |
| {[termID]} | 75 |
| {termLabel} | 74 |
| {URL} | 35 |
| {PMID} | 34 |
| {DOI} | 34 |
| {integer} | 30 |
| {Rn/start_time/end_time/duration} | 26 |
| {boolean} | 15 |
| {version} | 13 |
| {software} | 11 |
| {duration} | 11 |
| {parameters} | 9 |
| {term} | 8 |
| {timestamp} | 7 |
| {dna} | 7 |
| {PMID\|DOI\|URL} | 3 |
| {period} | 2 |
| {term label} | 2 |
| {NCBI taxid} | 2 |
| {rank name} | 2 |
| {database} | 2 |
| {clustering method} | 1 |
| {AF cutoff} | 1 |
| {ANI cutoff} | 1 |
| {PID} | 1 |
| {{text} | 1 |
| {day} | 1 |
| {term ID} | 1 |
| {measurement value} | 1 |
| {percentage} | 1 |
| {reference} | 1 |
| {interval} | 1 |
| {has numeric value} | 1 |
| {has unit} | 1 |
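
For reference, a tally like the one above can be produced with a few lines of Python. This is a sketch under assumptions, not the script that generated the table: the non-greedy pattern `\{.*?\}` is a guess at the tokenization, and the `sample` list is a tiny stand-in for the full set of MIxS string serializations.

```python
import re
from collections import Counter

# Assumed tokenization: a brace, the shortest possible run of characters,
# then the first closing brace.
TOKEN = re.compile(r"\{.*?\}")


def count_tokens(serializations):
    """Count every brace-delimited token across all serializations."""
    counts = Counter()
    for serialization in serializations:
        counts.update(TOKEN.findall(serialization))
    return counts


# Stand-in sample, not the full MIxS data.
sample = [
    "{float} {unit}",
    "{termLabel} {[termID]}",
    "{{text}|{float} {unit}};{float} {unit}",
]

for token, n in count_tokens(sample).most_common():
    print(f"| {token} | {n} |")
```

Note that a non-greedy match applied to the sieving pattern yields the unbalanced token `{{text}`, which is one plausible explanation for that row in the table.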
Non-alphabetic characters in the tokens above
Whitespace is not counted.
| count | char | notes |
|---|---|---|
| 2 | _ | separates words in a token's name |
| 1 | [ | literal used with term IDs, like mountain [ENVO:12345678] |
| 1 | ] | literal used with term IDs, like mountain [ENVO:12345678] |
| 38 | { | wraps token. Also, see sieving below |
| 37 | } | wraps token |
| 3 | / | delimits sub-tokens in {Rn/start_time/end_time/duration} |
| 2 | \| | delimits alternative tokens for literature references |
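
The character counts can be reproduced along these lines. This is a sketch that assumes each distinct token is counted once; the function name `non_alpha_counts` and the four sample tokens are illustrative, chosen from the table of tokens to cover every special character listed.

```python
from collections import Counter


def non_alpha_counts(tokens):
    """Count non-alphabetic, non-whitespace characters over distinct tokens."""
    counts = Counter()
    for token in set(tokens):  # assumption: each distinct token counts once
        counts.update(
            ch for ch in token if not ch.isalpha() and not ch.isspace()
        )
    return counts


# A few tokens from the list above, enough to exercise every character.
tokens = [
    "{Rn/start_time/end_time/duration}",
    "{PMID|DOI|URL}",
    "{{text}",
    "{[termID]}",
]

for char, n in sorted(non_alpha_counts(tokens).items()):
    print(f"{n}\t{char}")
```

Run over all 37 distinct tokens, this approach gives one more `{` than `}`, and the extra brace comes from `{{text}`.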
sieving
`{{text}|{float} {unit}};{float} {unit}`
literature references
{PMID|DOI|URL}
Should clarify the differences between
Canonical
- {[termID]}
- {termLabel}

Variants
- {term ID}, used only by plant_part_maturity
- {term}: geo_loc_name, pos_cont_type, host_of_host_pheno
- {term label}: microb_start, plant_part_maturity
see also https://github.com/microbiomedata/mixs/pull/37
Is this all standardized now, or is there outstanding work vis-à-vis MIxS or LinkML?
Good question, @ddooley. @turbomam, could you provide an update?
@sujaypatil96 and I are going to work on these this week for discussion next week
@sujaypatil96
- FWD:{dna};REV:{dna}
- [1st floor|2nd floor|{integer} floor|basement|lobby]
- [DNA|dsDNA|ssDNA|RNA|dsRNA|ssRNA|ssRNA (+)|ssRNA (-)|mixed|uncharacterized]
- {ANI cutoff};{AF cutoff};{clustering method}
- {PMID}|{DOI}|{URL}
- {boolean};[adverse event|non-compliance|lost to follow up|other-specify]
@turbomam
- [Water Injection|Dump Flood|Gas Injection|Wag Immiscible Injection|Polymer Addition|Surfactant Addition|Not Applicable|other];{timestamp}
- [active surveillance in response to an outbreak|active surveillance not initiated by an outbreak|clinical trial|cluster investigation|environmental assessment|farm sample|field trial|for cause|industry internal investigation|market sample|passive surveillance|population based studies|research|research and development] or {text}
- [attic|bathroom|closet|conference room|elevator|examining room|hallway|kitchen|mail room|private office|open office|stairwell|,restroom|lobby|vestibule|mechanical or electrical room|data center|laboratory_wet|laboratory_dry|gymnasium|natatorium|auditorium|lockers|cafe|warehouse]
- {boolean};{boolean}
- {boolean};{float} {unit}
- {boolean};{text}