`cur_land_use` was an enum before LinkML
Something that I think was missed during the migration to LinkML
cur_land_use (https://genomicsstandardsconsortium.github.io/mixs/0001080/) has Range: [String](https://genomicsstandardsconsortium.github.io/mixs/String/)
However, it also has
string_serialization: '[cities|farmstead|industrial areas|roads/railroads|rock|sand|gravel|mudflats|salt flats|badlands|permanent snow or ice|saline seeps|mines/quarries|oil waste areas|small grains|row crops|vegetable crops|horticultural plants (e.g. tulips)|marshlands (grass,sedges,rushes)|tundra (mosses,lichens)|rangeland|pastureland (grasslands used for livestock grazing)|hayland|meadows (grasses,alfalfa,fescue,bromegrass,timothy)|shrub land (e.g. mesquite,sage-brush,creosote bush,shrub oak,eucalyptus)|successional shrub land (tree saplings,hazels,sumacs,chokecherry,shrub dogwoods,blackberries)|shrub crops (blueberries,nursery ornamentals,filberts)|vine crops (grapes)|conifers (e.g. pine,spruce,fir,cypress)|hardwoods (e.g. oak,hickory,elm,aspen)|intermixed hardwood and conifers|tropical (e.g. mangrove,palms)|rainforest (evergreen forest receiving >406 cm annual rainfall)|swamp (permanent or semi-permanent water body dominated by woody plants)|crop trees (nuts,fruit,christmas trees,nursery trees)]'
I suspect this should've been an enum. But an enum couldn't be made directly from that list, as you shouldn't include the parenthetical examples in the text of the enum's permissible values.
See NMDC's update: https://microbiomedata.github.io/nmdc-schema/CurLandUseEnum/
MIxS should also provide this enum, and provide the examples and information in () in an attribute of the enum.
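As a rough illustration of that split, here is a minimal Python sketch that takes a shortened copy of the `string_serialization` pattern quoted above and separates each item into a bare value and its parenthetical note. The function name `split_serialization` and the `note` field are hypothetical; which LinkML attribute of the permissible value should hold the note is left open here, and this is not how MIxS or NMDC actually generated their enums.

```python
import re

# Shortened copy of the string_serialization pattern quoted in this issue.
SERIALIZATION = (
    "[cities|farmstead|roads/railroads|"
    "horticultural plants (e.g. tulips)|"
    "conifers (e.g. pine,spruce,fir,cypress)|"
    "vine crops (grapes)]"
)

# One item: leading text, then an optional trailing parenthetical.
ITEM_PATTERN = re.compile(r"\s*(?P<text>[^()]+?)\s*(?:\((?P<note>.*)\))?\s*")


def split_serialization(serialization):
    """Map each bare value to its parenthetical note (None if it has none)."""
    body = serialization.strip()
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]
    values = {}
    for item in body.split("|"):
        match = ITEM_PATTERN.fullmatch(item)
        if match is None:
            # Unexpected shape: keep the raw item rather than guess.
            values[item.strip()] = None
            continue
        values[match.group("text")] = match.group("note")
    return values


permissible_values = split_serialization(SERIALIZATION)
for text, note in permissible_values.items():
    print(f"{text!r}: note={note!r}")
```

The bare values could then become the permissible value text, with the notes reviewed by hand before being placed in whatever attribute is chosen.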
FYI @turbomam
related to https://github.com/GenomicsStandardsConsortium/mixs/issues/905
https://github.com/GenomicsStandardsConsortium/mixs/issues/905#issuecomment-2821326758
Yes, the dreaded string_serialization comes from the 'Value syntax' in https://github.com/GenomicsStandardsConsortium/mixs6.2_release_candidate/blob/main/GSC-excel-harmonized-repaired/mixs_v6.xlsx.harmonized.tsv
I agree that it should be an enumeration with examples and will work on that now
See also
- #905
- #373
- #333
PS the description isn't very good either
Present state of sample site
There are many states that a site could be in beyond the way in which the land is being used
And is this supposed to be aligned with any other system?
- https://www.fao.org/geospatial/resources/detail/en/c/1024744/?utm_source=chatgpt.com
- https://land.copernicus.eu/content/corine-land-cover-nomenclature-guidelines/html/?utm_source=chatgpt.com
- https://land.copernicus.eu/en/products/corine-land-cover?utm_source=chatgpt.com
- https://www.fao.org/4/x0596e/x0596e01f.htm?utm_source=chatgpt.com
- https://www.fao.org/land-water/land/land-governance/land-resources-planning-toolbox/category/details/en/c/1036361/?utm_source=chatgpt.com
- https://www.nrcs.usda.gov/sites/default/files/2022-09/EQIP_Land_Eligibility_and_NPPH_Land_Use_Chart.pdf?utm_source=chatgpt.com
- https://www.nrcs.usda.gov/conservation-basics/natural-resource-concerns/land-use?utm_source=chatgpt.com
Is this aligned with EnvO?
To what degree should MIxS capture the fact that https://en.wikipedia.org/wiki/Pinus_pinaster aka http://purl.obolibrary.org/obo/NCBITaxon_71647 is a "conifer" ? Is that word synonymous with https://en.wikipedia.org/wiki/Pinales? or https://en.wikipedia.org/wiki/Pinidae ?
See also https://en.wikipedia.org/wiki/Conifer
table of frequently used values from INSDC Biosamples
From ChatGPT
Comparison of INSDC/NCBI Biosample cur_land_use Values with Pipe-Separated List and LinkML Enumeration
Observations:
1. Direct Matches:
Some of the values in the INSDC biosample data are direct matches to categories from both the pipe-separated list and LinkML enumeration. Examples include:
`row crops`, `pastureland (grasslands used for livestock grazing)`, `mines/quarries`, `sand`, `crop trees (nuts, fruit, christmas trees, nursery trees)`, `conifers (e.g. pine, spruce, fir, cypress)`, and `marshlands (grass, sedges, rushes)`.
2. Category Variations:
Some categories in the INSDC biosample data have slight variations in terminology but align with categories in the enumeration. Examples include:
- `agriculture`, `agricultural`, and `crop`, which could be aligned with row crops, vegetable crops, or crop trees.
- `grass/herbaceous cover` and `grassland` could relate to meadows or pastureland.
- `agricultural experiment` and `arable field` might fit with row crops or small grains.
3. Unaccounted Values:
There are several land use categories in the biosample data that do not appear directly in either the pipe-separated list or the LinkML enumeration. These may represent specific or unique land uses. For example:
`abandoned grassland`, `arable cropland for long-term experimentation`, `no-till system`, `temperate coniferous forest`, `National Park`, `fertilized meadow`, and `unfertilized pasture grazed by cattle`.
4. Overlapping Terms:
Some terms have overlapping meaning but differ slightly in phrasing:
`shrub land (e.g. mesquite, sage-brush, creosote bush, shrub oak, eucalyptus)` and `shrub crops (blueberries, nursery ornamentals, filberts)` align with shrub land and shrub crops in the enumeration, though with additional specificity in the biosample data.
Summary of Key Comparisons:
High Match:
Terms like row crops, industrial areas, conifers, hardwoods, pastureland, marshlands, vine crops, and shrub land match directly or closely with both the pipe-separated list and LinkML enumeration.
Moderate Match:
Terms such as agriculture, agricultural, and crop are commonly found in the biosample data but appear more generally in the enumeration (e.g., under row crops, vegetable crops, crop trees).
Missing/Unique:
The biosample data contains specific land use terms not covered by either list (e.g., abandoned grassland, temperate coniferous forest, fertilized meadow, no-till system, National Park).
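A comparison like the one summarized above could be reproduced with a small script rather than by eye. The sketch below is only an assumption about how one might do it: it normalizes case and the spacing after commas, then sorts observed values into direct matches, matches on the bare term only, and unmatched values. The names `classify`, `normalize`, and `core` are hypothetical, the enum list is a short excerpt, and the observed list reuses values quoted in this thread plus one made-up spelling (`Conifers`) to exercise the middle bucket.

```python
import re

# Excerpt of the pipe-separated values quoted earlier in this issue.
ENUM_ITEMS = [
    "row crops",
    "sand",
    "mines/quarries",
    "conifers (e.g. pine,spruce,fir,cypress)",
    "marshlands (grass,sedges,rushes)",
]


def normalize(value):
    """Lowercase, collapse whitespace, and drop spaces around commas."""
    value = re.sub(r"\s+", " ", value.strip().lower())
    return re.sub(r"\s*,\s*", ",", value)


def core(value):
    """Return the normalized text before any parenthetical."""
    return normalize(value.split("(", 1)[0])


def classify(observed, enum_items):
    full = {normalize(item) for item in enum_items}
    cores = {core(item) for item in enum_items}
    report = {"direct": [], "core_only": [], "unmatched": []}
    for value in observed:
        if normalize(value) in full:
            report["direct"].append(value)
        elif core(value) in cores:
            report["core_only"].append(value)
        else:
            report["unmatched"].append(value)
    return report


observed = [
    "row crops",
    "conifers (e.g. pine, spruce, fir, cypress)",
    "Conifers",  # hypothetical spelling, not taken from the biosample table
    "abandoned grassland",
    "National Park",
]
report = classify(observed, ENUM_ITEMS)
for bucket, values in report.items():
    print(bucket, values)
```

Exact string matching after normalization is deliberately conservative; fuzzier matching would need curator review of every hit.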
Suggestions:
To better align with the biosample data:
- Extend the LinkML enumeration to include terms like `agriculture`, `arable field`, `fertilized meadow`, `unfertilized pasture`, and `National Park` that appear in the biosample counts.
- Adjust terminology: Consider consolidating overlapping terms (e.g., `grassland`, `grass/herbaceous cover`, and `meadows`) to reduce ambiguity and improve consistency across data sources.
- Handle variations: Allow for aliases or synonyms in the LinkML model to capture terms like `agricultural experiment` or `temperate woodland` that are used less commonly.
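One caveat on the alias idea: the candidate mappings mentioned in the comparison above are all one-to-many, so an alias table cannot simply rewrite a value. The following Python sketch, under that assumption, records the suggested candidates and only resolves a value automatically when there is exactly one. `ALIAS_CANDIDATES` and `resolve` are hypothetical names, and the mappings are the tentative ones from this thread, not curated decisions.

```python
# Candidate mappings as tentatively suggested in the comparison; not curated.
ALIAS_CANDIDATES = {
    "agriculture": ["row crops", "vegetable crops", "crop trees"],
    "agricultural": ["row crops", "vegetable crops", "crop trees"],
    "crop": ["row crops", "vegetable crops", "crop trees"],
    "grassland": ["meadows", "pastureland"],
    "grass/herbaceous cover": ["meadows", "pastureland"],
    "arable field": ["row crops", "small grains"],
    "agricultural experiment": ["row crops", "small grains"],
}


def resolve(value):
    """Return (canonical, candidates); canonical is None unless unambiguous."""
    candidates = ALIAS_CANDIDATES.get(value.strip().lower(), [])
    if len(candidates) == 1:
        return candidates[0], []
    return None, candidates


for raw in ["Grassland", "arable field", "National Park"]:
    canonical, candidates = resolve(raw)
    print(f"{raw!r}: canonical={canonical!r}, needs review among {candidates}")
```

With the mappings as they stand, every alias comes back for manual review, which suggests that broader parent terms may be needed in addition to synonyms.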