what are our options for diffing schema changes?

Open turbomam opened this issue 1 year ago • 15 comments

The web interface always says the diff is too large to render

turbomam avatar Aug 14 '24 17:08 turbomam

I've used VS Code. It provides a nice view https://vscode.one/diff-vscode/

Eric has also provided some useful guidance. We should ask him!

Or, smaller PRs? ;)

mslarae13 avatar Sep 03 '24 23:09 mslarae13

I am reviving this specifically for diffing MIxS 6.0.0 (the last major release) vs the current main (as a preview of 7.0.0)

turbomam avatar Jul 14 '25 17:07 turbomam

  • Oct 18, 2023: v6.2.0
  • Oct 9, 2023: mixs6.1.1
  • Jul 5, 2022: mixs6.1.0
  • Mar 24, 2022: mixs6.0.0
  • Feb 27, 2022: MIxS5

turbomam avatar Jul 14 '25 17:07 turbomam

6.0.0 release (modular structure):

  • Schema root: model/schema/mixs.yaml
    • Individual modules in model/schema/ for each environment/checklist
  • Short hash: 74744ee
  • Date: 2022-03-23 07:09:46 -1000

schema files:

  • agriculture.yaml
  • air.yaml
  • built_environment.yaml
  • checklists.yaml
  • core.yaml
  • food_animal_and_animal_feed.yaml
  • food_farm_environment.yaml
  • food_food_production_facility.yaml
  • food_human_foods.yaml
  • host_associated.yaml
  • human_associated.yaml
  • human_gut.yaml
  • human_oral.yaml
  • human_skin.yaml
  • human_vaginal.yaml
  • hydrocarbon_resources_cores.yaml
  • hydrocarbon_resources_fluids_swabs.yaml
  • microbial_mat_biofilm.yaml
  • miscellaneous_natural_or_artificial_environment.yaml
  • mixs.yaml
  • plant_associated.yaml
  • ranges.yaml
  • sediment.yaml
  • soil.yaml
  • symbiont_associated.yaml
  • terms.yaml
  • wastewater_sludge.yaml
  • water.yaml

merged with

poetry run linkml generate linkml \
    --format yaml \
    --no-materialize model/schema/mixs.yaml > mixs_6_0_0_merged_unmaterialized.yaml

Current main branch, in anticipation of 7.0.0 (consolidated structure):

  • Complete schema: src/mixs/schema/mixs.yaml
  • Short hash: 9a865a63b
  • Date: 2025-07-02 17:00:49 -0400

schema files

  • deprecated.yaml
  • mixs.yaml

turbomam avatar Jul 14 '25 18:07 turbomam

get lists of elements like this:

yq 'keys' mixs_6_0_0_merged_unmaterialized.yaml > mixs_6_0_0_root_keys.txt

and

yq '.classes | keys' mixs_6_0_0_merged_unmaterialized.yaml > mixs_6_0_0_classes.txt

etc., and the same for the main branch:

yq 'keys' src/mixs/schema/mixs.yaml > mixs_main_root_keys.txt
yq '.classes | keys'  src/mixs/schema/mixs.yaml > mixs_main_classes.txt
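
Once the key lists exist, comparing them is a set difference. A minimal Python sketch, assuming the `yq 'keys'` output is a plain YAML list with one `- key` per line. The function names are made up for illustration, and the sample keys are an abbreviated selection taken from this thread, not the full files:

```python
def read_keys(text):
    """Parse yq 'keys' output (one '- key' per line) into a set of names."""
    return {line[2:].strip() for line in text.splitlines() if line.startswith("- ")}


def diff_keys(old, new):
    """Return keys only in old, only in new, and in both, each sorted."""
    return {
        "only_old": sorted(old - new),
        "only_new": sorted(new - old),
        "shared": sorted(old & new),
    }


# Abbreviated sample input; in practice read mixs_6_0_0_root_keys.txt
# and mixs_main_root_keys.txt from disk instead.
old_text = "- classes\n- id\n- name\n- source_file\n- types\n"
new_text = "- classes\n- default_range\n- id\n- name\n- version\n"

result = diff_keys(read_keys(old_text), read_keys(new_text))
print(result["only_old"])  # ['source_file', 'types']
print(result["only_new"])  # ['default_range', 'version']
```

The same two functions work unchanged on the classes, slots and enums key lists.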

turbomam avatar Jul 14 '25 18:07 turbomam

Key differences in shared scalar slots at the root of the schema:

  1. default_prefix: Changed from mixs.vocab to MIXS
  2. description: Much more detailed and comprehensive in main version
  3. id: Protocol changed from http:// to https://
  4. name: Changed from MIxS (with capitalization) to mixs (lowercase)

| Field | 6.0 Version | Main Version |
| --- | --- | --- |
| default_prefix | mixs.vocab | MIXS |
| description | Minimal Information about any Sequence Standard | This file contains a YAML-formatted specification of the Minimum Information about any (x) Sequence (MIxS) standard, generated using LinkML (https://linkml.io/linkml/). This file is released by the Genomic Standards Consortium (GSC; https://www.gensc.org/) for use by anyone handling data or information about biological sequences. This file is also used as an authoritative 'source of truth' to generate downstream GSC artifacts, available here: https://github.com/GenomicsStandardsConsortium/mixs/tree/main/project |
| id | http://w3id.org/mixs | https://w3id.org/mixs |
| name | MIxS | mixs |

The following covers only meaningful differences in which root-level slots are present, rather than every difference between the computationally merged v6.0.0 file and the monolithic main file. Non-meaningful differences are ~~crossed out~~.

Lines unique to main branch (not in 6.0.0):

  • comments
    • 'slot titles that are associated with more than one slot name/SCN: host sex'
      • that means that host_sex (MIXS:0000811) and urobiom_sex (MIXS:0000862) both inherited the title 'host sex' from the mixs_v6.xls spreadsheet below
  • source
    • https://github.com/GenomicsStandardsConsortium/mixs/raw/issue-610-temp-mixs-xlsx-home/mixs/excel/mixs_v6.xls
  • version
    • v6.2.0
  • ~~imports~~
  • default_range
    • string
  • settings
    • discussed below

Lines unique to 6.0.0 (not in main):

  • ~~types~~
  • ~~source_file~~

todo:

  • classes
  • enums
  • ~~prefixes~~
  • ~~settings~~
  • slots
  • ~~subsets~~

turbomam avatar Jul 14 '25 18:07 turbomam

settings were added to the MIxS schema subsequent to version 6.0.0, to build structured_patterns that are reminiscent of the "Value specification" in the MIxS 6 spreadsheet

agrochemical_name: ".*"
amount: '[-+]?[0-9]*\.?[0-9]+'
add_recov_methods: 'Water Injection|Dump Flood|Gas Injection|Wag Immiscible Injection|Polymer Addition|Surfactant Addition|Not Applicable|other'
DOI: '^doi:10.\d{2,9}/.*$'
NCBItaxon_id: NCBITaxon:\d+
PMID: ^PMID:\d+$
URL: ^https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=]*)$
adapter: '[ACGTRKSYMWBHDVN]+'
adapter_A_DNA_sequence: '[ACGTRKSYMWBHDVN]+'
adapter_B_DNA_sequence: '[ACGTRKSYMWBHDVN]+'
ambiguous_nucleotides: '[ACGTRKSYMWBHDVN]+'
boolean: '(?:yes|no)' # a non-capturing group matching either 'yes' or 'no'
country: ([^\s-]{1,2}|[^\s-]+.+[^\s-]+)
date_time_stamp: '(\d{4})(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01])(T([01]\d|2[0-3]):([0-5]\d):([0-5]\d)(\.\d+)?(Z|([+-][01]\d:[0-5]\d))?)?)?)?$'
dna: '^[ACGT]+$'
duration: P(?:(?:\d+D|\d+M(?:\d+D)?|\d+Y(?:\d+M(?:\d+D)?)?)(?:T(?:\d+H(?:\d+M(?:\d+S)?)?|\d+M(?:\d+S)?|\d+S))?|T(?:\d+H(?:\d+M(?:\d+S)?)?|\d+M(?:\d+S)?|\d+S)|\d+W)
float: '[-+]?[0-9]*\.?[0-9]+'
integer: '[1-9][0-9]*'
lat: (-?((?:[0-8]?[0-9](?:\.\d{0,8})?)|90))
lon: -?[0-9]+(?:\.[0-9]{0,8})?$|^-?(1[0-7]{1,2})
name: '.*'
parameters: ([^\s-]{1,2}|[^\s-]+.+[^\s-]+)
particulate_matter_name: '.*'
region: ([^\s-]{1,2}|[^\s-]+.+[^\s-]+)
room_name: ([^\s-]{1,2}|[^\s-]+.+[^\s-]+)
room_number: '[1-9][0-9]*'
scientific_float: '[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?'
software: ([^\s-]{1,2}|[^\s-]+.+[^\s-]+)
specific_location: ([^\s-]{1,2}|[^\s-]+.+[^\s-]+)
storage_condition_type: ([^\s-]{1,2}|[^\s-]+.+[^\s-]+)
termID: '[a-zA-Z]{2,}:[a-zA-Z0-9]\d+'
termLabel: ([^\s-]{1,2}|[^\s-]+.+[^\s-]+)
text: '.*'
unit: ([^\s-]{1,2}|[^\s-]+.+[^\s-]+)
version: ([^\s-]{1,2}|[^\s-]+.+[^\s-]+)

⚠️ Unusually low usage patterns (≤1 use):

  • NCBItaxon_id: 1 use
  • adapter_A_DNA_sequence: 1 use
  • adapter_B_DNA_sequence: 1 use
  • add_recov_methods: 1 use
  • agrochemical_name: 1 use
  • amount: 1 use
  • boolean: 1 use
  • lat: 1 use
  • lon: 1 use
  • particulate_matter_name: 1 use

Unused settings (6 total):

  • adapter
  • country
  • dna
  • region
  • specific_location
  • storage_condition_type
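
Usage counts like these can be reproduced by counting `{name}` placeholders in the schema's structured_pattern syntax strings. A rough sketch, assuming placeholders are written as `{setting_name}`. The three syntax strings and the shortened settings list are made-up examples, not copied from mixs.yaml:

```python
import re
from collections import Counter

# Matches {identifier} but not regex quantifiers such as {2,9}.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def count_setting_uses(setting_names, syntaxes):
    """Count how many times each setting is referenced across syntax strings."""
    counts = Counter({name: 0 for name in setting_names})
    for syntax in syntaxes:
        for name in PLACEHOLDER.findall(syntax):
            if name in counts:
                counts[name] += 1
    return counts


# Hypothetical inputs for illustration only.
settings = ["float", "unit", "termLabel", "termID", "dna", "country"]
syntaxes = [r"^{float} {unit}$", r"^{termLabel} \[{termID}\]$", r"^{float}$"]

counts = count_setting_uses(settings, syntaxes)
unused = sorted(name for name, n in counts.items() if n == 0)
print(counts["float"])  # 2
print(unused)           # ['country', 'dna']
```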

Currently, the best way to use them is to validate some data against a materialized-pattern schema. The requirement to pre-materialize will be removed in future LinkML versions.

poetry run linkml generate linkml \
    --format yaml \
    --no-materialize-attributes \
    --materialize-patterns src/mixs/schema/mixs.yaml > mixs_main_materialized_patterns.yaml

then

poetry run linkml-validate \
    --schema mixs_main_materialized_patterns.yaml \
    --target-class MixsCompliantData complete_mims_soil_record.yaml
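
For intuition about what materializing a pattern means, here is a simplified stand-in in plain Python. It is not LinkML's own implementation. It assumes each `{name}` placeholder is replaced by the matching settings value and that the whole string must match. The `{float} {unit}` syntax is a hypothetical example, while the two settings values are copied from the settings listed earlier in this comment:

```python
import re

PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")

# Two settings copied from the thread.
SETTINGS = {
    "float": r"[-+]?[0-9]*\.?[0-9]+",
    "unit": r"([^\s-]{1,2}|[^\s-]+.+[^\s-]+)",
}


def materialize(syntax, settings):
    """Replace each {name} placeholder with its regex; leave unknown names as-is."""
    return PLACEHOLDER.sub(lambda m: settings.get(m.group(1), m.group(0)), syntax)


# Hypothetical structured_pattern syntax, chosen for illustration.
pattern = materialize("{float} {unit}", SETTINGS)
print(pattern)
print(bool(re.fullmatch(pattern, "25.5 mg")))  # True
print(bool(re.fullmatch(pattern, "warm mg")))  # False
```

Whether a validator anchors the pattern at both ends is an assumption here; `re.fullmatch` is simply the strictest choice.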

turbomam avatar Jul 14 '25 20:07 turbomam

the deprecation file was added post 6.0.0

https://github.com/GenomicsStandardsConsortium/mixs/blob/main/src/mixs/schema/deprecated.yaml

turbomam avatar Jul 14 '25 20:07 turbomam

Key prefix differences:

  1. Removed in main version:
    • mixs.vocab prefix (was https://w3id.org/mixs/vocab/)
    • MIGS prefix (was https://w3id.org/mixs/migs/)
  2. Changed:
    • MIXS prefix reference changed from https://w3id.org/mixs/terms/ to https://w3id.org/mixs/
  3. Added in main version:
    • SO prefix pointing to http://purl.obolibrary.org/obo/SO_
  4. Unchanged:
    • linkml, xsd, shex, and schema prefixes remain the same

turbomam avatar Jul 14 '25 23:07 turbomam

Key subset differences:

| Subset | 6.0 Version | Main Version | Status |
| --- | --- | --- | --- |
| checklist | A MIxS checklist. These can be combined with packages | (not present) | Removed in main |
| package | A MIxS package. These can be combined with checklists | (not present) | Removed in main |
| checklist_package_combination | A combination of a checklist and a package | (not present) | Removed in main |
| combination_classes | (not present) | (no description) | Added in main |
| sequencing | (not present) | (no description) | Added in main |
| environment | (not present) | (no description) | Added in main |
| nucleic acid sequence source | (not present) | (no description) | Added in main |
| investigation | (not present) | (no description) | Added in main |

Completely different organizational approach:

  1. 6.0 Version subsets (all removed):
    • checklist
    • package
    • checklist_package_combination
  2. Main Version subsets (all new):
    • combination_classes
    • sequencing
    • environment
    • nucleic acid sequence source
    • investigation

Key observations:

  • The 6.0 version used a structural organization (checklist + package + combination)
  • The main version uses a functional/thematic organization (sequencing, environment, etc.)
  • All subset descriptions were removed in the main version

The knowledge that a class is a Checklist or Extension is communicated with is_a now, not in_subset. The fact that something is a combination is communicated with in_subset, so that's a little inconsistent. The fact that a class is a combination can also be discovered because it will be is_a some Extension and will have some Checklist in its mixins.

grep -A 1  in_subset mixs_main_materialized_patterns.yaml | sort | uniq -c
 253     - combination_classes
  10     - environment
   6     - investigation
  25     - nucleic acid sequence source
  57     - sequencing
grep -A 1  'is_a:'  mixs_main_materialized_patterns.yaml | sort | uniq -c
  11     is_a: Agriculture
  11     is_a: Air
  11     is_a: BuiltEnvironment
  11     is_a: Checklist
  23     is_a: Extension
  11     is_a: FoodAnimalAndAnimalFeed
  11     is_a: FoodFarmEnvironment
  11     is_a: FoodFoodProductionFacility
  11     is_a: FoodHumanFoods
  11     is_a: HostAssociated
  11     is_a: HumanAssociated
  11     is_a: HumanGut
  11     is_a: HumanOral
  11     is_a: HumanSkin
  11     is_a: HumanVaginal
  11     is_a: HydrocarbonResourcesCores
  11     is_a: HydrocarbonResourcesFluidsSwabs
  11     is_a: MicrobialMatBiofilm
  11     is_a: MiscellaneousNaturalOrArtificialEnvironment
  11     is_a: PlantAssociated
  11     is_a: Sediment
  11     is_a: Soil
  11     is_a: SymbiontAssociated
  11     is_a: WastewaterSludge
  11     is_a: Water
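
The is_a plus mixins rule for spotting combination classes can be written down directly. A small sketch over a hand-built dictionary that mimics the shape of the classes section. The five entries are a simplified assumption about how a class like MigsBaSoil is declared, not a copy of the schema:

```python
def is_combination(name, classes):
    """True if the class's parent is an Extension and it mixes in a Checklist."""
    cls = classes[name]
    parent = classes.get(cls.get("is_a"), {})
    parent_is_extension = parent.get("is_a") == "Extension"
    mixes_in_checklist = any(
        classes.get(mixin, {}).get("is_a") == "Checklist"
        for mixin in cls.get("mixins", [])
    )
    return parent_is_extension and mixes_in_checklist


# Simplified, assumed structure for illustration.
classes = {
    "Checklist": {},
    "Extension": {},
    "MigsBa": {"is_a": "Checklist"},
    "Soil": {"is_a": "Extension"},
    "MigsBaSoil": {"is_a": "Soil", "mixins": ["MigsBa"]},
}

print([name for name in classes if is_combination(name, classes)])  # ['MigsBaSoil']
```

Run against the real classes section, the result could be cross-checked against the 253 classes tagged with the combination_classes subset.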

slot_group is not used in either version

There is active discussion about refining the way we use any one of those (is_a, in_subset or slot_group) to represent MIxS "sections", whose composition and definitions are also under discussion.

Ultimately, we should assert rich metadata for whatever structures we use to capture the sections.

turbomam avatar Jul 14 '25 23:07 turbomam

The class naming system underwent a complete transformation between 6.0 and main versions:

6.0 Version (289 classes): Used space-delimited names like "soil MIGS bacteria" and "air MIMARKS specimen"

Main Version (290 classes): Uses CamelCase names like "MigsBaSoil" and "MimarksCAir"

Also, the ordering within combination class names changed from {environment} {checklist} to {checklist}{environment}
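
The renaming and reordering can be expressed as a small conversion function. This is a sketch, assuming every 6.0 combination name has the form "{environment} {checklist}" separated by a single space. The checklist mapping is a subset of the renames reported elsewhere in this thread, and the function names are hypothetical:

```python
import re

# Subset of the checklist renames reported in this thread.
CHECKLISTS = {
    "MIGS bacteria": "MigsBa",
    "MIGS eukaryote": "MigsEu",
    "MIMARKS specimen": "MimarksS",
    "MIMARKS survey": "MimarksC",
    "MIMS": "Mims",
}


def camel(text):
    """CamelCase a name by splitting on spaces, hyphens and underscores."""
    return "".join(part.capitalize() for part in re.split(r"[\s_-]+", text) if part)


def new_class_name(old_name):
    """Map '{environment} {checklist}' to '{Checklist}{Environment}'."""
    for old_checklist, new_checklist in CHECKLISTS.items():
        suffix = " " + old_checklist
        if old_name.endswith(suffix):
            return new_checklist + camel(old_name[: -len(suffix)])
    raise ValueError(f"no known checklist in {old_name!r}")


print(new_class_name("soil MIGS bacteria"))    # MigsBaSoil
print(new_class_name("air MIMARKS specimen"))  # MimarksSAir
```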

New in Main:

  • Checklist, Extension, MixsCompliantData base classes

MixsCompliantData is required for validating collections of data, like CSV files

Removed from Main:

  • core and quantity value classes

turbomam avatar Jul 15 '25 00:07 turbomam

Class Metadata Comparison: 6.0 vs Main

Key Structural Differences

Class Structure Fields

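| Field | 6.0 Version | Main Version |
| --- | --- | --- |
| name | | |
| description | | |
| from_schema | | |
| aliases | | |
| mixin | | |
| slot_usage | | |
| title | | ✓ (added) |
| is_a | | ✓ (added) |
| class_uri | | ✓ (added) |
| tree_root | | ✓ (added to MixsCompliantData) |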

Inheritance Hierarchy

  • 6.0: Flat structure, no inheritance
  • Main: Hierarchical structure with is_a relationships

Same Core Descriptions

Both versions maintain identical descriptions for equivalent classes:

  • "Minimal Information about a Genome Sequence: cultured bacteria/archaea"

turbomam avatar Jul 15 '25 00:07 turbomam

The current main branch adds use case annotations on the Extension classes based on MIxS-extension-definitions. The same document was used to add titles and update the descriptions.

| Class Name | Use Cases |
| --- | --- |
| Agriculture | Agricultural Microbiomes Research Coordination Network, model cropping and plant systems focused on agricultural plant and soil microbe; microbiome studies in agricultural sites; long-term ecological research in croplands; eDNA in manure samples; describing agricultural microbiome studies |
| Air | bioaerosol samples, pathogen load in urban air, aerosols |
| BuiltEnvironment | microbiology studies of the built environment, NASA space station sampling, MetaSUB transit system sampling, home, hospitals, office buildings |
| FoodAnimalAndAnimalFeed | Microbiome of farm animals, their feed, and pet food. |
| FoodFarmEnvironment | Microbiome of farm and field crops as well as environmental samples including irrigation, soil amendments, and farm equipment. |
| FoodFoodProductionFacility | Microbiome of food production facilities/factories |
| FoodHumanFoods | Microbiome of foods intended for human consumption. |
| HostAssociated | elephant fecal matter or cat oral cavity |
| HumanAssociated | blood samples or biopsy samples. |
| HumanGut | human stool or fecal samples, or samples collected directly from the gut. |
| HumanOral | mouth swab sampling, dental microbiome samples, microbiome of oral swabs, nasal, mouth, throat, teeth, tongue microbiome studies |
| HumanSkin | swab samples taken on a person's skin surface. |
| HumanVaginal | vaginal swabbing |
| HydrocarbonResourcesCores | The microbial characterization of hydrocarbon occurrences, defined as the natural and artificial environmental features that are rich in hydrocarbons, from hydrocarbon rich formations, such as reservoir cores. |
| HydrocarbonResourcesFluidsSwabs | The microbial characterization of hydrocarbon occurrences, defined as the natural and artificial environmental features that are rich in hydrocarbons, from hydrocarbon resource fluids. |
| MicrobialMatBiofilm | samples from microbial mats at cold seeps |
| PlantAssociated | plant surface swabs, root soil or rhizosphere, cultivated plants, plant phenotyping |
| Sediment | river bed or sea floor. |
| Soil | soil collection, island microbiome sampling, farm land or forest floor. |
| SymbiontAssociated | the microbiome sequence of a flea sampled from a farm animal |
| WastewaterSludge | sewerage or industrial wastewater |
| Water | sea or river water, global ocean sampling day |

turbomam avatar Jul 15 '25 00:07 turbomam

slot to class assignment differences

I don't trust the Agriculture difference report below

  1. Core checklists (MIGS, MIMS, MIMARKS, etc.) all gained the same 4 slots: alt, depth, elev, temp
  2. Environmental extensions generally lost core metadata fields (collection_date, env_broad_scale, env_local_scale, env_medium, geo_loc_name, lat_lon) while gaining project-specific fields (project_name, samp_name, samp_vol_we_dna_ext)
  3. Single-cell specific changes in MISAG/MIUVIG where slot names were updated (e.g., single_cell_lysis_appr → sc_lysis_approach)

MIGS eukaryote → MigsEu

  • Added slots (4): alt, depth, elev, temp

MIGS bacteria → MigsBa

  • Added slots (4): alt, depth, elev, temp

MIGS plant → MigsPl

  • Added slots (4): alt, depth, elev, temp

MIGS virus → MigsVi

  • Added slots (4): alt, depth, elev, temp

MIGS org → MigsOrg

  • Added slots (4): alt, depth, elev, temp

MIMS → Mims

  • Added slots (4): alt, depth, elev, temp

MIMARKS specimen → MimarksS

  • Added slots (4): alt, depth, elev, temp

MIMARKS survey → MimarksC

  • Added slots (4): alt, depth, elev, temp

MISAG → Misag

  • Added slots (8): alt, depth, elev, sc_lysis_approach, sc_lysis_method, temp, x16s_recover, x16s_recover_software
  • Removed slots (4): single_cell_lysis_appr, single_cell_lysis_prot, x_16s_recover, x_16s_recover_software

MIMAG → Mimag

  • Added slots (6): alt, depth, elev, temp, x16s_recover, x16s_recover_software
  • Removed slots (2): x_16s_recover, x_16s_recover_software

MIUVIG → Miuvig

  • Added slots (6): alt, depth, elev, sc_lysis_approach, sc_lysis_method, temp
  • Removed slots (2): single_cell_lysis_appr, single_cell_lysis_prot

air → Air

  • Added slots (4): air_PM_concen, project_name, samp_name, samp_vol_we_dna_ext
  • Removed slots (8): air_particulate_matter_concentration, collection_date, depth, env_broad_scale, env_local_scale, env_medium, geo_loc_name, lat_lon

built environment → BuiltEnvironment

  • Added slots (2): project_name, samp_name
  • Removed slots (10): alt, collection_date, depth, elev, env_broad_scale, env_local_scale, env_medium, geo_loc_name, lat_lon, temp

host-associated → HostAssociated

  • Added slots (5): host_disease_stat, host_fam_rel, project_name, samp_name, samp_vol_we_dna_ext
  • Removed slots (7): collection_date, env_broad_scale, env_local_scale, env_medium, geo_loc_name, host_family_relation, lat_lon

human-associated → HumanAssociated

  • Added slots (5): host_disease_stat, host_fam_rel, project_name, samp_name, samp_vol_we_dna_ext
  • Removed slots (10): alt, collection_date, depth, elev, env_broad_scale, env_local_scale, env_medium, geo_loc_name, host_family_relation, lat_lon

human-gut → HumanGut

  • Added slots (5): host_disease_stat, host_fam_rel, project_name, samp_name, samp_vol_we_dna_ext
  • Removed slots (10): alt, collection_date, depth, elev, env_broad_scale, env_local_scale, env_medium, geo_loc_name, host_family_relation, lat_lon

human-oral → HumanOral

  • Added slots (6): host_disease_stat, host_fam_rel, nose_mouth_teeth_throat_disord, project_name, samp_name, samp_vol_we_dna_ext
  • Removed slots (11): alt, collection_date, depth, elev, env_broad_scale, env_local_scale, env_medium, geo_loc_name, host_family_relation, lat_lon, nose_throat_disord

human-skin → HumanSkin

  • Added slots (5): host_disease_stat, host_fam_rel, project_name, samp_name, samp_vol_we_dna_ext
  • Removed slots (10): alt, collection_date, depth, elev, env_broad_scale, env_local_scale, env_medium, geo_loc_name, host_family_relation, lat_lon

human-vaginal → HumanVaginal

  • Added slots (5): host_disease_stat, host_fam_rel, project_name, samp_name, samp_vol_we_dna_ext
  • Removed slots (11): alt, collection_date, depth, elev, env_broad_scale, env_local_scale, env_medium, geo_loc_name, host_family_relation, lat_lon, samp_salinity

hydrocarbon resources-cores → HydrocarbonResourcesCores

  • Added slots (3): project_name, samp_name, samp_vol_we_dna_ext
  • Removed slots (8): alt, collection_date, depth, env_broad_scale, env_local_scale, env_medium, geo_loc_name, lat_lon

hydrocarbon resources-fluids_swabs → HydrocarbonResourcesFluidsSwabs

  • Added slots (3): project_name, samp_name, samp_vol_we_dna_ext
  • Removed slots (9): alt, collection_date, depth, elev, env_broad_scale, env_local_scale, env_medium, geo_loc_name, lat_lon

microbial mat_biofilm → MicrobialMatBiofilm

  • Added slots (3): project_name, samp_name, samp_vol_we_dna_ext
  • Removed slots (7): alt, collection_date, env_broad_scale, env_local_scale, env_medium, geo_loc_name, lat_lon

miscellaneous natural or artificial environment → MiscellaneousNaturalOrArtificialEnvironment

  • Added slots (3): project_name, samp_name, samp_vol_we_dna_ext
  • Removed slots (6): collection_date, env_broad_scale, env_local_scale, env_medium, geo_loc_name, lat_lon

plant-associated → PlantAssociated

  • Added slots (4): host_disease_stat, project_name, samp_name, samp_vol_we_dna_ext
  • Removed slots (7): alt, collection_date, env_broad_scale, env_local_scale, env_medium, geo_loc_name, lat_lon

sediment → Sediment

  • Added slots (3): project_name, samp_name, samp_vol_we_dna_ext
  • Removed slots (7): alt, collection_date, env_broad_scale, env_local_scale, env_medium, geo_loc_name, lat_lon

soil → Soil

  • Added slots (4): project_name, samp_name, samp_vol_we_dna_ext, soil_texture
  • Removed slots (9): alt, collection_date, env_broad_scale, env_local_scale, env_medium, geo_loc_name, lat_lon, salinity_meth, soil_text_measure

wastewater_sludge → WastewaterSludge

  • Added slots (3): project_name, samp_name, samp_vol_we_dna_ext
  • Removed slots (8): alt, collection_date, elev, env_broad_scale, env_local_scale, env_medium, geo_loc_name, lat_lon

water → Water

  • Added slots (3): project_name, samp_name, samp_vol_we_dna_ext
  • Removed slots (7): alt, collection_date, env_broad_scale, env_local_scale, env_medium, geo_loc_name, lat_lon

symbiont-associated → SymbiontAssociated

  • Added slots (9): host_fam_rel, host_infra_spec_name, host_infra_spec_rank, project_name, salinity, samp_name, samp_vol_we_dna_ext, source_mat_id, urobiom_sex
  • Removed slots (9): env_broad_scale, env_local_scale, env_medium, host_family_relationship, host_infra_specific_name, host_infra_specific_rank, host_sex, samp_salinity, sample_name

food-human foods → FoodHumanFoods

  • Added slots (11): experimental_factor, nucl_acid_ext, project_name, samp_collect_method, samp_name, samp_size, samp_store_dur, samp_store_loc, samp_store_temp, samp_vol_we_dna_ext, seq_meth
  • Removed slots (10): alt, depth, elev, env_broad_scale, env_local_scale, env_medium, samp_stor_dur, samp_stor_loc, samp_stor_temp, sample_collec_method

food-animal and animal feed → FoodAnimalAndAnimalFeed

  • Added slots (10): experimental_factor, nucl_acid_ext, project_name, samp_name, samp_size, samp_store_dur, samp_store_loc, samp_store_temp, samp_vol_we_dna_ext, seq_meth
  • Removed slots (9): alt, depth, elev, env_broad_scale, env_local_scale, env_medium, samp_stor_dur, samp_stor_loc, samp_stor_temp

food-food production facility → FoodFoodProductionFacility

  • Added slots (11): experimental_factor, nucl_acid_ext, project_name, samp_name, samp_size, samp_store_dur, samp_store_loc, samp_store_temp, samp_vol_we_dna_ext, seq_meth, subspecf_gen_lin
  • Removed slots (10): alt, depth, elev, env_broad_scale, env_local_scale, env_medium, samp_stor_dur, samp_stor_loc, samp_stor_temp, temp

food-farm environment → FoodFarmEnvironment

  • Added slots (9): host_disease_stat, nucl_acid_ext, project_name, samp_name, samp_size, samp_store_dur, samp_store_temp, samp_vol_we_dna_ext, seq_meth
  • Removed slots (8): alt, elev, env_broad_scale, env_local_scale, env_medium, salinity_meth, samp_stor_dur, samp_stor_temp

agriculture → Agriculture

  • Added slots (45): adapters, annot, assembly_name, assembly_qual, assembly_software, associated_resource, biotic_relationship, chimera_check, food_product_type, food_source, host_disease_stat, host_spec_range, isol_growth_condt, lib_layout, lib_reads_seqd, lib_screen, lib_size, lib_vector, micro_biomass_meth, mid, non_min_nutr_regm, nucl_acid_amp, nucl_acid_ext, pathogenicity, pcr_cond, pcr_primers, prev_land_use_meth, samp_mat_process, samp_size, samp_store_temp, samp_vol_we_dna_ext, seq_meth, seq_quality_check, soil_texture, soil_texture_meth, sop, source_mat_id, specific_host, target_gene, target_subfragment, tot_carb, tot_nitro_cont_meth, tot_nitro_content, tot_phosphate, trophic_level
  • Removed slots (23): Food_Product_type, Food_source, alt, assembly_quality, collection_date, env_broad_scale, env_local_scale, env_medium, geo_loc_name, horizon, lat_lon, microbial_biomass_meth, non_mineral_nutr_regm, previous_land_use_meth, samp_stor_temp, soil_depth, texture, texture_meth, tot_car, tot_n_meth, tot_nitro, tot_phos, url

turbomam avatar Jul 15 '25 01:07 turbomam

MIxS Global Slots Comparison: 6.0 vs today's main branch

These results are noisy and should be double checked

Slots on class MixsCompliantData, used for validating CSV data files, etc. (266 slots)

All have names/keys following the pattern *_data.

  • Environmental extension data slots: agriculture_data, air_data, built_environment_data, etc.
  • Checklist-environment combinations: migs_ba_agriculture_data, migs_eu_air_data, etc.
  • Checklists are abstract mixins, so they don't have aggregation slots in the MixsCompliantData class

New Content Slot

  • urobiom_sex (new field)

Slot Renamings/Normalizations

  • air particulate matter concentration → air_PM_concen
  • assembly_quality → assembly_qual
  • associated resource → associated_resource
  • Food_Product_type → food_product_type
  • Food_source → food_source
  • host_family_relation → host_fam_rel
  • horizon → soil_horizon
  • host_family_relationship → host_fam_rel
  • host_infra_specific_name → host_infra_spec_name
  • host_infra_specific_rank → host_infra_spec_rank
  • microbial_biomass_meth → micro_biomass_meth
  • non_mineral_nutr_regm → non_min_nutr_regm
  • previous_land_use_meth → prev_land_use_meth
  • samp_collec_device → samp_collect_device
  • samp_collec_method → samp_collect_method
  • samp_salinity → salinity
  • samp_stor_dur → samp_store_dur
  • samp_stor_loc → samp_store_loc
  • samp_stor_temp → samp_store_temp
  • sample_name → samp_name ('sample_name' is still mentioned in the description)
  • single_cell_lysis_appr → sc_lysis_approach
  • single_cell_lysis_prot → sc_lysis_method
  • soil_text_measure → soil_texture_meth
  • texture_meth → soil_texture_meth
  • texture → soil_texture
  • tot_car → tot_carb
  • tot_n_meth → tot_nitro_cont_meth
  • tot_nitro → tot_nitro_content
  • tot_phos → tot_phosphate
  • x_16s_recover_software → x16s_recover_software
  • x_16s_recover → x16s_recover
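
Applying the renamings before diffing removes much of the noise in the per-class slot reports. A sketch using the MISAG added/removed lists from earlier in this thread. Slots shared by both versions are left out because they cancel, and `diff_slots` is a hypothetical helper name:

```python
# Old name -> new name, taken from the renaming list in this comment.
RENAMES = {
    "single_cell_lysis_appr": "sc_lysis_approach",
    "single_cell_lysis_prot": "sc_lysis_method",
    "x_16s_recover": "x16s_recover",
    "x_16s_recover_software": "x16s_recover_software",
}


def diff_slots(old_slots, new_slots, renames):
    """Diff two slot sets after mapping old names to their new names."""
    renamed = {renames.get(slot, slot) for slot in old_slots}
    added = sorted(new_slots - renamed)
    removed = sorted(renamed - new_slots)
    return added, removed


# MISAG, restricted to the slots that differed in the raw report.
old_misag = {
    "single_cell_lysis_appr", "single_cell_lysis_prot",
    "x_16s_recover", "x_16s_recover_software",
}
new_misag = {
    "alt", "depth", "elev", "temp",
    "sc_lysis_approach", "sc_lysis_method",
    "x16s_recover", "x16s_recover_software",
}

added, removed = diff_slots(old_misag, new_misag, RENAMES)
print(added)    # ['alt', 'depth', 'elev', 'temp']
print(removed)  # []
```

With the renames applied, MISAG gains only the same four slots as the other checklists.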

Removed Content Fields

  • salinity_meth
  • url (see also associated resource → associated_resource)

Removed Organizational Fields (refactored as subsets)

  • core field
  • environment field
  • investigation field
  • mixs extension field
  • nucleic acid sequence source field

turbomam avatar Jul 15 '25 01:07 turbomam