December schema refactoring super issue - suggested updates
- [x] apply tighter `range` constraints in `slot_usage`s
  - Tracked in many other issues.
- [x] what is a minimal number of root classes, and what categorization slots are needed to differentiate them
  - MaterialEntity
  - DataEntity
  - PlannedProcess
- [x] mixin usage... are they used outside of `alternative_ids`?
  - used in database slots and alternative IDs.
- [x] how to implement code merge and data migration
- [x] are there ways for the team to explore the schema besides looking at YAML files or pre-generated web pages?
  - Yes, image generators. Not great, not easy but can provide an update
- [ ] overall class relationship diagram
  - dynamic, in GraphDB
  - carefully curated and synchronized hand-crafted diagram
  - Use LinkML image generators over GraphDB exploration
- [ ] which AttributeValue classes should we keep?
  - Removed some, not all.
  - Half done, should have some follow up work
  - Remove text value which will affect MIxS import.
- [x] https://github.com/microbiomedata/nmdc-schema/issues/1251
Use Case:
Refactor Biosample.
The Biosample Class is very overloaded, with:
- Attributes that should be pushed upwards to SampleCollection or Site or a sub-class of Site
  - Sample Collection - collection date, sample collection site
  - Site - geolocation name, soil type, soil horizon, ecosystem attributes
- Attributes that should be pushed downward to MaterialProcessing
  - Examples - sieving, dna isolation method
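
A rough sketch of what this split could look like for a single record, assuming a hypothetical slot-to-class routing table. The snake_case slot names below paraphrase the examples in the list above and are not the schema's actual slot names; the record itself is made up.

```python
# Hypothetical routing table: which class each Biosample slot would move to.
# Slot names paraphrase the examples in this issue, not the real schema.
SLOT_TARGETS = {
    "collection_date": "SampleCollection",
    "sample_collection_site": "SampleCollection",
    "geolocation_name": "Site",
    "soil_type": "Site",
    "soil_horizon": "Site",
    "sieving": "MaterialProcessing",
    "dna_isolation_method": "MaterialProcessing",
}


def split_biosample(record):
    """Group a flat Biosample record's slots by the class they would move to.

    Slots without an entry in SLOT_TARGETS stay on Biosample.
    """
    grouped = {}
    for slot, value in record.items():
        target = SLOT_TARGETS.get(slot, "Biosample")
        grouped.setdefault(target, {})[slot] = value
    return grouped


example = {
    "id": "example:biosample-1",  # made-up identifier
    "collection_date": "2021-06-01",
    "soil_horizon": "O horizon",
    "sieving": "2 mm",
}
parts = split_biosample(example)
```

Keeping the routing in one table would make it easy to review slot by slot in a PR, and the same table could drive a data migration later.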
Linked Issue: #1251
Branch: https://github.com/microbiomedata/monterey-schema/tree/issue-1251-refactor-biosample
Monterey Schema Diagram: https://app.diagrams.net/#G1Ufsblf98rGzRhJMDt7ZMgosioywnGJIU
Should we move this into the fork? I haven't looked at it, but let's make sure the documentation of the fork is VERY clear that this is a model for what we plan to incrementally implement back into the NMDC schema.
Another example of Biosample being overloaded is img_identifiers. This is an analysis identifier, which should be associated with MetagenomeAnnotationActivity, MetatranscriptomeAnnotationActivity, and MagsAnalysisActivity. Currently Sujay populates these values, so the ingest process would have to change.
I would like to take a first pass at refactoring this before we meet in Dec. I was planning to do this as a branch off of the refactoring fork that Mark made. We can use the PR mechanism to review / modify the refactored model. There will also doubtless be any number of questions on terminology / semantics etc. to bring up to the team. #1251
Alicia's list
- [x] #1123
- [x] #1048 (overlap w/#1030 and #1027)
- [ ] #1257
- [ ] #1000
- [x] #984
- [x] #1144
@aclum you're welcome to edit my initial comment in this thread, so that your checklist items will be eligible for click-to-create-issue.
- [x] Change omics_type from a ControlledTermValue to an Enum.
Related to this standing issue: https://github.com/microbiomedata/nmdc-schema/issues/251
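
For the data side of the omics_type change, a minimal migration sketch, assuming existing records store the value as an object with a `has_raw_value` key (an assumption about the current data shape). `OmicsTypeEnum`, its members, and `migrate_omics_type` are hypothetical names for illustration; the real permissible values belong to the schema.

```python
from enum import Enum


class OmicsTypeEnum(Enum):
    # Illustrative permissible values only; the real list belongs to the schema.
    METAGENOME = "Metagenome"
    METATRANSCRIPTOME = "Metatranscriptome"
    PROTEOMICS = "Proteomics"


def migrate_omics_type(record):
    """Return a copy of record with omics_type flattened to an enum value string."""
    value = record.get("omics_type")
    if not isinstance(value, dict):
        return dict(record)  # already migrated, or slot absent
    raw = value.get("has_raw_value")  # assumed key on the old object form
    migrated = dict(record)
    # Enum lookup raises ValueError if raw is not a permissible value.
    migrated["omics_type"] = OmicsTypeEnum(raw).value
    return migrated


old = {"id": "example:omics-1", "omics_type": {"has_raw_value": "Metagenome"}}
new = migrate_omics_type(old)
```

Failing loudly on an unknown value is deliberate here: it surfaces records that would no longer validate against the enum instead of silently carrying them over.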
- [ ] update the `slot_usage` descriptions for all the `has_input` and `has_output` slots so that it's clear what exactly is the input/output depending on the class using the slot.
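
One way to track progress on this item would be a small check over the schema once it is loaded into a plain dict. The sketch below assumes the usual `classes` / `slots` / `slot_usage` layout and only looks at slots listed directly on a class (inherited slots are ignored for simplicity). The function name and the example schema content are made up.

```python
IO_SLOTS = ("has_input", "has_output")


def missing_io_descriptions(schema):
    """List (class, slot) pairs where a class uses an I/O slot but its
    slot_usage gives no description for it."""
    missing = []
    for class_name, class_def in schema.get("classes", {}).items():
        usage = class_def.get("slot_usage") or {}
        for slot in IO_SLOTS:
            if slot not in class_def.get("slots", []):
                continue  # class does not directly use this slot
            if not (usage.get(slot) or {}).get("description"):
                missing.append((class_name, slot))
    return missing


# Made-up schema fragment for illustration only.
example_schema = {
    "classes": {
        "MaterialProcessing": {
            "slots": ["has_input", "has_output"],
            "slot_usage": {
                "has_input": {"description": "The sample being processed."},
            },
        },
        "MetagenomeAnnotationActivity": {
            "slots": ["has_input"],
        },
    }
}
todo = missing_io_descriptions(example_schema)
```

The returned pairs could be pasted back into this checklist as sub-items.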
@lamccue Can you check out the issues and requests here & let me know which ones should be prioritized for the meeting? https://github.com/microbiomedata/nmdc-schema/issues/1466