December schema refactoring super issue - suggested updates

Open turbomam opened this issue 2 years ago • 9 comments

  • [x] apply tighter range constraints in slot_usages
    • Tracked in many other issues.
  • [x] what is a minimal number of root classes, and what categorization slots are needed to differentiate them
    • MaterialEntity
    • DataEntity
    • PlannedProcess
  • [x] mixin usage... are they used outside of alternative_ids?
    • used in database slots and alternative IDs.
  • [x] how to implement code merge and data migration
  • [x] are there ways for the team to explore the schema besides looking at YAML files or pre-generated web pages?
    • Yes, image generators. Not great, not easy, but can provide an update
  • [ ] overall class relationship diagram
    • dynamic, in GraphDB
    • carefully curated and synchronized hand-crafted diagram
    • Use LinkML image generators over GraphDB exploration
  • [ ] which AttributeValue classes should we keep?
    • Removed some, not all.
    • Half done, should have some follow up work
      • Remove text value which will affect MIxS import.
  • [x] https://github.com/microbiomedata/nmdc-schema/issues/1251

turbomam avatar Oct 25 '23 16:10 turbomam

Use Case:

Refactor Biosample.

The Biosample Class is very overloaded, with:

  • Attributes that should be pushed upwards to SampleCollection or Site or a sub-class of Site
    • Sample Collection - collection date, sample collection site
    • Site - geolocation name, soil type, soil horizon, ecosystem attributes
  • Attributes that should be pushed downward to MaterialProcessing
    • Examples - sieving, DNA isolation method
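
A rough sketch of the split described above, written as Python dataclasses rather than LinkML YAML so it can be run directly. The class names SampleCollection, Site, MaterialProcessing, and Biosample come from this comment; every field name and the `collected_in` link are assumptions made for illustration and are not taken from the schema or the refactoring branch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Site:
    """Where sampling happened. Field names are hypothetical."""
    geolocation_name: str
    soil_type: Optional[str] = None
    soil_horizon: Optional[str] = None
    ecosystem: Optional[str] = None


@dataclass
class SampleCollection:
    """The collection event. Holds attributes pushed upward from Biosample."""
    collection_date: str
    site: Site


@dataclass
class Biosample:
    """Slimmed-down Biosample that only points at its collection event."""
    id: str
    collected_in: SampleCollection  # hypothetical link slot


@dataclass
class MaterialProcessing:
    """A processing step. Holds attributes pushed downward from Biosample."""
    has_input: str  # id of the Biosample being processed
    sieving: Optional[str] = None
    dna_isolation_method: Optional[str] = None


# Example values below are invented for illustration only.
site = Site(geolocation_name="Example Site", soil_type="example soil type")
collection = SampleCollection(collection_date="2023-10-25", site=site)
sample = Biosample(id="example:bsm-1", collected_in=collection)
step = MaterialProcessing(has_input=sample.id, sieving="2 mm")
```

With this shape, site-level values are stored once per Site and reached through the collection event, and processing details live on the step that consumes the sample.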

Linked Issue: #1251

Branch: https://github.com/microbiomedata/monterey-schema/tree/issue-1251-refactor-biosample

Monterey Schema Diagram: https://app.diagrams.net/#G1Ufsblf98rGzRhJMDt7ZMgosioywnGJIU

mbthornton-lbl avatar Oct 25 '23 21:10 mbthornton-lbl

Should we move this into the fork? I haven't looked at it, but let's make sure the documentation of the fork is VERY clear that this is a model for what we plan to incrementally implement back into the NMDC schema.

mslarae13 avatar Oct 27 '23 16:10 mslarae13

Another example of Biosample being overloaded is img_identifiers. This is an analysis identifier, which should be associated with MetagenomeAnnotationActivity, MetatranscriptomeAnnotationActivity, and MagsAnalysisActivity. Currently Sujay populates these values, so the ingest process would have to change.

aclum avatar Oct 27 '23 16:10 aclum

I would like to take a first pass at refactoring this before we meet in Dec. I was planning to do this as a branch off of the refactoring fork that Mark made. We can use the PR mechanism to review / modify the refactored model. There will also doubtless be any number of questions on terminology / semantics etc. to bring up to the team. #1251

mbthornton-lbl avatar Oct 27 '23 18:10 mbthornton-lbl

Alicia's list

  • [x] #1123
  • [x] #1048 (overlap w/#1030 and #1027)
  • [ ] #1257
  • [ ] #1000
  • [x] #984
  • [x] #1144

aclum avatar Nov 01 '23 19:11 aclum

@aclum you're welcome to edit my initial comment in this thread, so that your checklist items will be eligible for click-to-create-issue.

turbomam avatar Nov 02 '23 15:11 turbomam

  • [x] Change omics_type from a ControlledTermValue to an Enum.

Related to this standing issue: https://github.com/microbiomedata/nmdc-schema/issues/251
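
To illustrate what the move to an Enum buys, here is a minimal Python sketch. The member values are placeholders chosen for illustration, not the schema's actual permissible values, and `OmicsTypeEnum` and `parse_omics_type` are hypothetical names.

```python
from enum import Enum


class OmicsTypeEnum(Enum):
    # Placeholder members; the real permissible values live in the schema.
    METAGENOME = "Metagenome"
    METATRANSCRIPTOME = "Metatranscriptome"
    PROTEOMICS = "Proteomics"


def parse_omics_type(raw: str) -> OmicsTypeEnum:
    """Return the matching member, or raise ValueError for an unknown value."""
    return OmicsTypeEnum(raw)
```

A free-text term value would accept a misspelling such as "Metagenom" silently, whereas the Enum lookup rejects it at validation time.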

anastasiyaprymolenna avatar Nov 08 '23 21:11 anastasiyaprymolenna

  • [ ] update the slot_usage descriptions for all the has_input and has_output slots so that it's clear exactly what the input/output is for each class using the slot.
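
One way to track this item is a small check that lists which classes still fall back to the generic slot description. The sketch below models slot_usage as plain Python dictionaries; the description strings are invented placeholders, the single class entry is only an example, and `effective_description` and `classes_missing_overrides` are hypothetical helpers, not part of LinkML or nmdc-schema.

```python
from typing import Dict, Iterable, List, Tuple

# Generic descriptions shared by every class (placeholder wording).
BASE_DESCRIPTIONS: Dict[str, str] = {
    "has_input": "An input to a process.",
    "has_output": "An output from a process.",
}

# Class-specific overrides, mirroring the idea of slot_usage (placeholder wording).
SLOT_USAGE: Dict[str, Dict[str, str]] = {
    "MetagenomeAnnotationActivity": {
        "has_input": "Placeholder: what this class consumes.",
    },
}


def effective_description(class_name: str, slot_name: str) -> str:
    """Prefer the class-specific description, else fall back to the generic one."""
    override = SLOT_USAGE.get(class_name, {}).get(slot_name)
    return override if override is not None else BASE_DESCRIPTIONS[slot_name]


def classes_missing_overrides(
    class_names: Iterable[str],
    slots: Tuple[str, ...] = ("has_input", "has_output"),
) -> List[Tuple[str, str]]:
    """List (class, slot) pairs that still rely on the generic description."""
    return [
        (cls, slot)
        for cls in class_names
        for slot in slots
        if slot not in SLOT_USAGE.get(cls, {})
    ]
```

Running `classes_missing_overrides` over the classes that use these slots would give a concrete to-do list for this checkbox.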

brynnz22 avatar Nov 10 '23 19:11 brynnz22

@lamccue Can you check out the issues and requests here & let me know which ones should be prioritized for the meeting? https://github.com/microbiomedata/nmdc-schema/issues/1466

mslarae13 avatar Dec 02 '23 00:12 mslarae13