
Implement migrators for all Berkeley schema changes (meta-issue)

Open brynnz22 opened this issue 6 months ago • 12 comments

Below is a list (in the order they should be executed) of the migration PRs to account for the schema changes in Berkeley:

  1. Migrator_from_X_to_PR23: Update the values for execution_resource to match the newly created enumeration permissible values

  2. Migrator_from_X_to_PR4: Remove omics_type and create analyte_category on OmicsProcessing class and update values to match enum

    • Should occur BEFORE OmicsProcessing becomes DataGeneration
  3. Migrator_from_X_to_PR53: Move part_of values to associated_studies on OmicsProcessing and Biosample classes

    • Should occur BEFORE OmicsProcessing becomes DataGeneration.
  4. Migrator_from_X_to_PR21: Move relevant_protocols to protocol_link for studies and remove relevant_protocols slot

  5. Migrator_from_X_to_PR129: Change metabolite_quantified and has_metabolite_quantification slot names

    • Should occur BEFORE MetabolomicsAnalysisActivity becomes MetabolomicsAnalysis
  6. Migrator_from_X_to_PR31: Remove used slot from WorkflowExecution subclasses

    • Needs to occur BEFORE collection name changes as it refers to old collection names
    • Needs to occur BEFORE the instrument_set migrator because this migrator compares the values in the 'used' slot to the values in the 'instrument_name' slot (which is removed in that migrator)
    • Needs to occur BEFORE workflow chain migrator because that migrator removes the was_informed_by slot from the WorkflowExecutions which this migrator references.
  7. Migrator_from_X_to_PR9: Populate workflow_chain_set, remove was_informed_by from WorkflowExecution subclasses, and populate part_of on each subclass with the workflow chain id

    • Should occur BEFORE collection name changes as this migrator refers to omics_processing_set and WorkflowExecution subclasses with "Activity" in the name.
    • Should occur AFTER omics_type becomes analyte_category
  8. Migrator_from_X_to_PR19_and_PR70: instrument_set update to instrument_used from instrument_name

    • Should occur BEFORE collection name changes as this migrator refers to omics_processing_set
  9. Migrator_from_X_to_PR2_and_PR24: Change MongoDB collection names

    • This includes changing omics_processing_set to data_generation_set and the workflow execution set names
  10. Migrator_from_X_to_PR10: Add the type slot to every class instance (refers to the new collection names)

    • This needs to happen AFTER the classes are renamed (e.g. this migration refers to data_generation_set instead of omics_processing_set).
    • This PR was also updated to account for new inlined classes: https://github.com/microbiomedata/berkeley-schema-fy24/pull/103
  11. Migrator_from_X_to_PR3: Fix type slot to specify subclass for DataGeneration subclasses

    • This needs to happen AFTER the following migrations:
      • X_to_PR4 (changing omics_type to analyte_category and updating values to enum)
      • X_to_PR2_and_PR24 (changing collection names, esp. omics_processing_set to data_generation_set)
      • X_to_PR10 (adding the type slot to all instances; that migrator sets every type to DataGeneration and does not take the subclasses into account, e.g. type:NucleotideSequencing and type:MassSpectrometry)
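
The ordering notes above can be checked mechanically. Below is a minimal Python sketch, not part of any of the migration PRs, that encodes the BEFORE/AFTER notes as (earlier, later) pairs and verifies that the listed order satisfies them. The short names (e.g. `PR4` for Migrator_from_X_to_PR4) and the helper functions are hypothetical. It also assumes that "OmicsProcessing becomes DataGeneration" and "MetabolomicsAnalysisActivity becomes MetabolomicsAnalysis" both refer to the rename step X_to_PR2_and_PR24; the list does not state this explicitly.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Migrators in the order given in the list, shortened to their PR suffix.
LISTED_ORDER = [
    "PR23", "PR4", "PR53", "PR21", "PR129", "PR31",
    "PR9", "PR19_and_PR70", "PR2_and_PR24", "PR10", "PR3",
]

# (earlier, later) pairs transcribed from the BEFORE/AFTER notes.
# ASSUMPTION: the class renames happen in PR2_and_PR24.
MUST_PRECEDE = [
    ("PR4", "PR2_and_PR24"),
    ("PR53", "PR2_and_PR24"),
    ("PR129", "PR2_and_PR24"),
    ("PR31", "PR2_and_PR24"),
    ("PR31", "PR19_and_PR70"),
    ("PR31", "PR9"),
    ("PR9", "PR2_and_PR24"),
    ("PR4", "PR9"),
    ("PR19_and_PR70", "PR2_and_PR24"),
    ("PR2_and_PR24", "PR10"),
    ("PR4", "PR3"),
    ("PR2_and_PR24", "PR3"),
    ("PR10", "PR3"),
]


def violations(order, constraints):
    """Return the (earlier, later) pairs that the given order breaks."""
    position = {name: index for index, name in enumerate(order)}
    return [(a, b) for a, b in constraints if position[a] >= position[b]]


def one_valid_order(names, constraints):
    """Return some order that satisfies every constraint."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set() for name in names}
    for earlier, later in constraints:
        graph[later].add(earlier)
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print("violations in listed order:", violations(LISTED_ORDER, MUST_PRECEDE))
    print("one valid order:", one_valid_order(LISTED_ORDER, MUST_PRECEDE))
```

A check like this is only as good as the transcribed pairs, so any new migrator added to the list would need its constraints added to `MUST_PRECEDE` as well.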

brynnz22 • Dec 28 '23 21:12