
**DON'T MERGE -- ARCHIVE** Build overhaul v4.0.0

Open callahantiff opened this issue 2 years ago • 1 comment


🛑 DO NOT USE THIS BRANCH OR MODIFY THIS PR -- CONTENT IS KEPT FOR NOTES 🛑


Purpose

This PR addresses several issues and overhauls many aspects of the current build, as described in more detail below. The primary changes affect the amount, type, and storage of metadata at both the node and triple levels.

Issues Addressed by PR

  • #97
  • #99
  • #107
  • #114

Scripts Impacted

  • owlnets.py
    • Updated to fix an earlier incorrect assumption about classes and axioms built using UnionOf constructors
  • metadata.py
    • Added new functionality for processing Biolink types
  • edge_list.py
    • Added new functionality for adding Bioregistry identifiers
  • utils/data_utils.py

Data Sources/Documentation Impacted

  • edge_source_list.txt
    • Added back chemical-rna edge data
  • resource_info.txt
    • Updated metadata for many of the edges, most often to soften the initial formatting applied to the data (i.e., the build is more liberal and inclusive, while users retain the ability to enforce specific filtering choices)
    • Added back information for the chemical-rna edge
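
To illustrate the "liberal build, user-side filtering" idea noted for resource_info.txt, here is a minimal sketch. The column names, values, and the `filter_edges` helper are all hypothetical and are not taken from PheKnowLator or its data sources; the point is only that a permissive edge list can be narrowed downstream by whatever criteria the user chooses.

```python
import csv
import io

# Hypothetical edge data: the columns and values below are invented for
# illustration and do not reflect the actual contents of any build file.
raw_edges = """subject\tobject\tevidence\tscore
gene_1\tgene_2\texperimental\t0.95
gene_1\tgene_3\tpredicted\t0.40
gene_2\tgene_4\texperimental\t0.70
"""


def filter_edges(rows, keep):
    """Return only the rows for which the user-supplied predicate is true."""
    return [row for row in rows if keep(row)]


# The inclusive set keeps every edge that was parsed.
rows = list(csv.DictReader(io.StringIO(raw_edges), delimiter="\t"))

# A stricter, user-chosen view: experimental evidence with a high score.
strict_edges = filter_edges(
    rows,
    lambda row: row["evidence"] == "experimental" and float(row["score"]) >= 0.9,
)
```

Keeping the filter as a separate, user-supplied predicate means the build itself does not have to commit to one set of inclusion criteria.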

Notebooks Impacted

  • OWLNETS_Example_Application.ipynb
  • Data_Preparation.ipynb

Output Impacted

  • All output files will be g-zipped to improve resource use
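
For readers who consume the compressed output, the following is a minimal sketch of writing and reading a g-zipped, tab-separated triple file with Python's standard `gzip` module. The file name, the triple layout, and the helper functions are assumptions made for illustration; they are not the build's actual output format or code.

```python
import gzip
import tempfile
from pathlib import Path


def write_triples_gz(path, triples):
    """Write tab-separated triples to a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for subj, pred, obj in triples:
            handle.write(f"{subj}\t{pred}\t{obj}\n")


def read_triples_gz(path):
    """Read tab-separated triples back from a gzip-compressed text file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [tuple(line.rstrip("\n").split("\t")) for line in handle]


# Hypothetical example triples; not taken from an actual build.
example_triples = [
    ("gene_1", "interacts_with", "gene_2"),
    ("chemical_1", "interacts_with", "rna_1"),
]

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "example_triples.txt.gz"
    write_triples_gz(out_path, example_triples)
    round_trip = read_triples_gz(out_path)
```

Opening the file in text mode (`"wt"` / `"rt"`) lets the rest of the code treat it like an ordinary text file, so downstream readers only need to swap `open` for `gzip.open`.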

Other Updates

  • The following Wiki pages have been updated:
    • v2-Data-Sources
      • Updated to include better descriptions
    • KG Construction
      • Section describing the KG output has been updated to note that all output files are g-zipped
    • OWL-NETS 2.0
      • Section describing the KG output has been updated to note that all output files are g-zipped

callahantiff, Dec 08 '21 01:12