
Handle GTDB term deprecation and obsolescence across releases

Open jplfaria opened this issue 2 months ago • 0 comments

It's great to see that the new release at https://biopragmatics.github.io/obo-db-ingest/ is at r226, which means the bioversions module picked up the new GTDB release as expected.

My current challenge is versioning and term deprecation/obsolescence. GTDB releases a new version consistently every year. I originally made my ontology for the 2024 release, and now there's been a 2025 release. The pyobo module builder caught the new release and did an update automatically, which is great since the source files for each release follow the same consistent structure.

However, I did not build any logic into the pyobo module for versioning or term deprecation/obsolescence. We have data in our system for genomes from the 2023 version (r214) and the 2024 version (r220), and we are now looking to update to the 2025 release (r226). What I am trying to work out is how best to handle the provenance of terms across versions. GTDB is popular because it provides taxonomy for genomes that are difficult to classify, but this also means many taxonomy terms change from one release to the next.
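To make the problem concrete, here is a minimal sketch of the comparison I have in mind, assuming each release can be reduced to a mapping from term identifier to label. The function name `diff_releases` and the toy identifiers are hypothetical placeholders, not real GTDB content and not part of the pyobo API.

```python
def diff_releases(old_terms: dict[str, str], new_terms: dict[str, str]) -> dict[str, set[str]]:
    """Compare two releases given as {term identifier: label} mappings."""
    old_ids, new_ids = set(old_terms), set(new_terms)
    return {
        "added": new_ids - old_ids,    # terms that first appear in the newer release
        "removed": old_ids - new_ids,  # candidates for obsoletion
        "kept": old_ids & new_ids,     # identifiers present in both releases
    }


# Toy data for illustration only (hypothetical, not taken from GTDB).
r220 = {"g__Alpha": "Alpha", "g__Beta": "Beta"}
r226 = {"g__Alpha": "Alpha", "g__Gamma": "Gamma"}

changes = diff_releases(r220, r226)
```

Identifiers in `removed` would be the ones needing an obsolete marker. A set difference alone cannot tell a rename from a true removal, so a replacement mapping would have to come from somewhere else.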

For context, I took a look at the NCBITaxon ontology. NCBITaxon doesn't include deprecated terms, but there seems to have been an attempt to add them (https://github.com/obophenotype/ncbitaxon/pull/123). I also looked at the OBO Foundry recommendation on how to obsolete a term: https://oboacademy.github.io/obook/howto/obsolete-term/#obsoletion-process-manual
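As I understand that recommendation, an obsolete term keeps its identifier, gets an "obsolete" prefix on its label, is flagged as deprecated, and optionally points to a replacement. A rough sketch of what that could look like as an OBO flat-file stanza is below. The helper `obsolete_stanza`, the `gtdb` prefix, and the example identifiers are assumptions for illustration; this is plain string formatting, not a pyobo function.

```python
from typing import Optional


def obsolete_stanza(prefix: str, identifier: str, label: str,
                    replaced_by: Optional[str] = None) -> str:
    """Render an OBO flat-file [Term] stanza for an obsoleted term."""
    lines = [
        "[Term]",
        f"id: {prefix}:{identifier}",
        f"name: obsolete {label}",  # label prefixed per the obsoletion guide
        "is_obsolete: true",
    ]
    if replaced_by is not None:
        # Only emit replaced_by when a successor term is actually known.
        lines.append(f"replaced_by: {prefix}:{replaced_by}")
    return "\n".join(lines)


# Hypothetical identifiers, for illustration only.
stanza = obsolete_stanza("gtdb", "g__Beta", "Beta", replaced_by="g__Gamma")
print(stanza)
```

Whether stanzas like this should be carried forward in every later release, or only in the release where the term first disappears, is part of what I am unsure about.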

Question: Does anyone have recommendations on the best approach to handle versioning and track term deprecation/obsolescence across GTDB releases in the pyobo module? I'm ready to take a stab at implementing this, but I'd appreciate some guidance.

jplfaria, Oct 23 '25 14:10