memote icon indicating copy to clipboard operation
memote copied to clipboard

Improve SBO term handling

Open ChristianLieven opened this issue 5 years ago • 10 comments

Problem description

We're really specifically checking for a few node SBO terms and not yet for the presence of any children of those nodes. Ideally, we'd have some internal representation of the ontology such that we can check if the the SBO term of an object matches a specific node term or any of its children.

We should also remove the duplicate check for SBO term annotations in our annotations tests as per this issue #447

  • [ ] Refactor tests and helper functions such they make use of the full SBO Ontology
  • [ ] Explicitly exclude SBO terms from tests using the annotation attribute.

ChristianLieven avatar Jul 12 '18 08:07 ChristianLieven

Very much agreed. I'd like to see the SBO handled in cobrapy, though.

Midnighter avatar Jul 12 '18 09:07 Midnighter

To get the discussion going about what this will look like for cobrapy and memote, how do you envision cobrapy users interacting with SBO terms? There are many cases that automatic assignment makes sense, e.g. all metabolites should get SBO:0000247 (simple chemical entity), but these still have exceptions that make automation dangerous (e.g. polymers shouldn't have SBO:0000247).

Would a set of SBO helper functions be appropriate, such as functions that flag model elements as candidates for SBO terms and then let the user add the terms manually? Or is this getting into the realm of a package dedicated to annotating cobrapy models with SBO terms?

On the memote side, it seems like cobrapy-based "checks" could clean things up, e.g. calling cobra.annotation.sbo.check_element() to see whether an SBO term is being used properly, to the extent that it can be automatically assessed. If the entirety of SBO was integrated within cobrapy, this could also be done at the time of assignment (e.g. raise a warning if a non-transport reaction is assigned to a transport-related SBO term). But, the checks within memote would still be necessary because other packages probably wouldn't implement the practice.

gregmedlock avatar Jan 24 '19 21:01 gregmedlock

I like where you're going in terms of functionality.

Making it easy for a model reconstruction pipeline to mostly do the right thing is definitely a worthy goal.

Would a set of SBO helper functions be appropriate, such as functions that flag model elements as candidates for SBO terms and then let the user add the terms manually? Or is this getting into the realm of a package dedicated to annotating cobrapy models with SBO terms?

In terms of implementation, the first step in my opinion is to find a good package to work with ontologies. There are a couple of options likely candidates are ontobio, owlready, ontospy. Then one could provide a couple of convenience functions around the ontology package. We had done some preliminary tests but discarded using any of them because none of them supports Python 2 & 3. I would start with SBO annotation being part of the main cobrapy package but that depends a bit on the desired feature set.

The last cobrapy user survey encouraged me to drop support for Python 2 which would then immediately make any of those ontology packages targets to build on. Overall, I'm afraid good SBO support is a few months down the line, though. There are a number of things to wrap up on the cobrapy side plus a big clean up of the codebase to make it Python 3 only.

Fully agree with your last paragraph. Overall it will really help if we can get as clear as possible a picture of the different use-cases involving the SBO. I wonder who else is really interested in this, it'd be great to nail down some requirements with input from different parties.

Midnighter avatar Jan 24 '19 22:01 Midnighter

It seems to me that my issue ( #721) corresponds to this discussion aswell. Would it make sense to use libSBML to access the SBO tree? If I recall correctly, there is at least a possibility with JSBML so it might be possible with libSBML aswell.

JSBML offers a function called getParentTerms() which allows to access SBO terms which are more general. So far, libSBML only seems to support checking for certain ''identities'' so whether a term is for example a Conservation Law.

famosab avatar Sep 15 '21 10:09 famosab

I commented the difference between JSBML and libSBML implementation of the SBO class on their GitHub: Implementation of SBO term trees Issue number 164 (#164). Maybe that helps.

famosab avatar Sep 15 '21 10:09 famosab

I commented the difference between JSBML and libSBML implementation of the SBO class on their GitHub: Implementation of SBO term trees Issue number 164. Maybe that helps.

Can you provide a link for that, please?

Midnighter avatar Sep 15 '21 12:09 Midnighter

Here the issue: https://github.com/sbmlteam/libsbml/issues/164

matthiaskoenig avatar Sep 15 '21 18:09 matthiaskoenig

Sounds like storing the SBO in a file distributed with cobrapy and using a package to conveniently access the ontology is the way to go for now.

Midnighter avatar Sep 15 '21 22:09 Midnighter

When storing the SBO file within a distribution of COBRApy, it is crucial to make it exchangeable. The SBO is still growing. New subterms may be added, and all of a sudden, COBRApy may raise errors when users want to check a seemingly non-existing term. Some users might not be able to upgrade their version of COBRApy for various reasons. This will be a great advantage if they have a relatively easy way of simply exchanging the SBO file distributed with COBRApy. Also, if it isn't exchangeable, a new release of COBRApy could become necessary with each new version of SBO. Nice could also be an automated download of the latest SBO file (OWL format) upon building COBRApy. We have implemented that in the build process of JSBML also. Hence, each new JSBML build will automatically include the latest SBO file at the time of building. Please also note that there are already implementations of ontology parsers available in Python. Perhaps, it might not even be such a considerable effort to support the directed acyclic SBO graph queries.

draeger avatar Sep 16 '21 20:09 draeger

When storing the SBO file within a distribution of COBRApy, it is crucial to make it exchangeable. The SBO is still growing. New subterms may be added, and all of a sudden, COBRApy may raise errors when users want to check a seemingly non-existing term. Some users might not be able to upgrade their version of COBRApy for various reasons. This will be a great advantage if they have a relatively easy way of simply exchanging the SBO file distributed with COBRApy. Also, if it isn't exchangeable, a new release of COBRApy could become necessary with each new version of SBO.

Those are great points. With eQuilibrator, we've put data files on Zenodo and then load/update by DOI. That's been a pretty decent experience so far. Added bonus is that one could say model was created with SBO version x and DOI. It'd actually be great if the SBO was released directly to Zenodo rather than us doing it.

Please also note that there are already implementations of ontology parsers available in Python. Perhaps, it might not even be such a considerable effort to support the directed acyclic SBO graph queries.

Agree completely. I noted a few options above.

Midnighter avatar Sep 16 '21 22:09 Midnighter