cobrapy
cobrapy copied to clipboard
Elementary Flux Modes and Minimum Cut Sets
So I'll use this issue to kick off the discussion on EFMs and MCSs
The two methods I've had the most luck with in calculating EFMs and MCSs are efmtool
, doi:10.1093/bioinformatics/btn401, and mhsCalculator
, doi:10.1186/1471-2105-14-318.
efmtool
is distributed under the Simplified BSD Style License in two formats, a compiled matlab extension (which includes java .jar files which can be run from the command line), and the java source code. It requires java 1.7. Also worth noting are the more recent modifications of efmtool
by Christian Jungreuthmayer, including tEFMA and [regEfmtool)[http://www.biotec.boku.ac.at/en/arbeitsgruppenresearch-groups/research-group-mattanovich-gasser-sauer/staff/associated-group-metabolic-modelling/regefmtool/].
Is anyone on the cobrapy team familiar at all with java? (I'm not 😄 ) My current solution is a fairly quick and dirty wrapper that writes input files from a cobrapy model to a temporary directory, calls the compiled java code, and reads out the output files into a pandas dataframe. Its reliable enough for what I've used it for, but I'm not sure how that would best fit into the general cobrapy ecosystem.
mhsCalculator
is a set of C codes which function similarly to efmtool
, processing input reaction + stoichiometry files and outputting the minimum cut sets. This is also under the Simplified BSD License.
The wrapper functions seem to work fine, but I think the issues will come down to either providing install directions for the third-party tools or appropriately packaging the different code libraries. (And whether or not this is worth including in the core library or as a sub-library)
My $.02 as a former developer is that it's much easier to have c dependencies.
You can make an extension in cython and link directly against the C API, and bundle the whole thing as an independent wheel with no additional dependencies.
On Oct 16, 2016 4:39 PM, "Peter St. John" [email protected] wrote:
So I'll use this issue to kick off the discussion on EFMs and MCSs
The two methods I've had the most luck with in calculating EFMs and MCSs are efmtool http://www.csb.ethz.ch/tools/software/efmtool.html, doi:10.1093/bioinformatics/btn401 http://dx.doi.org/10.1093/bioinformatics/btn401, and mhsCalculator http://www.biotec.boku.ac.at/en/arbeitsgruppenresearch-groups/research-group-mattanovich-gasser-sauer/staff/associated-group-metabolic-modelling/mhscalculator/, doi:10.1186/1471-2105-14-318 http://www.biomedcentral.com/1471-2105/14/318.
efmtool is distributed under the Simplified BSD Style License in two formats, a compiled matlab extension (which includes java .jar files which can be run from the command line), and the java source code. It requires java 1.7. Also worth noting are the more recent modifications of efmtool by Christian Jungreuthmayer, including tEFMA https://github.com/mpgerstl/tEFMA and [regEfmtool)[http://www. biotec.boku.ac.at/en/arbeitsgruppenresearch-groups/ research-group-mattanovich-gasser-sauer/staff/associated- group-metabolic-modelling/regefmtool/].
Is anyone on the cobrapy team familiar at all with java? (I'm not 😄 ) My current solution is a fairly quick and dirty wrapper that writes input files from a cobrapy model to a temporary directory, calls the compiled java code, and reads out the output files into a pandas dataframe. Its reliable enough for what I've used it for, but I'm not sure how that would best fit into the general cobrapy ecosystem.
mhsCalculator is a set of C codes which function similarly to efmtool, processing input reaction + stoichiometry files and outputting the minimum cut sets. This is also under the Simplified BSD License.
The wrapper functions seem to work fine, but I think the issues will come down to either providing install directions for the third-party tools or appropriately packaging the different code libraries. (And whether or not this is worth including in the core library or as a sub-library)
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/opencobra/cobrapy/issues/293, or mute the thread https://github.com/notifications/unsubscribe-auth/ABUPSIEJ8ZZ6mmhLdtE2vK9mreXERg2Yks5q0rXHgaJpZM4KYH6S .
That was my thought as well. Sadly though it seems like efmtool and its various offshoots are all written in java. To do the minimum cut set calculation, you'll need to start with the calculated EFMs, so just including mhsCalculator alone doesn't add much functionality.
To get extreme rays of polytopes there is also the classic ccd which has python bindings. In C there is also lrs. Both use the double description method as efmtool but do not use the bit vector optimization. When I tried the command line versions ccd+ was actually faster than lrs for me. However, both take a long time even for small models (60 reversible reactions, 80.000 basis vectors, 1-2 hours)...
Many of those packages are compatible with Fukuda's original input format (no idea about efmtool), so we could also just write a function that outputs the problem definition and the user just uses whatever he likes...
Another way that efmtool seems to speed up the calculation is by a reversible compression of the metabolic network to the simplest possible representation. That might also have utility in other cobrapy functions, so that could be something worth looking into implementing
~~Does that mean deleting redundancies (linearly dependent rows) in the H-representation? I think all of the methods do/recommend that.~~ Nevermind, just skimmed the paper :) Something we could also think about is to use flux sampling and than calculate a convex hull for the sample which gives you an approximate basis. Scipy already has a convex hull function so we would not need additional dependencies. However, I have no idea how good this basis would be.
Having used scipy's convex hull function, I'd probably avoid that method 😄 . It definitely struggles with more than just a few dimensions. There's also a lot of research in this area, I'd shy away from trying to come up with a new method. ccd
and lrs
seem promising, I'll look into them more.
I do think the network expansion / compression would be an important tool, I might create a new issue to ask about it.
I'll use this issue to keep track of some alternative algorithms and implementations:
- Metatool &
elmo-comp
, for some reason I remember seeing this before and not wanting to use it... - A depth-first search algorithm to compute elementary flux modes by linear programming 10.1186/s12918-014-0094-2
- TreeEFM: calculating elementary flux modes using linear optimization in a tree-based algorithm 10.1093/bioinformatics/btu733
Yeah I have encountered the same problem with convex hulls. Sometimes reducing dimensions by SVD and reprojecting after calculating the hull helps. Either way I would not really implement it explicitly since it's just plugging the samples into the convex hull function of scipy. Maybe as an example in the docs...
So one option here could be a docker container.. that would obviously require a separate package, but it might be a relatively easy way of shipping a java 7 runtime for EFMtool and a compiled MCScalculator package. Would anyone find that useful? Or is the docker dependency a nonstarter?
Ok, we already have functions for calculating shortest EFMs and cut sets here https://github.com/biosustain/cameo/blob/devel/cameo/flux_analysis/structural.py that do not have any special dependencies. Two reasons why I think this should go in a separate package: (1) Enumerating all EFMs has no practical value for genome-scale models and I think cobrapy is mainly intended for genome-scale model. Furthermore, everything that has dependencies that go beyond C extensions should not go into cobrapy. Why not make a separate package?
Do you happen to have a good reference for the shortest EFMs and MCSs? It would be cool to compare and see whats left out for core-carbon models.
@pstjohn http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003378
Cool, I had actually just read through that. Definitely seems like a promising approach, even for smaller models. We should do another cobrapy hangout to talk about the optlang interface, it would help me figure out what you guys were doing in cameo/cameo/flux_analysis/structural.py
The paper on shortest EFMs is available here. The initial code was written in Java and needs rewriting. I'm happy to help.
Newer methods to calculate EFMs in genome-scale networks involved sampling of EFM either using genetic algorithms or a tree like search, e.g. CASOP-GS or TreeEFM. In order to use them in genome-scale networks one can formulate the EFM calculation problem as an LP.
I am being a bit bias here, but I do see value in enumerating not all but a subset of EFMs for genome-scale models, for example, for data integration. I am sure there are other areas where the enumeration of EFMs can be applied to.
Hey Luis (what a nice surprise!), sorry for not providing the original publication earlier. We actually have a shortest EFM implementation here in python https://github.com/biosustain/cameo/blob/devel/cameo/flux_analysis/structural.py.
Hey Luis, thanks for commenting!
I looked into TreeEFM, seemed like a neat tool! Its written in C++ too, so that could be fairly easy to incorporate. Do you know if the source code might be available? -- the original publication only distributes binaries.
@phantomas1234, I had a tough time figuring out how you implemented the shortest EFM algorithm, or how easy it would be to port to the optlang interface. Do you know how much of a speed detriment there is for the python over a java/c++ approach? I know the original algorithm made pretty heavy use of CPLEX's populate
, does your version work with other solvers too? (and if so, do we loose the benefit of populate
?)
@phantomas1234 Nice implementation of the K-shortest! Another approach we explored was to use a more restrictive exclusion_constraint by bounding it by len(exclusion_list) - w, where w can be any integer higher than 1. The higher w is the more distinct (containing less reactions in common) the newly calculated EFMs are.
@pstjohn Fell free to contact Francis Planes at CEIT, I am sure he will be happy to collaborate.
The k-shortest variations seem very cool. That is definitely something we should consider!
@pstjohn The shortest EFM implementation in cameo is already fully compatible with optlang and can be used with any MILP solver. In addition it implements cplex-specific subroutines that make use of the populate
function to improve performance. This performance gain will indeed be lost with solvers that don't have a feature similar to populate
.
A very large majority of the running time of shortest EFM is spent solving MILP's (optimized solver functions in C) so I don't think there will be a noticeable difference between a python and java/c++ implementation.
@pstjohn can we close this leaving EFM to the cameo implementation / other packages? Or was there still something that you think should go in cobrapy?
I still think it would be great if we could port it to cobrapy eventually, its a feature I certainly would use frequently.
Maybe not as urgent as the 0.6.0 merge, probably better to let the optlang interface in cobrapy stabilize a bit.
I would support moving the shortest EFM/minimal cut sets to cobrapy eventually.
It would also be awesome to add elementary flux vectors at some point as well, in particular to be able to integrate them with arbitrary optlang constraints: S. Klamt et al., PLoS Comput Biol. 13, e1005409–22 (2017)
It would be great if we could get people from the EFM crowd involved in cobrapy development.
Agreed! Although for EFVs specifically, its actually fairly easy to implement: I have it more or less working in my pyefm package. For cobrapy, we would probably want to do the k-shortest method... but we also need to implement some sort of network compression / decompression as well.
So for the k-shortest method, the original authors use integer fluxes only and mention that the method is more expensive using non-integer fluxes.
Do we have an opinion as to whether or not we want the fluxes to be allowed to take non-integer values? I don't have an immediate insight into whether or not that makes EFV or MCS calculation harder or easier..
Will depend on #612. WIP at feat/elementary_flux_modes
I wonder though if EFMs and related plus the tooling surrounding it would be a good target for a small cobrapy dependent (sub-) package?
There's already pstjohn/pyefm :). But i think if we had a k-shortest implementation using pure optlang it would fit nicely in the existing package.
Certainly open to discussion though
@pstjohn Requiring fluxes to be integers only works when all the coefficients of the S-matrix are integers. Since this is usually not the case for genome-scale models I think it makes most sense to allow non-integer fluxes by default. If it is significantly faster to have a pure integer problem, then I guess it would be fairly easy to add a switch argument to the function/object or automatically check if the S-matrix is integer
Closing because that discussion is 5 years old. Feel free to open a thread in the Discussion section. Either way it seems that any solution (part of the core, or separate packages) seem to have support, though it's more a lack of contributors willing to implement it.