
Should the cobra default bounds be changed from -1000, 1000 to -Inf, Inf?

Open Midnighter opened this issue 5 years ago • 9 comments

This is a far reaching change that I would be rather conservative about. I'm happy to probe the community but maybe the ability to change the global configuration to include infinity is enough?
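For reference, the global-configuration route already looks roughly like this (a sketch; the `try`/`except` guard is only there so the snippet also runs where cobrapy is not installed):

```python
import math

try:
    import cobra

    # cobrapy's global Configuration object is a singleton; models created
    # afterwards pick up these defaults for newly added reactions.
    config = cobra.Configuration()
    config.bounds = (-math.inf, math.inf)
    lower, upper = config.bounds
except ImportError:
    # cobrapy not installed; fall back to the plain values for illustration.
    lower, upper = -math.inf, math.inf
```

So a user who wants infinite defaults can already opt in without the package-wide default changing.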

@opencobra/cobrapy-core

Midnighter avatar Mar 08 '19 13:03 Midnighter

One thing to note here is that most models in the wild that use FBC will probably have bounds of -1000, 1000 explicitly defined, right?

Midnighter avatar Mar 08 '19 13:03 Midnighter

Yes, many models in the wild will have [-1000, 1000] bounds, because the COBRA tools (COBRA Toolbox and cobrapy) use these as default bounds. This is probably for historic reasons: an early (and mathematically incorrect) decision to introduce arbitrary bounds to simplify the implementation. I am not sure whether the current COBRA Toolbox can handle [-Inf, Inf] bounds.

Such default bounds could be very problematic. If one uses default bounds to constrain all reactions, there must be a check on the solution flux distribution: if any of the non-exchange fluxes is 1000 or -1000, a warning should be given: "The flux solution is arbitrarily constrained by default bounds; make sure this is the behavior you expect." I would much prefer to get an "unbounded solution", which clearly shows the problem, than an artificially constrained solution which I would have to check manually afterwards to see whether it is a "real" solution (which nobody does). As a consequence, results could be reported incorrectly.
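Such a check is easy to sketch; `DEFAULT_BOUND`, the flux dictionary, and the reaction IDs below are made-up illustrations, not cobrapy API:

```python
import warnings

DEFAULT_BOUND = 1000.0  # the conventional arbitrary default
TOLERANCE = 1e-6

def warn_on_default_bound_fluxes(fluxes, exchange_ids):
    """Warn if any non-exchange flux sits at the arbitrary default bound."""
    suspicious = [
        rxn for rxn, flux in fluxes.items()
        if rxn not in exchange_ids and abs(abs(flux) - DEFAULT_BOUND) < TOLERANCE
    ]
    if suspicious:
        warnings.warn(
            "Flux solution is arbitrarily constrained by default bounds "
            f"for reactions: {suspicious}. Make sure this is what you expect."
        )
    return suspicious

# Toy flux distribution: one internal reaction pinned at the default bound.
hits = warn_on_default_bound_fluxes(
    {"EX_glc": -10.0, "PGI": 1000.0, "PFK": 8.5}, exchange_ids={"EX_glc"}
)
```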

If I find time at some point I will run all the BiGG models (with gene deletions), replacing all non-exchange default bounds with [-Inf, Inf], to see how many of the flux distributions are altered by that, i.e., artificially constrained and giving incorrect results. No idea if this is just a theoretical issue or if such problems are widespread in existing models.

matthiaskoenig avatar Mar 11 '19 14:03 matthiaskoenig

This is probably for historic reasons: an early (and mathematically incorrect) decision to introduce arbitrary bounds to simplify the implementation.

I would not be so much concerned with mathematical as with biological correctness. I agree that 1000 is a bit arbitrary, but there may have been a good reason to choose it initially which was lost over time. For instance, it is pretty close to the diffusion limit for most reactions. Diffusion rates for small molecules in most cells range from ~1 to ~100 um^2/s, which limits most effective catalysis rates to about 1/s. Most large-ish molecules in a cell have uM concentrations, with some in the mM and almost none in the M range. From what I have seen in our metabolome measurements I would usually not expect any concentration above 100 mM. So a flux bound of 1000 mmol/gDW/h is pretty close to that limit for single-substrate reactions (~1/s * 100 mM * 60 s/h, especially since cell volume >> dry weight). At least that's how I always saw it. In principle I totally agree that it would be better to implement the "real" upper bound. However, I am not sure that no bound is better: it just favors even more biologically impossible solutions than the 1000 bound.

Such default bounds could be very problematic. If one uses default bounds to constrain all reactions, there must be a check on the solution flux distribution: if any of the non-exchange fluxes is 1000 or -1000, a warning should be given: "The flux solution is arbitrarily constrained by default bounds; make sure this is the behavior you expect."

I think we can't tell whether a bound is based on actual knowledge or is just a default. What happens if the user sets a bound of 10000 arbitrarily, or a bound of 1000 based on actual measurements or calculations?

If I find time at some point I will run all the BiGG models (with gene deletions), replacing all non-exchange default bounds with [-Inf, Inf], to see how many of the flux distributions are altered by that, i.e., artificially constrained and giving incorrect results. No idea if this is just a theoretical issue or if such problems are widespread in existing models.

As argued above, that would probably give you a bunch of solutions violating thermodynamics or diffusion limits. The correct experiment would be to rerun the BiGG models with the correct bounds (some may have data). Essentially, with completely open bounds, growth rate becomes a linear function of the import fluxes, and most models do show that behavior even with the 1000 bound, so realistic import fluxes usually constrain a decent model enough.

cdiener avatar Mar 11 '19 17:03 cdiener

~1/s * 100 mM * 60 s/h should be ~1/s * 100 mM * 3600 s/h = 360 000 mmol/l/h ≈ 360 mmol/h/g(wet), which is in the right order of magnitude with 0.1 g(dry) ~ 1 g(wet).
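Spelled out with the corrected factor (all numbers are the rough estimates from this thread, not measurements):

```python
# Back-of-the-envelope flux ceiling from the thread's estimates:
# effective catalysis ~1/s, metabolite concentration ~100 mM.
kcat_per_s = 1.0             # effective catalytic rate [1/s]
conc_mM = 100.0              # metabolite concentration [mmol/L]
seconds_per_hour = 3600.0

flux_mmol_per_l_h = kcat_per_s * conc_mM * seconds_per_hour  # 360000 mmol/(L*h)

# ~1 L of cells weighs ~1000 g wet; dry weight is ~10% of wet weight.
grams_wet_per_l = 1000.0
dry_fraction = 0.1

flux_per_g_wet = flux_mmol_per_l_h / grams_wet_per_l  # 360 mmol/(g wet * h)
flux_per_g_dry = flux_per_g_wet / dry_fraction        # 3600 mmol/(gDW * h)
```

So the estimated ceiling lands within an order of magnitude of the 1000 mmol/gDW/h default.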

Essentially, with completely open bounds, growth rate becomes a linear function of the import fluxes

Yes. The other alternative (and the current way to do FBA) is that growth rate becomes a function of arbitrary bounds (which have incorrect values relative to each other, i.e., all are 1000 instead of the correct ratios). The result is more interesting flux distributions which don't have anything to do with reality. Perhaps I don't understand exactly what happens in FBA, but this seems just incorrect. So how can you draw any biological conclusions from flux distributions with arbitrary constraints?

Basic example: you have very different maximal rates of ATP generation via glycolysis versus complete oxidative phosphorylation via the TCA cycle. If you set all bounds on glycolysis and oxidative phosphorylation to [-1000, 1000], your energy metabolism does not have anything to do with reality, because the different maximal rates are not part of the model/constraints.

As I see it, the only valid biological conclusions/predictions of FBA are:

  • gene knockout simulations (growth or no growth), because these are mainly built on the stoichiometric constraints. I.e., with arbitrary bounds the only valid conclusions are "growth is possible" or "growth is not possible".

I am trying to wrap my head around that. Is this a problem which is just ignored in the FBA community?

matthiaskoenig avatar Mar 12 '19 08:03 matthiaskoenig

~1/s * 100 mM * 60 s/h should be ~1/s * 100 mM * 3600 s/h = 360 000 mmol/l/h ≈ 360 mmol/h/g(wet), which is in the right order of magnitude with 0.1 g(dry) ~ 1 g(wet).

Yeah, that was too much handwaving from my part ;)

So how can you draw any biological conclusions from flux distributions with arbitrary constraints?

As you have argued yourself, the strictest bounds in your model are usually the import fluxes, which should be much smaller than any of those upper bounds. Since you have mass conservation (at least if you have no other source reactions), most fluxes should stay lower than that (similar to completely open bounds). You can construct cases where larger internal fluxes are possible even with small import fluxes (a long chain of catabolic reactions), but you don't find those too often.
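A toy linear program illustrates this; `scipy.optimize.linprog` stands in for an FBA solver here (an illustration, not cobrapy code). With the import flux capped at 10, the optimum is the same whether the internal bounds are [-1000, 1000] or fully open:

```python
from scipy.optimize import linprog

# Toy chain: v1 imports A, v2 converts A -> B, v3 exports B.
# Mass balance S v = 0 for metabolites A and B.
S = [[1.0, -1.0, 0.0],   # A: produced by v1, consumed by v2
     [0.0, 1.0, -1.0]]   # B: produced by v2, consumed by v3
c = [0.0, 0.0, -1.0]     # linprog minimizes, so this maximizes v3

# Only the import is realistically bounded (0 <= v1 <= 10).
finite = linprog(c, A_eq=S, b_eq=[0.0, 0.0],
                 bounds=[(0, 10), (-1000, 1000), (-1000, 1000)])
open_ = linprog(c, A_eq=S, b_eq=[0.0, 0.0],
                bounds=[(0, 10), (None, None), (None, None)])
# Both give v3 = 10: mass conservation plus the import cap does the work.
```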

And for most models I have worked with, you don't hit the bounds that way. Normally you have to be careful when reporting internal fluxes anyway, since they may not be unique. If you use pFBA, that again pulls your fluxes towards lower values. You can also restrict growth, which has a similar effect. If you really want to draw quantitative conclusions from your model you will need "real" bounds, as you have described. A common way to get those is to do 13C labeling for some reactions, renormalize to the measured labeled-compound import flux, set bounds based on [mean - 2sd, mean + 2sd] (or similar), and then optimize to get the remaining fluxes. If you concentrate on exchange fluxes or growth rates in your publication, this may not be necessary, however (since you often have some idea of bounds for those).
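That recipe in miniature (the replicate flux values are invented purely for illustration):

```python
from statistics import mean, stdev

def measured_bounds(flux_samples, n_sd=2.0):
    """Turn replicate flux measurements into (mean - n_sd*sd, mean + n_sd*sd)."""
    m, s = mean(flux_samples), stdev(flux_samples)
    return m - n_sd * s, m + n_sd * s

# Hypothetical 13C-derived flux estimates (mmol/gDW/h) for one reaction.
lb, ub = measured_bounds([8.2, 9.1, 8.7, 8.4])
```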

I agree with you, however, that a model where internal reactions all hit the bounds has limited use and that open bounds might help somewhat. I just don't think it will help produce more biologically correct results, since you can still get reactions violating thermodynamics or (even worse) common sense ;)

cdiener avatar Mar 13 '19 01:03 cdiener

@cdiener Thanks for the great explanation. Like you described, I am not worried so much about models with realistic uptake/exchange bounds, which will in most cases limit the flux distributions. Things could get problematic with all exchange bounds at 1000 (which is basically what you get when creating a model de novo). For me the important part is that models work with [-Inf, Inf] bounds, i.e., that the LP can be solved.

In addition, LP optimization could become much faster, because with infinite bounds (if encoded correctly in the LP as the absence of constraints) one has far fewer constraints and thus fewer simplex vertices in the problem. A genome-scale model with 1000 reactions gets 2000 unnecessary bound constraints in 1000-dimensional space, whose intersections form vertices of the feasible polytope. The simplex algorithm then jumps from vertex to vertex to find the min/max; with far fewer vertices it should be much simpler to find the solution, which could be much faster.

One does not see this in the tests because those models are so simple, but when simulating genome-scale models like Recon3D one gets a speed improvement of ~25% out of the box for optimization with [-Inf, Inf] bounds:

Bounds: [-1000, 1000]
3706.664 ms	 optimize
Bounds: [-Inf, Inf]
2737.041 ms	 optimize

I could imagine that handling the bounds correctly in the LP, i.e., not adding constraints for [-Inf, Inf] bounds at all, could further speed up optimization.

matthiaskoenig avatar Mar 13 '19 09:03 matthiaskoenig

I could imagine that handling the bounds correctly in the LP, i.e., not adding constraints for [-Inf, Inf] bounds at all, could further speed up optimization.

It also may improve numerical accuracy, especially for models with small coefficients and import fluxes.

So, yeah I don't really know what's best.

cdiener avatar Mar 13 '19 19:03 cdiener

I think we can just leave this issue open for now.

For me the important thing is that it is easy to change the default bounds to [-Inf, Inf] and that the solvers support them. With the new configuration this all seems to work, so for users who want to change the default bounds there is a single entry point to do so.

matthiaskoenig avatar Mar 13 '19 20:03 matthiaskoenig

Sounds good. I have to check at some point whether sampling works fine with open bounds.

cdiener avatar Mar 13 '19 20:03 cdiener