cobrapy Make a decision on the behavior of Compartments

@opencobra/cobrapy-core, @gregmedlock we need to make a design choice on the proposed groups and compartments PRs and I'd love to have everyone's input. My apologies that the issues were not very well specified.

The design choices revolve around the following issues:

Basically, both groups and compartments are containers that could allow adding/removing metabolites and reactions in bulk to/from a model. I find that desirable behavior but it is certainly not necessary, for example, it would be quite convenient to just assign a group/compartment from one model to another to quickly build up a new model. The issue here is that adding the metabolites and reactions would probably automatically be context managed but the groups and compartments themselves so far are not. That means considerably more code will have to be added and complexity increases.
The other issue relates to how groups and compartments are accessed themselves. The alternatives, at least for compartments, being that they could be added to a model and managed in a DictList. That means adding metabolites and reactions to the model would have to also manage the compartments attribute. They could also be generated on the fly, i.e., be a property that loops over metabolites to generate the compartments. The problem I see with the latter approach is that both groups and compartments are not only containers but can carry important attributes such as annotation. With on the fly generation it is not clear which compartment instance we are adding annotation to. Let me elaborate: Compartments are currently automatically instantiated from a string when a metabolite is created with a compartment value that is a string. So if you generate many metabolites that are then added to a model and the model then builds a set of compartments based on ID, which compartment are we actually annotating? This can more easily be handled in input/output (but would need further code) where compartments can be created first before parsing metabolites but works less well in cases where a user creates new metabolites before adding them to a model.

To sum up my opinions: On (2.) I have a feeling that in order to be consistent we should bite the bullet and add groups as well as compartments as fully managed DictLists just like reactions, metabolites, and genes. That means we also have to context manage them. This will be extra work but automatically solve (1.). However, I would like to resolve these PRs as soon as possible so I'm happy to go with a solution that can be implemented more quickly.

Sep 15 '18 14:09 Midnighter

Even though it may require slightly more maintenance, I favor Group, Model.groups, Compartment, and Model.compartments having behavior as similar as possible to the other objects and associated containers. Congruence across these classes makes the code base easier to wrap my head around. From a teaching perspective, a light bulb goes off in people's head when they hear that reactions/metabolites/genes share the same functions for access and addition/removal (in my experience, teaching in person) so I think it improves accessibility for both users and potential contributors. Also from the teaching perspective, users that are new to programming learn principles of object-oriented programming more quickly and more intuitively when there is consistent behavior across classes, IMO.

With increased consistency across these classes, might it be possible to move a lot of functionality up to the Object and Species to reduce complexity? Are there specific troubles with context management that this might create?

I have to admit that I avoid the context manager entirely, in practice, out of fear that something I am manipulating isn't being reset correctly. I'd be happy to implement context management for Group/groups, though.

Regarding 2., would usability suffer if users needed to handle compartments on their own, e.g. raise an error whenever any action (add reactions/metabolites) attempts to add a metabolite that is not associated with an existing compartment? In this use case, users must point metabolite.compartment to an existing Compartment (which could be associated with the model already, or added to the model upon addition of the metabolite) prior to adding to the model.

Sep 17 '18 16:09 gregmedlock

I changed the topic to "Make a decision on the behavior of Compartments". The groups feature was already implemented via #543. What is missing is an update of the compartments to first class cobra object, so that annotations and other information can be stored with the compartments. It would be great if compartments could be first class objects in future cobrapy versions.

Apr 08 '19 12:04 matthiaskoenig

I agree with adding compartments, however, it's quite hard as well. Groups have been added but are not fully supported. We miss (1) context management and (2) support for I/O other than SBML. I have some code that almost completes (2) but still needs work. And I haven't even started (1) for Groups.

The same will be true for compartments so it's mostly the complexity of the task impeding progress.

Apr 08 '19 14:04 Midnighter

I have a related use case that supports adopting compartments as a first-class entity.

We are trying to use COBRApy as much as we can within our web-based reconstruction tool. This has worked great so far – no need to reimplement I/O, support for standards, mass balance calculations, etc.

However, for GUI-directed model editing, users need to be able to add compartments before those compartments are used by any metabolites or reactions. If a user adds a compartment and then downloads their model, COBRApy will throw out the unused compartment and generate a JSON file that doesn't match the database (as of version 0.15.1). This is confusing and potentially leads to data loss (e.g. if the user wants to upload that file again after making edits).

Let me know if that makes sense, and if we can help out with sorting it out.

To recreate:

In [1]: import cobra

In [2]: m = cobra.Model()

In [3]: m.compartments['c'] = 'cytosol'

In [4]: cobra.io.to_json(m)
Out[4]: '{"metabolites": [], "reactions": [], "genes": [], "id": null, "compartments": {}, "version": "1"}'

... added reaction & chemical to model ...

In [34]: m.metabolites.ac_e.compartment = 'e'

In [35]: cobra.io.to_json(m)
Out[35]: '{"metabolites": [{"id": "ac_e", "name": "", "compartment": "e"}], "reactions": [{"id": "EX_ac_e", "name": "", "metabolites": {"ac_e": -1}, "lower_bound": 0, "upper_bound": 1000, "gene_reaction_rule": ""}], "genes": [], "id": null, "compartments": {"e": ""}, "version": "1"}'

[EDIT] Sorry, this might belong here: #652

May 13 '19 18:05 zakandrewking

cobrapy cobrapy copied to clipboard

Make a decision on the behavior of Compartments

cobrapy
cobrapy copied to clipboard