cobrapy Discussion for Group feature in COBRApy

Hi all, @draeger, @matthiaskoenig, @Midnighter, @cdiener, and other members of COBRApy. I am opening a new thread for discussing about the implementation of Group package in COBRApy as I could not find one best suited for this discussion.

Complete support for Group package has already been added in COBRApy to parse the group info to cobra models. Even if the information is present in the form of subsystems inside the notes of a reaction, that is also extracted and converted to the group structure of cobra models. However, the group information is currently not put in JSON format if cobra models are converted to JSON form, and hence this information is lost. One of the tasks of the project #137 is to add complete support of Group feature in SBML-JSON form. I went through the Group specification document to understand the group structure and here is the format I propose for it:

{
    '+++',
    "listOfGroups" : {
        "metaid": "value",
        "sboterm": "value",
        "notes": "value",
        "annotation": {"content"},
        "groups": [
            {
                "id": "value",
                "name": "value",
                "notes": "value",
                "annotation": {"content"},
                "kind": "value",
                "listOfMembers": {
                    "metaid": "value",
                    "sboterm": "value",
                    "notes": "value",
                    "annotation": {"content"},
                    "members": [
                        {
                            "id": "value",
                            "name": "value",
                            "notes": "value",
                            "annotation": {"content"},
                            "idRef": "value",
                            "metaIdRef": "value"
                        },
                        '...'
                    ]
                }
            },
            '...'
        ],
    },
    '+++'
}

Some key-value pairs shown above are optional and may not be present like those which came due to extension from SBase class and are of type optional. Also, only one out of idRef and metaIdRef attribute is to be used inside the Member class. All these things will be handled inside the method that will do the parsing.

One thing I want to point here is that all the classes under group package i.e., ListOfGroups, Group, ListOfMembers, and Member extends the SBase class and hence can possibly have the attributes of SBase class also like notes and annotations. However, in the current implementation of COBRApy, only the notes and annotations corresponding to Group class are parsed to the cobra model. No parsing of notes and annotations corresponding to ListOfGroup, ListOfMembers, and Member class is done. So theoretically speaking, I think the parsing of notes and annotations of these classes should also be implemented. The above JSON format support this, but this information first has to be parsed to cobra models. Members, can you please provide your review regarding this?

May 07 '20 15:05 Hemant27031999

Shouldn't SBO terms just be a part of annotations? How do you identify whether an element in the members list is a reaction, metabolite or group (in theory you may have one of each with the same ID in a cobra Model I think).

May 07 '20 17:05 cdiener

Shouldn't SBO terms just be a part of annotations?

I am sorry for not paying attention to that. In cobra models, SBOTerm of each component is put inside its corresponding annotation dict. So SBOTerm here also should be put inside the annotation. Thanks for clarifying that.

in theory you may have one of each with the same ID in a cobra Model

Well, I went according to the SBML specification doc which says that ID of a given component should be unique inside the complete XML document. But I just came to know that while making the cobra model from the SBML in COBRApy, this ID is first passed through replacement function and is modified a little bit like, the prefix present before each ID of a given component (R_ for reactions, S_ for species) are clipped out. So I think this is why you are saying that IDs of different elements may be same. Am I correct? If this is the case, the best thing I think we can do is to also attach the "typecode" of the referenced component inside the member dictionary so that, at the time of making the model again from the JSON, we only search in that particular field to which the "typecode" belongs. So member dictionary will also have one more key-value pair as follows:

                    {
                        ...,
                        "typecode": value,
                        ...
                    },

May 08 '20 15:05 Hemant27031999

Yep, users also might create the model directly in cobrapy and I think the uniqueness constraint does not apply there. The typecode makes sense. Might be more readable to make it a string with only a set of valid values. JSON does not have enums right?

May 08 '20 15:05 cdiener

Yep, users also might create the model directly in cobrapy and I think the uniqueness constraint does not apply there.

Okay, thanks for the clarification.

Might be more readable to make it a string with only a set of valid values.

Ya, that would surely be more verbose.

JSON does not have enums right?

We can have enums in JSON also, here it goes. We can define our enum for the type of referenced component inside the member dictionary in the JSON-schema (one of the main tasks of my project is to update the JSON schema also, so it can easily be covered) and this will ensure the presence of only those values which are defined in the schema.

May 08 '20 16:05 Hemant27031999

Just one remark about the uniqueness of identifiers in SBML: Certain IDs "live" in separate namespaces, e.g.,

UnitDefinitions. It is therefore allowable to have a matching ID of a unit definition and some other model component.
Furthermore, different rules apply with respect to scoping of objects. LocalParameter objects may have identifiers that shadow those of (global) Parameter objects. IDs only need to be unique within their respective namespace.
Similarly, if you have a hierarchically structured model (with comp package), you may have multiple modelDefinition objects within the same XML file. My understanding is that it is then allowable to have matching IDs of different objects from different model definitions.

May 11 '20 23:05 draeger

Shouldn't SBO terms just be a part of annotations?

Yes, sboTerm should just be an annotation (one can easily get these out of the annotations later on as long as there is only a single SBOTerm)

How do you identify whether an element in the members list is a reaction, metabolite or group (in theory you may have one of each with the same ID in a cobra Model I think).

These ids should be unique, but I think this is not enforced by cobrapy (but not sure about that). Non-unique ids will already create problems in creating groups, because the DictList will not work if different element have the same idea. Currently members of groups are handled via:

self._members = DictList() if members is None else DictList(members)

Probably the type of the object must be stored in addition, and the corresponding implementation of groups must be fixed.

May 21 '20 13:05 matthiaskoenig

I am pretty sure that members are cobrapy objects and not IDs in that code, so that should be fine. References are still unique even if the IDs are the same.

May 21 '20 15:05 cdiener

Thanks for additional information @draeger. It was helpful.

Inside the group, the references are made only to those components which lie in the same namespace of the model to which the Group belongs. So I don't think so that any of these is going to be a matter of worry if we have id and type of the component stored with us, that will always be unique and won't ever create a problem.

May 21 '20 19:05 Hemant27031999

Tracked in #1237 now.

Nov 04 '22 19:11 cdiener

cobrapy cobrapy copied to clipboard

Discussion for Group feature in COBRApy

cobrapy
cobrapy copied to clipboard