bigg_models
How is the Recon3D model related to the publication model?
The BiGG Recon3D model has
- Metabolites: 5835
- Reactions: 10600
- Genes: 2248
whereas the paper reported:
3,288 open reading frames (representing 17% of functionally annotated human genes), 13,543 metabolic reactions involving 4,140 unique metabolites, and 12,890 protein structures.
What are the real numbers for RECON3D? How is the BiGG RECON3D different from the published RECON3D?
In addition to the points mentioned, there also seems to be an issue with the defined groups. The published model has 111 groups, while the BiGG version has only 103. The IDs are also changed: in the published model the IDs are groupx, while in the BiGG version they are called gx. The bigger issue, though, is that the numbering is changed, and group members seem to have been created that did not exist before, or have disappeared. To give one example:
group1 in the published model looks like this:
['R_AGTim',
'R_AGTix',
'R_ARGSS',
'R_ASNNm',
'R_ASNS1',
'R_ASPNATm',
'R_ASPTAm',
'R_DASPO1p',
'R_NACASPAH',
'R_RE1473C',
'R_RE2031M',
'R_RE2642C',
'R_ALAR',
'R_ASPTA',
'R_r0127',
'R_ARGSL']
If I then check g1 in the BiGG model, it looks completely different, so I checked which group contains the reaction 'R_ASPNATm' in the BiGG model: it is located in g47, which looks as follows:
['R_AGTim',
'R_AGTix',
'R_ARGSS',
'R_ASNNm',
'R_ASNS1',
'R_ASPNATm',
'R_ASPTAm',
'R_DASPO1p',
'R_NACASPAH',
'R_ALAR',
'R_ASPTA',
'R_ASNN',
'R_ARGSL']
So the reactions {'R_RE1473C', 'R_RE2031M', 'R_RE2642C', 'R_r0127'} are missing from the group, while there is an additional reaction {'R_ASNN'}. None of those four missing reactions is included in the BiGG model. Interestingly, the additional reaction in the group is not included in the published model.
That points to a deeper issue: There are 5295 reaction IDs in the published model which are not in the BiGG version, and 2352 reaction IDs in the BiGG model not present in the published one, so, as @matthiaskoenig mentioned in a different issue, there seems to be a parsing issue.
I just saw that R_ASNN's old identifier is r0127, so this mapping went well; the question is then still what happened to the remaining 3 reactions.
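The comparison described above can be reproduced with plain set arithmetic. The following is a minimal sketch that uses only the two member lists quoted in this thread; the helper name `compare_groups` and the `renamed` argument are illustrative assumptions, not part of any BiGG or COBRApy API.

```python
# Member lists copied from the thread: group1 (published SBML) and g47 (BiGG SBML).
published_group1 = {
    "R_AGTim", "R_AGTix", "R_ARGSS", "R_ASNNm", "R_ASNS1", "R_ASPNATm",
    "R_ASPTAm", "R_DASPO1p", "R_NACASPAH", "R_RE1473C", "R_RE2031M",
    "R_RE2642C", "R_ALAR", "R_ASPTA", "R_r0127", "R_ARGSL",
}
bigg_g47 = {
    "R_AGTim", "R_AGTix", "R_ARGSS", "R_ASNNm", "R_ASNS1", "R_ASPNATm",
    "R_ASPTAm", "R_DASPO1p", "R_NACASPAH", "R_ALAR", "R_ASPTA", "R_ASNN",
    "R_ARGSL",
}


def compare_groups(published, bigg, renamed=None):
    """Return (missing_from_bigg, only_in_bigg) after applying known renames.

    `renamed` maps a published ID to its BiGG ID (hypothetical helper argument).
    """
    renamed = renamed or {}
    translated = {renamed.get(rxn, rxn) for rxn in published}
    return translated - bigg, bigg - translated


# Raw comparison: four reactions missing, one extra.
missing, added = compare_groups(published_group1, bigg_g47)
print(sorted(missing), sorted(added))

# Accounting for the one rename identified in the thread (r0127 -> ASNN).
missing_after, added_after = compare_groups(
    published_group1, bigg_g47, renamed={"R_r0127": "R_ASNN"}
)
print(sorted(missing_after), sorted(added_after))
```

Once the rename is applied, only the three RE-prefixed reactions remain unexplained, which matches the open question in the comment above.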
Thanks for helping us look at this. We did not have a lot of lead time with the model to work these issues out, so your feedback is invaluable.
I have to spend more time looking at the model, but I see right off the bat that r0127 was matched to an existing bigg reaction ASNN, so that explains one change: http://bigg.ucsd.edu/models/Recon3D/reactions/ASNN
We are basing our version on the file here, which I received from the authors of the paper:
https://github.com/SBRG/bigg_models_data/blob/master/models/Recon3D.mat
In that file, the group you are talking about looks like this:
In [12]: [x for x in m.reactions if x.subsystem == 'Alanine and aspartate metabolism']
Out[12]:
[<Reaction AGTim at 0x123ef7b38>,
<Reaction AGTix at 0x123f022b0>,
<Reaction ARGSS at 0x123f50cc0>,
<Reaction ASNNm at 0x123f5bba8>,
<Reaction ASNS1 at 0x123f5bcc0>,
<Reaction ASPNATm at 0x123f68898>,
<Reaction ASPTAm at 0x123f68c18>,
<Reaction DASPO1p at 0x124050390>,
<Reaction NACASPAH at 0x1243f83c8>,
<Reaction ALAR at 0x125147e10>,
<Reaction ASPTA at 0x125153278>,
<Reaction r0127 at 0x126cada90>,
<Reaction ARGSL at 0x126ce51d0>]
At least when I read it with COBRApy.
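To check every group at once rather than one subsystem at a time, the reactions can be bucketed by their `subsystem` attribute. This is a rough sketch: the function name `reactions_by_subsystem` is made up for illustration, and the demo objects are stand-ins with only `id` and `subsystem` attributes (the second subsystem and its reaction are hypothetical) so that the block runs without any model file.

```python
from collections import defaultdict
from types import SimpleNamespace


def reactions_by_subsystem(reactions):
    """Group reaction IDs by subsystem, preserving input order within each group."""
    groups = defaultdict(list)
    for rxn in reactions:
        groups[rxn.subsystem].append(rxn.id)
    return dict(groups)


# Stand-in reactions; the first three IDs and their subsystem come from the thread.
demo_reactions = [
    SimpleNamespace(id="AGTim", subsystem="Alanine and aspartate metabolism"),
    SimpleNamespace(id="r0127", subsystem="Alanine and aspartate metabolism"),
    SimpleNamespace(id="ARGSL", subsystem="Alanine and aspartate metabolism"),
    SimpleNamespace(id="DEMO_RXN", subsystem="Hypothetical subsystem"),
]

groups = reactions_by_subsystem(demo_reactions)
for name, members in groups.items():
    print(name, len(members), members)
```

With COBRApy installed, the same function should work on a real model, for example `reactions_by_subsystem(cobra.io.load_matlab_model("Recon3D.mat").reactions)`, assuming the mat file has been downloaded locally under that name.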
sorry @matthiaskoenig that we're getting off track from the original question :) We'll get to it
@zakandrewking: Ok, I used the sbml from here, so maybe there is a discrepancy between it and the mat file; the sbml contains 13543 reactions which is the number stated in the original post.
Right. They also make a distinction between the model (which we provide) and the larger knowledge base
@matthiaskoenig I think this answers the question. Our version is the "model". You can find both the model and the larger knowledge base in the downloads section at http://vmh.life/
So what is the definitive source of Recon3D? It seems that, only a few weeks after the publication, 5 different versions of the model/knowledge base are already circulating (supplement SBML file, vmh SBML file, BiGG SBML file, SBRG SBML file). This is the problem when there is no single community repository that manages the latest version! In addition, the mat file is probably different from the SBML.
Can we just agree for now that the definitive version is the latest version hosted on vmh, until a proper repository for RECON is hopefully established at some point? So can we get RECON3D-v3.01 into BiGG?
So do I understand correctly that there are 2 "mat" files hosted on vmh, corresponding to the knowledgebase and the model, and one SBML which is the "model"? http://vmh.uni.lu/#downloadview
It would be very important for me to have an SBML with the information of the knowledgebase and the model on BiGG with all the annotations (especially ENSG and UniProt, which are used in key parts of the analysis of the paper). Unfortunately, I don't have Matlab licenses (nor money for them), so I can't look at the mat files and these are completely useless for me. So I am stuck with an SBML containing only part of the knowledgebase and missing most information of RECON3D.
@zakandrewking yes, I can find both the model and the knowledgebase on vmh. But both are in a commercial binary format that is not usable
Unfortunately, I don't have Matlab licenses (nor money for them), so I can't look at the mat files and these are completely useless for me.
@matthiaskoenig I haven't tried it myself yet but COBRApy apparently supports the import of mat files. Perhaps that helps you! https://cobrapy.readthedocs.io/en/stable/io.html#MATLAB
Things start to make more sense already. I could open the mat files with Octave, and most of the information seems to be there (with the exception of the ENSG and UniProt IDs, which were only published in the RECON3D supplements). I just have to write some code to bring this into a normal text format like JSON/YAML.
I would suggest adding the 2 SBML files corresponding to "Recon3D-v3.01.mat" and "Recon3DModel-v3.01.mat" to BiGG with
Recon3DModel-v3.01.mat:
- reactions: 10600
- species: 5835
- genes: 2248
Recon3D-v3.01.mat:
- reactions: 13543
- species: 8399
- genes: 3697
Species, reactions and genes require annotations to the original "ids" used in the mat files, i.e. it must be obvious which BiGG identifiers map to the identifiers used in the mat files (these seem to be the old identifiers in BiGG, but they must be exported in the SBML).
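To illustrate why the old identifiers matter, here is a small sketch of the lookup they would enable once exported. It assumes the old IDs are available as a mapping from each BiGG ID to a list of old IDs; that input shape and the name `build_old_id_index` are assumptions for illustration, not the actual BiGG export format. Only the ASNN/r0127 pair comes from this thread.

```python
def build_old_id_index(old_ids_by_bigg_id):
    """Invert {bigg_id: [old_id, ...]} into {old_id: bigg_id}.

    Old IDs that point to more than one BiGG ID are returned separately
    instead of being resolved silently.
    """
    index = {}
    ambiguous = set()
    for bigg_id, old_ids in old_ids_by_bigg_id.items():
        for old_id in old_ids:
            if old_id in index and index[old_id] != bigg_id:
                ambiguous.add(old_id)
            index[old_id] = bigg_id
    for old_id in ambiguous:
        del index[old_id]
    return index, ambiguous


# ASNN <- r0127 is the mapping confirmed in the thread; ARGSL is assumed unchanged.
annotations = {"ASNN": ["r0127"], "ARGSL": ["ARGSL"]}
index, ambiguous = build_old_id_index(annotations)
print(index["r0127"])
```

Reporting ambiguous IDs separately is a deliberate choice here: a mat-file identifier that maps to two BiGG entries is exactly the kind of discrepancy this issue is trying to surface.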
VMH will continue to be the definitive source. I can add a disclaimer to the Recon3D page on BiGG to that effect
Adding old identifiers to BiGG SBML also makes sense if they are not already there.
@matthiaskoenig Mat files can be parsed using dedicated libraries and are therefore accessible without MATLAB. ModelPolisher parses these mat files and converts them to SBML, optionally also annotating them using the BiGG Models database.
@draeger thanks for the help. Was not aware that there is such good support for mat files outside of matlab. But I still don't understand why all the gene information is not part of the mat/SBML files (i.e. the information provided in Supplement S4, Supplemental Data File of Recon3D). There are links to Ensembl, MIM, UniProt, GO, WikiGene Id which are all lacking from the mat files and the SBML. Basically the species and reactions are annotated well, whereas all the gene information is lacking completely.
The SBML files provided by BiGG reflect the information in our database, so when we import models with extra annotations, it takes us some time to upgrade the database, APIs, and web pages to include this info. This just takes some time, and in the meantime I would recommend that users look to the other available Recon3D files at VMH.
thanks @zakandrewking I completely understand, would be great to have this information in BiGG in the long run. No urgency here. I will work with the mat files and supplements for now.
The only real issue I see right now is that the old identifiers are missing from the SBML (i.e. the gene identifiers used in the mat files). The old gene identifiers are basically the gene IDs. If they are not in the model, it is impossible to map the SBML genes to the supplements.
great. we'll prioritize getting these old IDs in the SBML