hard-coded data file & script locations

Open edkerk opened this issue 7 years ago • 5 comments

In measureAbundance, the location of the proteomics data is hard-coded. This is inconvenient if, for instance, one has proteomics data for many conditions and wants to build a model for each of them: the user would have to replace the databases/prot_abundance.txt file every time. It would probably be better to pass genes and abundance as parameters to the function.

edkerk • Nov 07 '18 08:11
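The suggestion above, passing the data in rather than reading a fixed file, could look roughly like the sketch below. It is written in Python purely for illustration and is not GECKO code: the function names `read_abundance` and `abundance_for_condition`, the two-column tab-separated layout, and the `#` comment convention are all assumptions, and the identifiers and values in the example are made up.

```python
import io
from typing import Dict, TextIO


def read_abundance(handle: TextIO) -> Dict[str, float]:
    """Parse 'id<TAB>abundance' lines (assumed layout); skip blanks and '#' lines."""
    abundance: Dict[str, float] = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        gene, value = line.split("\t")[:2]
        abundance[gene] = float(value)
    return abundance


def abundance_for_condition(path: str) -> Dict[str, float]:
    """Read abundance data from a caller-supplied path instead of a fixed one."""
    with open(path, encoding="utf-8") as handle:
        return read_abundance(handle)


# Toy input standing in for one condition's data file (made-up values).
example = io.StringIO("# gene\tabundance\ngeneA\t12.5\ngeneB\t3.0\n")
parsed = read_abundance(example)
```

With this shape, building one model per condition means calling `abundance_for_condition` once per file, and nothing under databases/ has to be overwritten.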

@edkerk note that measureAbundance is not meant for constraining the model with proteomics, but for reading Pax-DB data and estimating the fraction [g/g] of a group of enzymes relative to the total, which is needed (later) to set a constraint on the protein pool for enzymes that don't have proteomic measurements. The function you would want for the purpose you describe is constrainEnzymes, which accepts pIDs and data as inputs, as you suggest.

That being said, I agree that hard-coded locations (e.g. of prot_abundance.txt) are a problem, and we will change this at the toolbox level. The idea would be to require that the /geckomat* and /databases* paths are added before using GECKO; that way we can avoid any relative path and use those functions/data from any other folder. Let us know if you have any thoughts on this proposal.

BenjaSanchez • Nov 12 '18 14:11
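To make the "fraction [g/g]" mentioned above concrete, here is a minimal Python sketch of one plausible way to compute it: weight each protein's abundance by its molecular weight, then divide the enzyme mass by the total mass. This is an assumption about the calculation, not a transcription of measureAbundance; the function name `enzyme_mass_fraction`, the molecular-weight input, and all numbers are hypothetical.

```python
from typing import Iterable, Mapping


def enzyme_mass_fraction(
    abundance: Mapping[str, float],
    mol_weight: Mapping[str, float],
    enzyme_ids: Iterable[str],
) -> float:
    """Mass of the selected enzymes divided by the mass of all measured proteins."""
    enzymes = set(enzyme_ids)
    total_mass = 0.0
    enzyme_mass = 0.0
    for protein_id, amount in abundance.items():
        weight = mol_weight.get(protein_id)
        if weight is None:
            continue  # assumption: proteins without a known weight are skipped
        mass = amount * weight
        total_mass += mass
        if protein_id in enzymes:
            enzyme_mass += mass
    if total_mass == 0:
        raise ValueError("no protein mass available to compute a fraction")
    return enzyme_mass / total_mass


# Toy data (made up): P1 and P3 play the role of enzymes present in the model.
abundance = {"P1": 10.0, "P2": 30.0, "P3": 20.0}
mol_weight = {"P1": 50_000.0, "P2": 20_000.0, "P3": 45_000.0}
f = enzyme_mass_fraction(abundance, mol_weight, ["P1", "P3"])
```

For these toy numbers the enzyme mass is 1,400,000 out of 2,000,000, so `f` is 0.7. Because the result is a ratio, relative abundances are enough as input.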

Thanks for the explanation, this could go directly in the documentation! :) What if no Pax-DB data is available for my organism of interest?

Requiring the locations of the /geckomat* and /databases* paths to be defined as parameters sounds like a good solution.

edkerk • Nov 12 '18 14:11

> What if no Pax-DB data is available for my organism of interest?

I guess it would have to be replaced with some proxy, e.g. the fraction of enzymes from the total, although that would underestimate metabolic enzymes... @IVANDOMENZAIN any ideas here?

BenjaSanchez • Nov 12 '18 15:11

@edkerk I have used relative proteomics datasets (when available) as a substitute for the prot_abundance.txt file. The f values that I have obtained for different organisms with this approach range from 0.3 to 0.48. As the f value is used for constraining the protein pool, I think that using a high value such as 0.48 or 0.5 also makes some sense, because the protein pool then becomes a limitation only for growth at very high rates (simulating batch conditions) but not for chemostat simulations with microbial models.

IVANDOMENZAIN • Nov 12 '18 15:11

Something to address here brought up by @sulheim:

> parameter file can also provide the paths for the required database-files, cultivation data etc.

BenjaSanchez • Nov 28 '19 09:11
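One way to read the parameter-file idea above is a small configuration that names every data location, with relative entries resolved against a single root folder. The Python sketch below only illustrates that pattern: the JSON format, the keys `abundance_file` and `cultivation_file`, the file name `cultivation_data.txt`, and the root `/opt/gecko` are hypothetical and do not describe GECKO's actual parameter file.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ToolboxPaths:
    """Locations of the data files a run needs (hypothetical field names)."""
    abundance_file: Path
    cultivation_file: Path


def load_paths(config_text: str, root: Path) -> ToolboxPaths:
    """Read paths from JSON text; relative entries are resolved against root."""
    raw = json.loads(config_text)

    def resolve(key: str) -> Path:
        candidate = Path(raw[key])
        return candidate if candidate.is_absolute() else root / candidate

    return ToolboxPaths(
        abundance_file=resolve("abundance_file"),
        cultivation_file=resolve("cultivation_file"),
    )


# Example configuration as it might appear in a parameter file (assumed keys).
config_text = json.dumps(
    {
        "abundance_file": "databases/prot_abundance.txt",
        "cultivation_file": "databases/cultivation_data.txt",
    }
)
paths = load_paths(config_text, Path("/opt/gecko"))
```

Because every function receives its locations from `paths`, the working directory no longer matters, and switching to another condition means pointing the configuration at a different file.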

This will be completely revamped with GECKO3; the discussion here is obsolete.

edkerk • Jan 10 '23 22:01