GECKO hard-coded data file & script locations

In measureAbundance, the location of the proteomics data is hard-coded. This is inconvenient if, for instance, one has proteomics data for many conditions and wants to build a model for each of them: the user would have to replace the databases/prot_abundance.txt file every time. It would probably be better to pass genes and abundance as parameters to the function.

Nov 07 '18 08:11 edkerk

@edkerk note that measureAbundance is not meant for constraining the model with proteomics, but for reading Pax-DB data and estimating the fraction [g/g] of a group of enzymes with respect to the total. That fraction is needed to (later) set a constraint on the protein pool for enzymes that don't have proteomic measurements. The function you would want to use for the purpose you describe is constrainEnzymes, which accepts inputs of pIDs and data as you suggest.
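For readers who want to see the idea of that fraction in code, here is a minimal sketch in Python (GECKO itself is MATLAB, and this is not the actual measureAbundance implementation). It assumes abundances are relative values such as Pax-DB ppm, and that weighting each abundance by molecular weight is how one gets to a mass fraction. The function name `enzyme_mass_fraction` and all of its arguments are hypothetical.

```python
def enzyme_mass_fraction(abundance, mol_weight, enzyme_ids):
    """Estimate f: the mass fraction [g/g] of a group of enzymes
    relative to all proteins in the abundance dataset.

    abundance   -- dict of protein ID -> relative abundance (e.g. ppm); assumed input
    mol_weight  -- dict of protein ID -> molecular weight (e.g. g/mol); assumed input
    enzyme_ids  -- iterable of protein IDs that belong to the enzyme group
    """
    enzyme_ids = set(enzyme_ids)
    total_mass = 0.0
    enzyme_mass = 0.0
    for pid, amount in abundance.items():
        weight = mol_weight.get(pid)
        if weight is None:
            continue  # proteins without a known molecular weight are skipped
        mass = amount * weight  # relative mass contribution of this protein
        total_mass += mass
        if pid in enzyme_ids:
            enzyme_mass += mass
    if total_mass == 0:
        raise ValueError("no protein with both abundance and molecular weight")
    return enzyme_mass / total_mass


# Made-up numbers, purely for illustration.
abundance = {"P1": 100.0, "P2": 300.0, "P3": 600.0}
mol_weight = {"P1": 50000.0, "P2": 20000.0, "P3": 10000.0}
f = enzyme_mass_fraction(abundance, mol_weight, ["P1", "P2"])
print(round(f, 3))
```

Because the data come in as arguments rather than from a fixed file, the same function could be called once per condition.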

That being said, I agree with the problem of hard-coded locations (e.g. of prot_abundance.txt), and we will change this at the toolbox level. The idea would be to require the /geckomat* and /databases* paths to be added before using GECKO, so that we can avoid any relative paths and use those functions/data from any other folder. Let us know if you have any thoughts on this proposal.

Nov 12 '18 14:11 BenjaSanchez

Thanks for the explanation, this could go directly in the documentation! :) What if no Pax-DB data is available for my organism of interest?

Requiring the locations of the /geckomat* and /databases* paths to be defined as parameters sounds like a good solution.

Nov 12 '18 14:11 edkerk

What if no Pax-DB data is available for my organism of interest?

I guess it would have to be replaced with some proxy, e.g. the fraction of enzymes from the total, although that would underestimate metabolic enzymes... @IVANDOMENZAIN any ideas here?

Nov 12 '18 15:11 BenjaSanchez

@edkerk I have used relative proteomics datasets (when available) as a substitute for the prot_abundance.txt file. The f values that I have obtained for different organisms with this approach range from 0.3 to 0.48. As the f value is used for constraining the protein pool, I think that using a high value such as 0.48 or 0.5 also makes some sense: the protein pool then becomes a limitation only for growth at very high rates (simulating batch conditions), but not for chemostat simulations with microbial models.

Nov 12 '18 15:11 IVANDOMENZAIN

Something to address here brought up by @sulheim:

parameter file can also provide the paths for the required database-files, cultivation data etc.
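A rough sketch of what that could look like, written in Python only for illustration (GECKO is MATLAB, and none of these names exist in the toolbox). `DataPaths` and `read_abundance` are hypothetical, and the file layout (tab-separated, ID in the first column, abundance in the last, `#` for comment lines) is an assumption.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DataPaths:
    """Hypothetical per-condition parameters; not part of GECKO."""
    databases_dir: Path
    abundance_file: str = "prot_abundance.txt"

    def abundance_path(self) -> Path:
        # The file location is derived from the parameters, not hard-coded.
        return self.databases_dir / self.abundance_file


def read_abundance(params: DataPaths) -> dict:
    """Read ID/abundance pairs from the file named in params.

    Assumed layout: tab-separated, ID in the first column, abundance
    in the last column, lines starting with '#' are comments.
    """
    table = {}
    with open(params.abundance_path(), encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            fields = line.split("\t")
            table[fields[0]] = float(fields[-1])
    return table
```

With this layout, one set of parameters per condition would point at its own abundance file, and nothing under databases/ has to be overwritten.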

Nov 28 '19 09:11 BenjaSanchez

This will be completely revamped with GECKO3, so the discussion here is obsolete.

Jan 10 '23 22:01 edkerk
