
GAM estimation after modified biomass

Open edkerk opened this issue 5 years ago • 1 comments

tl;dr: GAM estimation in scaleBioMass is wrong, proposed new solution in last paragraph.

When the biomass composition is changed, the GAM should be re-estimated. This is done by scaleBioMass, but the results seem counterintuitive. Protein is the most expensive macromolecule, with a polymerization cost of 37.7, in contrast to 26.0 for DNA and RNA, and 12.8 for carbohydrates (all taken from getModelParameters). When protein content is changed, the lipid and carbohydrate components are scaled accordingly. Considering that lipid has no associated polymerization cost and carbohydrate has about 1/3 of the cost of protein, an increase in protein content should result in a (significantly) increased GAM.
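As a rough sketch of that expectation, the polymerization cost can be estimated as a sum of the per-macromolecule costs quoted above, weighted by mass fraction. The two compositions below are hypothetical (chosen only so that carbohydrate shrinks as protein grows), and the weighted-sum form is an assumption for illustration, not something taken from scaleBioMass itself.

```matlab
% Polymerization costs quoted above (from getModelParameters).
% Order: protein, carbohydrate, RNA, DNA. Lipid has no associated cost.
polCost = [37.7, 12.8, 26.0, 26.0];

% HYPOTHETICAL mass fractions [g/gDW], same order as polCost.
% Only protein and carbohydrate differ between the two cases.
lowProtein  = [0.2, 0.6, 0.06, 0.004];
highProtein = [0.6, 0.2, 0.06, 0.004];

% ASSUMED form: sum of (cost * mass fraction) over the macromolecules.
polLow  = sum(polCost .* lowProtein);
polHigh = sum(polCost .* highProtein);

fprintf('Low protein:  polymerization cost = %.3f\n', polLow);
fprintf('High protein: polymerization cost = %.3f\n', polHigh);
```

With these made-up fractions the polymerization cost rises by roughly 10 units when protein goes from 0.2 to 0.6 g/g, which is why a lower total GAM at higher protein content looks suspicious.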

Run this code on ecYeast-GEM:
model=load('../limit_proteins')
[tmp1,GAM1] = scaleBioMass(ecModel_batch,0.2,[],true);
[tmp2,GAM2] = scaleBioMass(ecModel_batch,0.6,[],true);

bmIdx = getIndexes(ecModel_batch,'r_4041','rxns');
ATPidx = getIndexes(ecModel_batch,'ATP[c]','metcomps');

GAM1_tot=-tmp1.S(ATPidx,bmIdx);
GAM1_pol=GAM1_tot - GAM1;
GAM2_tot=-tmp2.S(ATPidx,bmIdx);
GAM2_pol=GAM2_tot - GAM2;

disp(['Protein 0.2 g/g -> total GAM: ' num2str(GAM1_tot) '; fitted GAM: ' num2str(GAM1) '; polymerization GAM: ' num2str(GAM1_pol)])
disp(['Protein 0.6 g/g -> total GAM: ' num2str(GAM2_tot) '; fitted GAM: ' num2str(GAM2) '; polymerization GAM: ' num2str(GAM2_pol)])

The output from this code is:

Protein 0.2 g/g -> total GAM: 60.6956; fitted GAM: 45.1; polymerization GAM: 15.5956
Protein 0.6 g/g -> total GAM: 56.6804; fitted GAM: 30.1; polymerization GAM: 26.5804

So when protein content is higher, the GAM is actually lower. I show multiple numbers for GAM: the total GAM as specified in the biomass pseudoreaction after running scaleBioMass; the GAM that is fitted as part of the scaleBioMass function; and the polymerization GAM as calculated from the numbers in getModelParameters. What I don't understand is why the fitted number is calculated the way it is. It is also odd that the GAM returned by scaleBioMass is not the GAM used in the model: GAM1 ≠ GAM1_tot in the code above.

Currently, scaleBioMass modifies the biomass composition and then runs fitGAM, which uses built-in chemostat data to fit the GAM. However, after fitting this GAM there is a calculation of GAM based on polymerization of macromolecule precursors, and the fitted GAM and polymerization GAM are subsequently summed?

One could argue that chemostatData.tsv should perhaps be adapted to the conditions that I want to reconstruct models for. However, (1) the data I have might not be suitable for fitting GAM [because the fit is based on the slope of the relation between growth rate and exchange rates. If my data involves different levels of nitrogen at the same dilution rate, then the glucose, O2 and CO2 rates all change while the growth rate remains the same. Meanwhile, different levels of nitrogen limitation might give drastic changes in protein content!]; (2) regardless, any other dataset with any positive relation between growth rate and exchange rates will have the same problem.

To go back to the GAM reported by scaleBioMass, this is the value that is used in the plot that fitGAM automatically generates. Making this plot with the GAM that is actually incorporated in the biomass reaction by scaleBioMass obviously doesn't give a great fit.

I know that GAM calculation from polymerization is an underestimation, so using the polymerization cost alone is not promising. I would like to propose an alternative way to calculate GAM as part of the scaleBioMass function. I raise this as an Issue instead of directly making a PR because I do not understand the reasoning for the current calculation of GAM and I might have misunderstood something. The alternative approach is:

  1. Assume that the GAM in the 'input' model is correct. One can choose to first run fitGAM to make sure that the GAM fits some provided chemostat data, but this is optional. In the current generate_protModels this would mean running this code before starting to construct condition-specific functions. Remove fitGAM from scaleBioMass, and instead do the following calculation:
  2. Take the GAM from the biomass equation as provided in the model.
  3. Calculate the polymerization cost based on the existing biomass composition.
  4. Subtract (3) from (2) to obtain the part of GAM that cannot be attributed to the simple polymerization calculation.
  5. Calculate the polymerization cost based on the updated biomass composition.
  6. Sum (4) and (5) to obtain the new total GAM.
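The arithmetic of the proposed calculation can be sketched as below. The numbers are taken from the output listed earlier in this issue; treating the 0.2 g/g protein model as the 'input' model is an assumption made purely for illustration, and the variable names are hypothetical rather than names used in scaleBioMass.

```matlab
% Values copied from the output listed above. Using the 0.2 g/g model as
% the 'input' model is an ASSUMPTION for illustration only.
gamTotalOld = 60.6956;  % step 2: ATP coefficient in the input biomass equation
polOld      = 15.5956;  % step 3: polymerization cost, existing composition
polNew      = 26.5804;  % step 5: polymerization cost, updated composition

% Step 4: part of the GAM that polymerization does not account for.
gamNonPol = gamTotalOld - polOld;

% Step 6: new total GAM for the updated biomass composition.
gamTotalNew = gamNonPol + polNew;

fprintf('Non-polymerization GAM: %.4f\n', gamNonPol);
fprintf('New total GAM:          %.4f\n', gamTotalNew);
```

Under these assumptions the non-polymerization part stays at 45.1 and the total GAM rises to about 71.68 at the higher protein content, instead of dropping to 56.68.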

edkerk, Jul 08 '20 18:07

So much work in writing this Issue, and now I figured out that it's not as complicated as I initially thought.... Hopefully these contemplations might be relevant for some other poor soul who might be looking into this at a later stage.

I got confused about what GAM is: it is the growth-associated maintenance, so it excludes the polymerization of macromolecule precursors. It is therefore correct that the GAM output reported by scaleBioMass is not the same as the coefficient of ATP in the biomass pseudoreaction. The GAM output is the non-polymerization part, while the ATP coefficient is the combination of non-polymerization and biomass-precursor polymerization costs. So the scaleBioMass and fitGAM functions do not require any code change.

The problem with growth related energy demand decreasing when protein content increases still remains. However, the way to deal with this is simpler:

  1. In generate_protModels, determine GAM on the ecModel before any condition-specific changes to, e.g., biomass composition, and use this GAM when scaleBioMass is run.

    Code change: in generate_protModels, line 42, change GAM = [] to [~,GAM] = scaleBioMass(ecModel_batch,sumProtein(tempModel),[],true); (and navigate in and out of the correct directories); and in line 98, change [~,GAM] = scaleBioMass(tempModel,Ptot(i),[],true); to tempModel = scaleBioMass(tempModel,Ptot(i),GAM,true);

  2. Done.

The problem was that if the biomass composition in the ecModel is already changed when fitting GAM, then the difference in cost related to polymerization is largely cancelled out when fitting the non-polymerization GAM. By not yet changing the biomass composition, the (non-polymerization) GAM stays the same at different biomass compositions, while the change in biomass composition just affects the polymerization part of the growth-related energy demand. Strictly speaking, GAM probably does change a bit at different biomass compositions, but this is probably a modest change?

GAM can alternatively be hard-coded in getModelParameters.

edkerk, Jul 08 '20 21:07

Inactive for > 2 years.

edkerk, Mar 05 '23 14:03