[MICOM 1.0 API] Proposed new format for fluxes

Open cdiener opened this issue 3 years ago • 0 comments

This is a proposal for a new format for fluxes slated for MICOM 1.0. Feel free to comment :smile:

Current state

The current format for fluxes returned by MICOM is a table in wide format:

In [1]: from micom import Community

In [2]: from micom.data import test_taxonomy

In [3]: com = Community(test_taxonomy())
Building ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00

In [7]: sol = com.cooperative_tradeoff(fluxes=True)

In [8]: sol.fluxes
Out[8]: 
reaction               ACALD    ACALDt      ACKr    ACONTa    ACONTb     ACt2r          ADK1  ...     SUCDi    SUCOAS      TALA          THD2      TKT1      TKT2       TPI
compartment                                                                                   ...                                                                          
Escherichia_coli_1  0.049190 -0.008897 -0.004224  5.999485  5.999485 -0.004224  3.388665e-11  ...  5.017641 -5.017641  1.489184  1.924736e-10  1.489184  1.173698  7.513137
Escherichia_coli_2 -0.079989 -0.115231  0.072559  6.001066  6.001066  0.072559  4.264225e-11  ...  5.033051 -5.033051  1.491048  1.924125e-10  1.491048  1.175562  7.495742
Escherichia_coli_3  0.102350  0.197394 -0.100513  6.004985  6.004985 -0.100513  3.662292e-11  ...  5.083935 -5.083935  1.506075  1.926208e-10  1.506075  1.190589  7.460396
Escherichia_coli_4 -0.071551 -0.073266  0.032177  6.023463  6.023463  0.032177  4.133342e-11  ...  5.122875 -5.122875  1.501628  1.926284e-10  1.501628  1.186143  7.440253
medium                   NaN       NaN       NaN       NaN       NaN       NaN           NaN  ...       NaN       NaN       NaN           NaN       NaN       NaN       NaN

[5 rows x 115 columns]

This has resulted in some issues:

  1. It is incompatible with cobra.Solution.fluxes, which breaks much of the cobra functionality, for instance the summary methods.
  2. It can be very sparse for highly divergent models (many NaN entries).
  3. It mixes medium and taxa fluxes.
  4. It does not specify whether exchange fluxes denote import or export, which is one of the most common help requests we receive.
  5. Basically all methods in MICOM that use flux results convert them to a long format anyway.
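To illustrate points 2 and 5, here is a minimal sketch of the wide-to-long conversion using plain pandas. The table is a toy stand-in with made-up values that only mimics the layout of `sol.fluxes`; it is not MICOM's internal code.

```python
import numpy as np
import pandas as pd

# Toy wide table mimicking the layout of `sol.fluxes` (values are made up).
wide = pd.DataFrame(
    {"ACALD": [0.05, -0.08, np.nan], "EX_ac_m": [np.nan, np.nan, 1.5]},
    index=pd.Index(
        ["Escherichia_coli_1", "Escherichia_coli_2", "medium"],
        name="compartment",
    ),
)
wide.columns.name = "reaction"

# Melt to one row per (compartment, reaction) pair and drop the NaN
# entries that only exist because not every reaction is in every taxon.
long = (
    wide.reset_index()
    .melt(id_vars="compartment", var_name="reaction", value_name="flux")
    .dropna(subset=["flux"])
    .reset_index(drop=True)
)
print(long)
```

In the long table the sparsity disappears because missing reaction/taxon combinations are simply not listed.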

Proposed new API for fluxes

CommunitySolution.fluxes will retain the cobrapy format and will be superseded by new accessors that all return fluxes in long format:

CommunitySolution.exchange_fluxes

Similar to the previous one but with the taxa annotated.

      reaction                     name               taxon          flux direction                       micom_id
0      EX_ac_m     ac_m medium exchange              medium  1.814984e-11    export                        EX_ac_m
1   EX_acald_m  acald_m medium exchange              medium  1.328645e-11    export                     EX_acald_m
2     EX_akg_m    akg_m medium exchange              medium  3.225128e-12    export                       EX_akg_m
3     EX_co2_m    co2_m medium exchange              medium  2.280983e+01    export                       EX_co2_m
4    EX_etoh_m   etoh_m medium exchange              medium  1.515389e-11    export                      EX_etoh_m
..         ...                      ...                 ...           ...       ...                           
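A rough sketch of how a `direction` column like the one above could be derived. It assumes the usual COBRA sign convention (negative exchange flux means import, anything else is labeled export); both the convention as applied here and the example values are assumptions for illustration, not the final MICOM behavior.

```python
import numpy as np
import pandas as pd

# Made-up exchange fluxes for illustration only.
exchanges = pd.DataFrame(
    {
        "reaction": ["EX_co2_m", "EX_glc__D_m", "EX_ac_m"],
        "taxon": ["medium", "medium", "medium"],
        "flux": [22.8, -10.0, 0.0],
    }
)

# Assumed convention: negative flux = import, otherwise export.
exchanges["direction"] = np.where(exchanges["flux"] < 0, "import", "export")
print(exchanges)
```

Making the direction explicit in the table means users no longer need to know the sign convention to read the results.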

CommunitySolution.internal_fluxes

    reaction                                               name               taxon          flux                    micom_id
0      ACALD           Acetaldehyde dehydrogenase (acetylating)  Escherichia_coli_1  1.312146e+00   ACALD__Escherichia_coli_1
1     ACALDt                  Acetaldehyde reversible transport  Escherichia_coli_1  3.236132e+00  ACALDt__Escherichia_coli_1
2       ACKr                                     Acetate kinase  Escherichia_coli_1 -1.304078e+00    ACKr__Escherichia_coli_1
3     ACONTa   Aconitase (half-reaction A, Citrate hydro-lyase)  Escherichia_coli_1  5.987675e+00  ACONTa__Escherichia_coli_1
4     ACONTb  Aconitase (half-reaction B, Isocitrate hydro-l...  Escherichia_coli_1  5.987675e+00  ACONTb__Escherichia_coli_1

This will consolidate GrowthResults and CommunitySolution and give a more readable format. All of these properties are generated on the fly when they are accessed.
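As a sketch of what "generated on the fly" could look like, the hypothetical class below stores only a cobrapy-style flux Series and rebuilds the long table on each property access. `ToySolution` is not the MICOM class, the values are made up, and the `reaction__taxon` ID pattern is inferred from the `micom_id` column shown above.

```python
import pandas as pd


class ToySolution:
    """Hypothetical stand-in for a community solution (not the MICOM class)."""

    def __init__(self, fluxes: pd.Series):
        # cobrapy-style fluxes: one Series indexed by a global reaction ID.
        self.fluxes = fluxes

    @property
    def internal_fluxes(self) -> pd.DataFrame:
        # Rebuilt on every access; nothing is cached on the object.
        ids = pd.Series(self.fluxes.index, name="micom_id")
        # Split from the right so reaction IDs containing "__" stay intact.
        parts = ids.str.rsplit("__", n=1, expand=True)
        return pd.DataFrame(
            {
                "reaction": parts[0],
                "taxon": parts[1],
                "flux": self.fluxes.to_numpy(),
                "micom_id": ids,
            }
        )


sol = ToySolution(
    pd.Series(
        {
            "ACALD__Escherichia_coli_1": 1.31,
            "ACKr__Escherichia_coli_1": -1.30,
            "ACALD__Escherichia_coli_2": 0.92,
        }
    )
)
print(sol.internal_fluxes)
```

The trade-off is a small cost per access in exchange for keeping the stored solution compact and compatible with cobrapy.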

Additionally, we may want to save the annotations in the solution, but they can be large, so it might be better to have a property on the model class, like Community.annotations.

Additional context

A similar format change is planned for Community.knockout_taxa. elasticities already uses a long format.

cdiener, Apr 26 '21 23:04