kg-microbe

ingest gene knockout data from LBL microbial fitness experiments

Open realmarcin opened this issue 4 years ago • 1 comments

All of the data is here (84G total): http://genomics.lbl.gov/supplemental/bigfit/

The numerical relative growth data would have to be converted to a binary growth vs. no-growth call, e.g. via thresholding.
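As a rough illustration of the thresholding idea, here is a minimal sketch. The cutoff of -2.0, the function name `call_growth`, and the gene labels are all placeholders; the right threshold would have to be chosen by looking at the actual distribution of fitness values.

```python
# Hypothetical cutoff: no threshold value has been decided for this dataset.
NO_GROWTH_THRESHOLD = -2.0


def call_growth(log_ratio, threshold=NO_GROWTH_THRESHOLD):
    """Map a relative fitness value to a binary growth call.

    Values at or below the threshold are treated as 'no_growth'.
    """
    return "no_growth" if log_ratio <= threshold else "growth"


# Made-up values, only to show the conversion.
example_values = {"geneA": -3.4, "geneB": 0.1, "geneC": -2.0}
calls = {gene: call_growth(value) for gene, value in example_values.items()}
```

A single global cutoff is the simplest option; a per-condition or significance-aware cutoff may turn out to be more appropriate.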

Just taking the first organism as an example: http://genomics.lbl.gov/supplemental/bigfit/html/acidovorax_3H11/

On the organism page, under 'Genes', the 'Specific phenotypes' link gives a table of the most significant phenotype per gene for this KO dataset: http://genomics.lbl.gov/supplemental/bigfit/html/acidovorax_3H11/specific_phenotypes. This file can serve as the primary data source. These columns:

sysName, desc, name, lrn, t, Group, Condition_1, Concentration_1, Units_1

provide the following data:

  • sysName: gene name
  • desc: description
  • name: internal name
  • lrn: log ratio normalized
  • t: t-statistic
  • Group: condition group
  • Condition_1: condition name
  • Concentration_1: concentration
  • Units_1: unit
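The column mapping above could be applied at read time roughly as follows. This is a sketch that assumes the specific_phenotypes file is tab-separated with a header row containing these column names; the output field names, the function `read_specific_phenotypes`, and the sample row are invented for illustration and are not real data.

```python
import csv
import io

# Source column -> descriptive field name (field names are hypothetical).
COLUMN_MAP = {
    "sysName": "gene_name",
    "desc": "description",
    "name": "internal_name",
    "lrn": "log_ratio_normalized",
    "t": "t_statistic",
    "Group": "condition_group",
    "Condition_1": "condition_name",
    "Concentration_1": "concentration",
    "Units_1": "unit",
}


def read_specific_phenotypes(handle):
    """Read a tab-separated table and rename the columns listed in COLUMN_MAP."""
    rows = []
    for raw in csv.DictReader(handle, delimiter="\t"):
        row = {new: raw[old] for old, new in COLUMN_MAP.items()}
        # The two numeric columns are parsed; everything else stays text.
        row["log_ratio_normalized"] = float(row["log_ratio_normalized"])
        row["t_statistic"] = float(row["t_statistic"])
        rows.append(row)
    return rows


# Made-up sample row, for demonstration only.
SAMPLE = (
    "sysName\tdesc\tname\tlrn\tt\tGroup\tCondition_1\tConcentration_1\tUnits_1\n"
    "GENE_0001\thypothetical protein\texp_placeholder\t-2.5\t-6.1\t"
    "carbon source\tD-Glucose\t20\tmM\n"
)
rows = read_specific_phenotypes(io.StringIO(SAMPLE))
```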

For reference, under 'Genes' the 'Gene fitness' link gives a full table of relative fitness values: http://genomics.lbl.gov/supplemental/bigfit/html/acidovorax_3H11/fit_logratios_good.tab. The row labels (y-axis) are 'locusId' values, which are gene ids, and the column labels (x-axis) are condition (sample) ids, each including a text description.
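For ingest, a wide gene-by-condition matrix like this is usually easier to handle as one record per (gene, condition) pair. A minimal sketch, assuming the file is tab-separated with a 'locusId' column and one column per condition; `skip_columns` is a hypothetical parameter for any extra non-condition columns, and the sample data is made up.

```python
import csv
import io


def wide_to_long(handle, gene_column="locusId", skip_columns=()):
    """Turn a wide fitness table into (gene, condition, value) records."""
    records = []
    for row in csv.DictReader(handle, delimiter="\t"):
        gene = row[gene_column]
        for column, value in row.items():
            if column == gene_column or column in skip_columns:
                continue
            records.append((gene, column, float(value)))
    return records


# Made-up sample: two genes, two conditions.
SAMPLE = (
    "locusId\tcond_A description\tcond_B description\n"
    "1001\t-0.2\t-3.1\n"
    "1002\t0.4\t0.0\n"
)
records = wide_to_long(io.StringIO(SAMPLE))
```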

Additional data on each condition is available on the organism page under 'Tables', then 'Experiments', then 'Detailed metadata for experiments': http://genomics.lbl.gov/supplemental/bigfit/html/acidovorax_3H11/expsUsed

A basic ingest of this data would model it as mutant alleles or as a gene-condition relation indicating that gene X is essential for growth in condition Y. As key supporting data, the gene annotations should also be ingested: http://genomics.lbl.gov/supplemental/bigfit/html/acidovorax_3H11/fit_genes.tab, with the caveat that these are 'free text' annotations and so may require standardization.
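To make the gene-condition relation concrete, here is one way the edges could be produced from (gene, condition, fitness) records. This is only a sketch: the predicate label `essential_for_growth_in`, the edge field names, and the -2.0 cutoff are placeholders, not existing kg-microbe conventions.

```python
# Hypothetical predicate label; the real one would come from the KG schema.
PREDICATE = "essential_for_growth_in"


def build_edges(records, threshold=-2.0):
    """Emit one edge per (gene, condition) whose fitness is at or below threshold."""
    edges = []
    for gene, condition, log_ratio in records:
        if log_ratio <= threshold:
            edges.append(
                {
                    "subject": gene,
                    "predicate": PREDICATE,
                    "object": condition,
                    "log_ratio": log_ratio,
                }
            )
    return edges


# Made-up records, for demonstration only.
example_records = [
    ("gene_X", "condition_Y", -3.0),
    ("gene_X", "condition_Z", 0.2),
    ("gene_W", "condition_Y", -0.5),
]
edges = build_edges(example_records)
```

Keeping the raw log ratio on the edge preserves the supporting evidence even after the binary call is made.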

Further ingests could include:

  • The expsUsed table, which could be treated as a Sample metadata table and run through the usual NLP process.
  • Significance values for each fitness value, e.g.: http://genomics.lbl.gov/supplemental/bigfit/html/acidovorax_3H11/fit_t.tab
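If the significance values are ingested, they could be used to filter fitness values before making growth calls. A sketch, assuming both tables have already been reshaped into dictionaries keyed by (gene, condition); the function name and the |t| >= 4.0 cutoff are assumptions for illustration.

```python
def significant_fitness(log_ratios, t_values, min_abs_t=4.0):
    """Keep fitness values whose matching t-statistic passes the cutoff.

    Pairs with no t-statistic available are dropped.
    """
    kept = {}
    for key, ratio in log_ratios.items():
        t = t_values.get(key)
        if t is not None and abs(t) >= min_abs_t:
            kept[key] = ratio
    return kept


# Made-up values, for demonstration only.
log_ratios = {("g1", "c1"): -3.2, ("g1", "c2"): -0.1, ("g2", "c1"): -2.8}
t_values = {("g1", "c1"): -7.5, ("g1", "c2"): -0.4}
kept = significant_fitness(log_ratios, t_values)
```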

realmarcin, Dec 23 '20 01:12