cobratoolbox icon indicating copy to clipboard operation
cobratoolbox copied to clipboard

About mapExpressionToReactions.m function

Open sangmi9618 opened this issue 3 years ago • 3 comments

Please include a short description of problem here Hello. I am Sang Mi Lee from KAIST, South Korea. I have tried to use GIMME which is transcriptomics data integration method. Before using that I need to use src/dataIntegration/transcriptomics/preprocessing/mapExpressionToReactions.m function with my expression data. function [expressionRxns, parsedGPR, gene_used, signifRxns] = mapExpressionToReactions(model, expressionData, minSum) Using above function, I used input reference model (or generic model) with ENST-based GPR rules and regarding isoform level (i.e. ENST) TPM values. Then, I cannot obtain output from the function until more than 12 hours have passed. I am curious about this time-consuming situation is normal thing. If not, my input model and expression data are incompatible with the function?

Additionally, I tested with input reference model (or generic model) with ENSG-based GPR rules and regarding gene level FPKM values but its output came out in 20 minutes.

Thanks. Sang Mi

I hereby confirm that I have:

  • [x] Tried to solve the issue on my own
  • [x] Retried to run my code with the latest version of The COBRA Toolbox
  • [x] Checked that a similar issue has not already been opened

(Note: You may replace [ ] with [X] to check the box)

sangmi9618 avatar Sep 10 '20 11:09 sangmi9618

ENSG -> gene id ENST -> transcript id

Genome-scale metabolic models contain only gene identifiers (ENSG in your case) and not the other (or there are models I'm not aware of), simply because the latter is trancript. Also, how did you prepare expressionData struct? I assume even if you fed it to the function with ENST ids, you'll get an error (no output argument would be assigned in this case).

Ojami avatar Oct 09 '20 18:10 Ojami

Thank you for replying. I prepared expressionData struct by myself using following codes. I just loaded my RNA-seq data and stored to expressionData struct for each sample. Should I use function to prepare expressionData struct?

% RNA-seq data load (.tsv file, FPKM values)
rna_data = tdfread('IDH_negative_GBM+LGG_PCAWG_RNAseq_gene_FPKM.tsv','\t');
columns= fieldnames(rna_data);  % RNA-seq data column name load and save

for k=2:length(columns) % "2" is start column of sample ID
    expressionData=struct();
    expressionData.gene=cellstr(rna_data.(columns{1,1}));
    expressionData.value=rna_data.(columns{1,k});
    sample_id=columns{1,k};

Since I got confirmed for conversion of model gene IDs to a transcript IDs from the author of Human-GEM, I've tried to use Human-GEM containing ENST-based GPR rules. I look forward to hearing back from you about this issue soon. Thanks. Sang Mi

sangmi9618 avatar Oct 12 '20 04:10 sangmi9618

Your snippet seems fine, you don't need a specific function to prepare expressionData ; please consider:

  1. Try using readtable instead of tdfread, which makes the life a lot easier (unless your MATLAB is < 2013b, in that case even fileread or low-level I/O are more flexible).
  2. fieldnames returns a row not a column vector, so I'm wondering how this doesn't return error: rna_data.(columns{1,k})
  3. Also it's better if you could post a chunk of your data (if data is confidential, you can simulate a simple chunk just to show how your data is structured). But, again your script seems OK.
  4. If you do have such a conversion between ENST and ENSG, map them before you call mapExpressionToReactions, so you fed gene IDs (ENSG) to mapExpressionToReactions. Maybe this be of help: translateGrRules

Good luck! Oveis

Ojami avatar Oct 12 '20 07:10 Ojami