cobratoolbox
cobratoolbox copied to clipboard
About mapExpressionToReactions.m function
Please include a short description of problem here
Hello. I am Sang Mi Lee from KAIST, South Korea.
I have tried to use GIMME which is transcriptomics data integration method. Before using that I need to use src/dataIntegration/transcriptomics/preprocessing/mapExpressionToReactions.m function with my expression data.
function [expressionRxns, parsedGPR, gene_used, signifRxns] = mapExpressionToReactions(model, expressionData, minSum)
Using above function, I used input reference model (or generic model) with ENST-based GPR rules and regarding isoform level (i.e. ENST) TPM values. Then, I cannot obtain output from the function until more than 12 hours have passed.
I am curious about this time-consuming situation is normal thing. If not, my input model and expression data are incompatible with the function?
Additionally, I tested with input reference model (or generic model) with ENSG-based GPR rules and regarding gene level FPKM values but its output came out in 20 minutes.
Thanks. Sang Mi
I hereby confirm that I have:
- [x] Tried to solve the issue on my own
- [x] Retried to run my code with the latest version of The COBRA Toolbox
- [x] Checked that a similar issue has not already been opened
(Note: You may replace [ ] with [X] to check the box)
ENSG -> gene id ENST -> transcript id
Genome-scale metabolic models contain only gene identifiers (ENSG in your case) and not the other (or there are models I'm not aware of), simply because the latter is trancript. Also, how did you prepare expressionData
struct? I assume even if you fed it to the function with ENST ids, you'll get an error (no output argument would be assigned in this case).
Thank you for replying. I prepared expressionData
struct by myself using following codes. I just loaded my RNA-seq data and stored to expressionData
struct for each sample. Should I use function to prepare expressionData
struct?
% RNA-seq data load (.tsv file, FPKM values)
rna_data = tdfread('IDH_negative_GBM+LGG_PCAWG_RNAseq_gene_FPKM.tsv','\t');
columns= fieldnames(rna_data); % RNA-seq data column name load and save
for k=2:length(columns) % "2" is start column of sample ID
expressionData=struct();
expressionData.gene=cellstr(rna_data.(columns{1,1}));
expressionData.value=rna_data.(columns{1,k});
sample_id=columns{1,k};
Since I got confirmed for conversion of model gene IDs to a transcript IDs from the author of Human-GEM, I've tried to use Human-GEM containing ENST-based GPR rules. I look forward to hearing back from you about this issue soon. Thanks. Sang Mi
Your snippet seems fine, you don't need a specific function to prepare expressionData
; please consider:
- Try using
readtable
instead oftdfread
, which makes the life a lot easier (unless your MATLAB is < 2013b, in that case evenfileread
or low-level I/O are more flexible). -
fieldnames
returns a row not a column vector, so I'm wondering how this doesn't return error:rna_data.(columns{1,k})
- Also it's better if you could post a chunk of your data (if data is confidential, you can simulate a simple chunk just to show how your data is structured). But, again your script seems OK.
- If you do have such a conversion between ENST and ENSG, map them before you call
mapExpressionToReactions
, so you fed gene IDs (ENSG) tomapExpressionToReactions
. Maybe this be of help: translateGrRules
Good luck! Oveis