RAVEN
bug: predictLocalization error when using DeepLoc results
Description of the issue:
Hello, I am trying to incorporate localization for my reactions, but I am getting the following error. Please find attached the location file and my model: predict_location_juv.txt juvenile_model_ori_gpr.xlsx
Error using randsample (line 94) W must contain non-negative values with at least one positive value.
Error in predictLocalization (line 386) geneToMove=randsample(nGenes,1,true,max(GSS.scores(I,:),[],2)-GSS.scores(sub2ind(size(g2c),I,J))+0.1);
Reproducing this issue:
[outModel, geneLocalization, transportStruct, score, removedRxns] = predictLocalization(newModel,GSS_juv,defaultCompartment,transportCost,maxTime)
System information
- Please report:
  - RAVEN version (stable release, devel branch?)
  - Operating system (Windows)
I hereby confirm that I have:
- [X] Followed the guidelines to install RAVEN.
- [ ] Checked that a similar issue does not already exist
- [X] If suitable, needed, asked first in the Gitter chat room about the issue
@SPD20 Could you please attach the model as SBML or MAT file, so we can replicate what you're trying to do?
Hi, can I email the model in MAT format to you? I cannot upload the MAT file here; it says that this kind of file is not supported.
Thanks
You could ZIP the file first, then you should be able to attach it. Otherwise, you can email me at [email protected].
OK, I managed to locate the problem. There are two issues:
- Line 384 in the DeepLoc output file has no prediction for gene L889_g19179_t1_0.4, which breaks the function. We'll modify the code to catch these errors, but in the meantime you can remove line 384 from the DeepLoc output file.
- The DeepLoc output file is not properly parsed by parseScores. This is now corrected in the fix/predictLocalization branch; changes are visible here.
[Some advice to speed up future response times: please provide all the necessary files and code. The model you sent was in COBRA format, but it should be in RAVEN format (converted with ravenCobraWrapper, or directly loaded by importModel); you seem to use a modified or old version of predictLocalization, as the error message you show mentions that randsample is called in line 386, but this is line 394 in the latest RAVEN version; and it would be helpful to show all the commands that you ran to end up with the error in predictLocalization.]
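The first issue above (a gene without any prediction) can also be caught in MATLAB before calling predictLocalization. The following is a minimal sketch, assuming the randsample error is triggered by rows of GSS.scores that contain NaN or no positive value. The toy GSS struct and its gene names are made up for illustration; only the field names (compartments, scores, genes) come from the parseScores output shown in this thread.

```matlab
% Toy GSS struct (hypothetical values) with the same fields as parseScores output
GSS.compartments = {'Cytoplasm','Nucleus','Mitochondrion'};
GSS.genes        = {'geneA';'geneB';'geneC'};
GSS.scores       = [0.7 0.2 0.1; NaN NaN NaN; 0.1 0.1 0.8];

% A row is unusable if any score is NaN or if no score is positive
badRows = any(isnan(GSS.scores),2) | ~any(GSS.scores>0,2);

% Report which genes are dropped, so the DeepLoc file can be checked
if any(badRows)
    fprintf('Removing %d gene(s) without usable scores:\n', sum(badRows));
    fprintf('  %s\n', GSS.genes{badRows});
end

% Drop the unusable rows from both fields so they stay aligned
GSS.genes(badRows)    = [];
GSS.scores(badRows,:) = [];
```

This has the same effect as deleting the offending line from the DeepLoc output file. How predictLocalization treats model genes that are absent from GSS is not covered here and should be checked separately.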
Hello,
Thanks for your reply, and please accept my apology for not sending enough information.
This time I loaded my model file in XML format using the 'model=importModel' function. It loaded, but with warnings:
WARNING: The composition for the following metabolites could not be parsed: (5')ppPur-mRNA[Cytoplasm] 2-Methylthio-N6-L-threonylcarbamoyladenine-in-tRNA[Cytoplasm] 5'-(N7-Methyl-5'-triphosphoguanosine)-(2'-O-methyl-purine-ribonucleotide)-(2'-O-methyl-ribonucleotide)-(mRNA)[Cytoplasm] 5'-Phospho-(mRNA)[Cytoplasm] 5'-Triphospho-(mRNA)[Cytoplasm] .....and 17 more
I changed my parseScores code as per your suggestion, but I was still getting the following error:
Error using cell/unique (line 85) Cell array input must be a cell array of character vectors.
Error in parseScores_me (line 111) [~, J, K]=unique(GSS.genes);
My location file (generated using the DeepLoc tool) was comma-separated here.
Then I changed my input file to tab-separated, ran it with the new parseScores code, and got the following error:
Subscripted assignment dimension mismatch.
Error in parseScores_me (line 106) GSS.scores(row,:)=str2double(tline(4:end));
Finally, I ran this tab-separated file with the old parseScores code (attached) and got some results for GSS.
GSS_juv =
struct with fields:
compartments: {1×11 cell}
scores: [1123×11 double]
genes: {1123×1 cell}
My predictLocalization code is still giving the error ...
plotResults = false
maxTime = 15
transportCost = 0.5
defaultCompartment = 'Cytoplasm'
[outModel, geneLocalization, transportStruct, score, removedRxns] = predictLocalization_new(model,GSS_juv,defaultCompartment,transportCost,maxTime)
Error using randsample (line 94) W must contain non-negative values with at least one positive value.
Error in predictLocalization_new (line 402) toComp=randsample(nComps,1,true,GSS.scores(geneToMove,:)+0.2);
I will try to fix the problems in my model file, just in case they are behind this issue.
Thanks. Attached: predict_location_juv2.txt
Right, it seems that different sources of DeepLoc (online vs. offline) give slightly different output: they differ in comma vs. tab separation and in the naming and formatting of columns. We're currently figuring out what the correct formats are, which should then fix the problems you encountered.
As a workaround for now: for your DeepLoc file, parseScores should ignore the third column, which specifies whether the protein is membrane or soluble. To do this, change parseScores as follows:
line 93: GSS.compartments=GSS.compartments(4:end);
line 106: GSS.scores(row,:)=str2double(tline(4:end));
As was also done here: https://github.com/SysBioChalmers/RAVEN/commit/ff8629d55537e345a8212d2c34752c8e117970fe#diff-479365171760a0ecbfc3b9e4f4bbbc94.
Apologies for this, it all seems to stem from different output from DeepLoc.
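The workaround above can be illustrated on a single line of input. This is only a sketch of the idea, not the actual parseScores code: the header names and the data line are hypothetical, and it assumes that the first three columns are gene ID, predicted location and membrane/soluble type, with one score column per compartment after that. It also detects whether the file is tab- or comma-separated, since both variants appeared in this thread.

```matlab
% Hypothetical header and data line; real DeepLoc column names may differ
headerLine = 'ID,Location,Type,Nucleus,Cytoplasm';
dataLine   = 'gene1,Cytoplasm,Soluble,0.1,0.9';

% Pick the delimiter: tab if the header contains one, otherwise comma
tab = sprintf('\t');
if any(headerLine == tab)
    delim = tab;
else
    delim = ',';
end

% Keep empty fields so that columns do not shift when a value is missing
headerCols = strsplit(headerLine, delim, 'CollapseDelimiters', false);
dataCols   = strsplit(dataLine,   delim, 'CollapseDelimiters', false);

% Skip the first three columns in BOTH the header and the data line
compartments = headerCols(4:end);
gene         = dataCols{1};
scores       = str2double(dataCols(4:end));

% A mismatch here is what later causes a dimension mismatch on assignment
if numel(scores) ~= numel(compartments)
    error('Line for %s has %d scores, expected %d', gene, numel(scores), numel(compartments));
end
```

Skipping the columns in only one of the two places (header or data) leaves the number of compartments and the number of scores out of step, which is one plausible cause of the "Subscripted assignment dimension mismatch" error reported earlier.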
Dear Eduard,
Thank you very much for this workaround. I will implement it and will let you know if it worked.
Thanks, Sonal
From: Eduard Kerkhoven. Sent: Monday, August 17, 2020 11:46:19 AM. To: SysBioChalmers/RAVEN. Subject: Re: [SysBioChalmers/RAVEN] bug: PredictLocation using deeploc error (#310)
Dear Eduard,
Your workaround has worked and parseScores now gives me a GSS output. Is there anything I need to change in the predictLocalization script as well? I am getting the following error:
[outModel, geneLocalization, transportStruct, score, removedRxns] = predictLocalization_edit(newModel,GSS_juv,defaultCompartment,transportCost,maxTime)
Error using vertcat Dimensions of matrices being concatenated are not consistent.
Error in predictLocalization_edit (line 603) outModel.compNames=[outModel.compNames;GSS.compartments(2:end)];
Could you please send me the DeepLoc output format that the predictLocalization script expects? I will then edit my results to match it.
Thanks,
@eiden309 You have been running predictLocalization; did you run into these issues? And what does the output look like that you are feeding into the function?
@edkerk Yes, I ran into the same issue previously. There is a minor bug in the function, but it can be rectified by transposing GSS.compartments(2:end) in predictLocalization (line 611), i.e. adding ' at the end of GSS.compartments(2:end):
current: outModel.compNames=[outModel.compNames;GSS.compartments(2:end)];
new: outModel.compNames=[outModel.compNames;GSS.compartments(2:end)'];
I think that the output from parseScores can be used directly as the input for predictLocalization. @SPD20 FYI, if you still need the DeepLoc output file used for parseScores, please do let me know. Hope this helps!
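The vertcat error and the transpose fix can be reproduced outside RAVEN with two small cell arrays. This is a sketch with made-up values, assuming that the model's compNames is a column cell array and that GSS.compartments is a 1×N row, as in the GSS output shown earlier in the thread. As a possible alternative to the transpose, reshape(...,[],1) is used here because it yields a column whichever orientation the input has.

```matlab
% Toy stand-ins (hypothetical values) for the two variables involved
compNames    = {'Cytoplasm'};                              % column, like outModel.compNames (assumed)
compartments = {'Cytoplasm','Nucleus','Mitochondrion'};    % 1x3 row, like GSS.compartments

% [compNames; compartments(2:end)] fails with a vertcat error:
% a 1x1 cell cannot be stacked on top of a 1x2 cell.

% Transposing works when compartments is a row; reshape to a column
% works for both row and column input.
newNames  = reshape(compartments(2:end), [], 1);
compNames = [compNames; newNames];
```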