bug: predictLocalization error when using DeepLoc results

Open SPD20 opened this issue 3 years ago • 10 comments

Description of the issue:

Hello, I am trying to assign locations to my reactions, but I am getting the error below. Please find attached the location file and my model: predict_location_juv.txt juvenile_model_ori_gpr.xlsx

Error using randsample (line 94)
W must contain non-negative values with at least one positive value.

Error in predictLocalization (line 386)
geneToMove=randsample(nGenes,1,true,max(GSS.scores(I,:),[],2)-GSS.scores(sub2ind(size(g2c),I,J))+0.1);

Reproducing this issue:

[outModel, geneLocalization, transportStruct, score, removedRxns] = predictLocalization(newModel,GSS_juv,defaultCompartment,transportCost,maxTime)

System information

  • Please report:
  1. RAVEN version (stable release, devel branch?)
  2. Operating system (Windows)

SPD20 avatar Jul 26 '20 14:07 SPD20

@SPD20 Could you please attach the model as SBML or MAT file, so we can replicate what you're trying to do?

edkerk avatar Jul 28 '20 21:07 edkerk

Hi, can I email the model in MAT format to you? I cannot upload the MAT file here; it says that this kind of file is not supported.

Thanks

SPD20 avatar Jul 28 '20 21:07 SPD20

You could ZIP the file first, then you should be able to attach it. Otherwise, you can email me at [email protected].

edkerk avatar Aug 02 '20 21:08 edkerk

Ok, I managed to locate the problem. It's two issues:

  1. Line 384 in the DeepLoc output file has no prediction for gene L889_g19179_t1_0.4, which breaks the function. We'll modify the code to catch these errors, but in the meantime you can remove line 384 from the DeepLoc output file.
  2. The DeepLoc output file is not properly parsed by parseScores. This is now corrected in the fix/predictLocalization branch, changes are visible here.
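A minimal sketch of how such a gene could be spotted before calling predictLocalization, assuming the missing prediction ends up as NaN in GSS.scores (an assumption; the actual parsed values are not shown in this thread). The gene names and scores below are made up for illustration. NaN weights are one way to trigger the randsample error quoted above, since the weights passed there are derived directly from GSS.scores:

```matlab
% Hypothetical stand-in for the structure returned by parseScores
GSS.genes = {'geneA'; 'geneB'; 'geneC'};
GSS.compartments = {'Cytoplasm', 'Nucleus', 'Mitochondrion'};
GSS.scores = [0.7 0.2 0.1; NaN NaN NaN; 0.1 0.8 0.1];

% Flag genes whose score row contains at least one NaN
badRows = any(isnan(GSS.scores), 2);
disp(GSS.genes(badRows));

% Remove the flagged genes so that every remaining row yields valid weights
GSS.genes(badRows) = [];
GSS.scores(badRows, :) = [];
```

Removing the rows from the GSS structure has the same effect as deleting the corresponding line from the DeepLoc output file by hand.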

[A piece of advice to speed up future response times: please provide all the necessary files and code. The model you sent was in COBRA format, but it should be in RAVEN format (converted with ravenCobraWrapper, or loaded directly with importModel). You also seem to use a modified or old version of predictLocalization: the error message you show mentions that randsample is called in line 386, but this is line 394 in the latest RAVEN version. Finally, it would be helpful to show all the commands that you ran to end up with the error in predictLocalization.]

edkerk avatar Aug 13 '20 20:08 edkerk

Hello,

Thanks for your reply and please accept my apology for not sending enough information.

This time I loaded my model file in XML format using the importModel function (model=importModel). It loaded, but with warnings:

WARNING: The composition for the following metabolites could not be parsed:
(5')ppPur-mRNA[Cytoplasm]
2-Methylthio-N6-L-threonylcarbamoyladenine-in-tRNA[Cytoplasm]
5'-(N7-Methyl-5'-triphosphoguanosine)-(2'-O-methyl-purine-ribonucleotide)-(2'-O-methyl-ribonucleotide)-(mRNA)[Cytoplasm]
5'-Phospho-(mRNA)[Cytoplasm]
5'-Triphospho-(mRNA)[Cytoplasm]
.....and 17 more

I changed my parseScores code as per your suggestion, but I was still getting the following error:

Error using cell/unique (line 85)
Cell array input must be a cell array of character vectors.

Error in parseScores_me (line 111)
[~, J, K]=unique(GSS.genes);

My location file (generated using the DeepLoc tool) was comma-separated here.

Then I changed my input file to tab-separated, ran it with the new parseScores code, and got the following error:

Subscripted assignment dimension mismatch.

Error in parseScores_me (line 106)
GSS.scores(row,:)=str2double(tline(4:end));

Finally, I ran this tab-separated file with the old parseScores code (attached) and got some results for GSS.

GSS_juv =

struct with fields:

compartments: {1×11 cell}
      scores: [1123×11 double]
       genes: {1123×1 cell}

My predictLocalization code is still giving the error:

plotResults = false
maxTime = 15
transportCost = 0.5
defaultCompartment = 'Cytoplasm'
[outModel, geneLocalization, transportStruct, score, removedRxns] = predictLocalization_new(model,GSS_juv,defaultCompartment,transportCost,maxTime)

Error using randsample (line 94)
W must contain non-negative values with at least one positive value.

Error in predictLocalization_new (line 402)
toComp=randsample(nComps,1,true,GSS.scores(geneToMove,:)+0.2);

I will try to rectify the problems in my model file, just in case they are behind this issue.

Thanks

predict_location_juv2.txt

parseScores_old.txt

SPD20 avatar Aug 14 '20 19:08 SPD20

Right, it seems there are some issues with different sources of DeepLoc (online vs. offline) giving slightly different output (comma- vs. tab-separated values, and differences in the naming and formatting of columns). We're currently figuring out what the correct formats are; this should then fix the problems you encountered.

As a workaround for now, parseScores should ignore the third column of your DeepLoc file, which specifies whether the protein is membrane-bound or soluble. To do this, change parseScores as follows:

line 93: GSS.compartments=GSS.compartments(4:end);
line 106: GSS.scores(row,:)=str2double(tline(4:end));

As was also done here: https://github.com/SysBioChalmers/RAVEN/commit/ff8629d55537e345a8212d2c34752c8e117970fe#diff-479365171760a0ecbfc3b9e4f4bbbc94.

Apologies for this, it all seems to stem from different output from DeepLoc.
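As a rough illustration of what the two changed lines do, the sketch below splits one tab-separated header line and one data line and keeps only the columns from the fourth onward. The column names and values are hypothetical stand-ins for a DeepLoc file, and the variable names other than tline are assumptions rather than the actual parseScores internals:

```matlab
% Hypothetical header and data line in a tab-separated DeepLoc-style layout:
% column 1 = gene ID, column 2 = predicted location, column 3 = membrane score
tab = sprintf('\t');
headerLine = sprintf('ID\tLocation\tMembrane\tNucleus\tCytoplasm');
dataLine = sprintf('gene1\tCytoplasm\t0.12\t0.30\t0.70');

header = strsplit(headerLine, tab);
tline = strsplit(dataLine, tab);

% Skip the first three columns so only per-compartment scores remain
compartments = header(4:end);
scores = str2double(tline(4:end));
```

With the first three columns skipped, the number of compartment names matches the number of scores per gene, which is what the assignment GSS.scores(row,:)=... requires.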

edkerk avatar Aug 17 '20 10:08 edkerk

Dear Eduard,

Thank you very much for this workaround. I will implement it and will let you know if it worked.

Thanks, Sonal

SPD20 avatar Aug 17 '20 10:08 SPD20

Dear Eduard,

Your workaround worked and parseScores now gives me a GSS structure. Is there anything I need to change in the predictLocalization script as well? I am getting the following error:

[outModel, geneLocalization, transportStruct, score, removedRxns] = predictLocalization_edit(newModel,GSS_juv,defaultCompartment,transportCost,maxTime)

Error using vertcat
Dimensions of matrices being concatenated are not consistent.

Error in predictLocalization_edit (line 603)
outModel.compNames=[outModel.compNames;GSS.compartments(2:end)];

Could you please send me the DeepLoc output format that predictLocalization expects, so that I can edit my results to match it?

Thanks,

SPD20 avatar Nov 05 '20 09:11 SPD20

@eiden309 You have been running predictLocalization; did you run into these issues? And what does the output that you are feeding into the function look like?

edkerk avatar Nov 12 '20 23:11 edkerk

@edkerk Yes, I ran into the same issue previously. There is a minor bug in the function, but it can be rectified by transposing GSS.compartments(2:end) in predictLocalization (line 611), i.e. adding ' at the end of GSS.compartments(2:end):

current: outModel.compNames=[outModel.compNames;GSS.compartments(2:end)];
new: outModel.compNames=[outModel.compNames;GSS.compartments(2:end)'];

I think that the output from parseScores can be used directly as the input for predictLocalization. @SPD20 FYI, if you still need the DeepLoc output file used for parseScores please do let me know. Hope this helps!
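The reason the transpose helps can be reproduced in isolation. This is only a sketch with made-up compartment names, assuming (as the vertcat error suggests) that outModel.compNames is a column cell array while GSS.compartments is a row cell array:

```matlab
% Column cell array, standing in for outModel.compNames
compNames = {'Cytoplasm'};
% Row cell array, standing in for GSS.compartments(2:end)
extra = {'Nucleus', 'Mitochondrion'};

% [compNames; extra] would stack a 1x1 on top of a 1x2 and raise the
% vertcat dimension error; transposing the row gives a 2x1 that stacks cleanly
combined = [compNames; extra'];
disp(size(combined));
```

Transposing is a no-op for the contents and only changes the orientation, so the compartment names themselves are unaffected.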

eiden309 avatar Dec 13 '20 07:12 eiden309