
Evaluate the predictions.pkl provided on CAFA3 with CAFA_assessment_tool

Open Worldseer opened this issue 2 years ago • 14 comments

I uncommented lines 152-162 of evaluate_deepgoplus.py and added the export code below. The DeepGOPlus_1_all.txt file was generated from the predictions.pkl file provided for CAFA3. I then evaluated it with the official evaluation tool, but the results did not match those in the paper. I assume the txt file cannot simply be exported this way, so what do I need to modify to reproduce the results from the paper?

  print("exporting predictions to CAFA submission format")
  txt_out = []
  txt_out.append('AUTHOR None\n')
  txt_out.append('MODEL 1\n')
  txt_out.append('KEYWORDS natural language processing.\n')
  for i, row in enumerate(test_df.itertuples()):
      prot_id = row.proteins
      for go_id, score in deep_preds[i].items():
          score_str = f'{score:.2f}'
          # CAFA submissions only accept scores in (0.00, 1.00], so drop zeros
          if score_str != '0.00':
              txt_out.append(f'{prot_id}\t{go_id}\t{score_str}\n')
  txt_out.append('END')
  with open(filename, 'w') as f:
      f.writelines(txt_out)
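For anyone trying to reproduce this, the export logic can be exercised end-to-end with dummy inputs. The shapes of `test_df` and `deep_preds` below are my assumption of what evaluate_deepgoplus.py holds (a DataFrame with a `proteins` column and a parallel list of `{GO id: score}` dicts):

```python
import pandas as pd

# Dummy stand-ins (assumed shapes) for the objects in evaluate_deepgoplus.py.
test_df = pd.DataFrame({'proteins': ['P12345', 'Q67890']})
deep_preds = [
    {'GO:0003674': 0.91, 'GO:0005575': 0.004},  # 0.004 rounds to 0.00 and is dropped
    {'GO:0008150': 0.33},
]

txt_out = ['AUTHOR None\n', 'MODEL 1\n', 'KEYWORDS natural language processing.\n']
for i, row in enumerate(test_df.itertuples()):
    for go_id, score in deep_preds[i].items():
        score_str = f'{score:.2f}'
        if score_str != '0.00':  # CAFA only accepts scores in (0.00, 1.00]
            txt_out.append(f'{row.proteins}\t{go_id}\t{score_str}\n')
txt_out.append('END')

with open('DeepGOPlus_1_all.txt', 'w') as f:
    f.writelines(txt_out)
```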

Worldseer avatar Nov 16 '22 02:11 Worldseer

Hi, you need to use this tool to reproduce the paper's CAFA3 evaluation: https://github.com/ashleyzhou972/CAFA_assessment_tool

If I remember correctly, the deepgoplus-cafa.tar.gz archive should contain the txt files.

coolmaksat avatar Nov 16 '22 07:11 coolmaksat

Thank you for your reply. I have tested the txt file you provided, and it is consistent with the results in the article. I would also like to evaluate my own model, so could you tell me exactly how you generated this txt file from predictions.pkl? It would be great if you could share the corresponding code. You can send it to me via email at [email protected]. Once again, my thanks!

Worldseer avatar Nov 16 '22 08:11 Worldseer

I think I used the evaluate_cafa3.py script to generate them. Uncomment lines 124-132.

coolmaksat avatar Nov 16 '22 08:11 coolmaksat

Thank you for taking the time to reply; I will try it again. My previous attempt didn't turn out right, as I mentioned the first time I asked.

Worldseer avatar Nov 16 '22 09:11 Worldseer

Hello, Dr. Kulmanov, I am back; I have tested this several times. The first table below was produced from the txt file generated from data-cafa3/predictions.pkl, and the second from the txt file you provided directly.

Results of the txt file generated using the provided predictions.pkl:

| Ontology | Type | Mode | Fmax | Threshold | Coverage |
| --- | --- | --- | --- | --- | --- |
| bpo | NK | partial | 0.3864840946855839 | 0.18 | 1.0 |
| bpo | NK | full | 0.3864840946855839 | 0.18 | 1.0 |
| bpo | LK | partial | 0.4072675565869404 | 0.17 | 1.0 |
| bpo | LK | full | 0.4072675565869404 | 0.17 | 1.0 |
| cco | NK | partial | 0.6106824202178297 | 0.25 | 1.0 |
| cco | NK | full | 0.6106824202178297 | 0.25 | 1.0 |
| cco | LK | partial | 0.6000260772486744 | 0.22 | 1.0 |
| cco | LK | full | 0.6000260772486744 | 0.22 | 1.0 |
| mfo | NK | partial | 0.5549217805948832 | 0.12 | 1.0 |
| mfo | NK | full | 0.5549217805948832 | 0.12 | 1.0 |
| mfo | LK | partial | 0.5376631411272229 | 0.11 | 1.0 |
| mfo | LK | full | 0.5376631411272229 | 0.11 | 1.0 |
Results on CAFA_assessment_tool using the txt file you provided (Species: all):

| Ontology | Type | Mode | Fmax | Threshold | Coverage |
| --- | --- | --- | --- | --- | --- |
| bpo | NK | partial | 0.3899217629352804 | 0.18 | 1.0 |
| bpo | NK | full | 0.3899217629352804 | 0.18 | 1.0 |
| bpo | LK | partial | 0.4098261625978569 | 0.14 | 1.0 |
| bpo | LK | full | 0.4098261625978569 | 0.14 | 1.0 |
| cco | NK | partial | 0.6126849280111355 | 0.25 | 1.0 |
| cco | NK | full | 0.6126849280111355 | 0.25 | 1.0 |
| cco | LK | partial | 0.5963076619060491 | 0.23 | 1.0 |
| cco | LK | full | 0.5963076619060491 | 0.23 | 1.0 |
| mfo | NK | partial | 0.5561658907106347 | 0.12 | 1.0 |
| mfo | NK | full | 0.5561658907106347 | 0.12 | 1.0 |
| mfo | LK | partial | 0.5217672872688168 | 0.11 | 1.0 |
| mfo | LK | full | 0.5217672872688168 | 0.11 | 1.0 |

The thresholds on NK are the same in both, but the Fmax values differ somewhat. Why are they different? This question has been bothering me for weeks; perhaps you can shed some light on it.
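For context on why small pipeline differences move this number: Fmax is the maximum over score thresholds of a protein-centric F-measure, so rounding scores to two decimals, ontology propagation, or the benchmark protein set can each shift it slightly. Below is my own simplified sketch of "full" mode; it is not the CAFA_assessment_tool implementation, and the function name is mine:

```python
def fmax(preds, truth, thresholds=None):
    """Protein-centric Fmax. preds maps protein -> {GO id: score};
    truth maps protein -> set of true GO ids."""
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for prot, annots in truth.items():
            predicted = {g for g, s in preds.get(prot, {}).items() if s >= t}
            if predicted:
                tp = len(predicted & annots)
                precisions.append(tp / len(predicted))
                recalls.append(tp / len(annots))
            else:
                # "full" mode: proteins with no prediction still count for recall
                recalls.append(0.0)
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)  # over proteins with predictions
        r = sum(recalls) / len(recalls)        # over all benchmark proteins
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best
```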

Worldseer avatar Nov 16 '22 12:11 Worldseer

Hi, I need to rerun everything and check before I can answer this. One possible reason might be the ontology version; also, if you retrain the model you may get somewhat different results. We have updated the model several times, so perhaps we changed something along the way; I'm not really sure. That said, I think the results are not significantly different and are almost the same as in the paper. Check out our latest model, DeepGOZero.

coolmaksat avatar Nov 16 '22 13:11 coolmaksat

Thank you very much for your reply. The ontology version I used is go_cafa3.obo. I simply took DeepGOPlus's predictions.pkl for CAFA and verified it, without retraining the model. I look forward to hearing from you again after you check. I will read your latest paper carefully; I believe it will give me a lot of ideas. Thanks again.

Worldseer avatar Nov 16 '22 14:11 Worldseer

> I think I used the evaluate_cafa3.py script to generate them. Uncomment lines 124-132.

When I used evaluate_cafa3.py, one of the input files was predictions.pkl. How can I generate this file? I want to evaluate our own annotation results.

simon19891216 avatar Apr 03 '23 08:04 simon19891216

Furthermore, since the related functions have already been developed, could evaluate_cafa3 be added to your website?

simon19891216 avatar Apr 03 '23 08:04 simon19891216

Hello simon, you can refer to https://github.com/bio-ontology-research-group/deepgoplus/blob/master/deepgoplus.py (line 186) to generate predictions.pkl.
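For what it's worth, my understanding (an assumption from reading deepgoplus.py, not a confirmed spec) is that predictions.pkl is simply a pickled pandas DataFrame, so any model's scores can be packaged the same way. The column names below are my assumption:

```python
import numpy as np
import pandas as pd

# Hypothetical sketch: attach per-protein score vectors to the test DataFrame
# and pickle it, mirroring what deepgoplus.py appears to do around line 186.
test_df = pd.DataFrame({'proteins': ['P12345', 'Q67890']})
scores = np.array([[0.91, 0.004],
                   [0.10, 0.33]])   # rows: proteins, columns: GO terms
test_df['preds'] = list(scores)     # one score vector per protein
test_df.to_pickle('predictions.pkl')

# Reload to check the round trip
loaded = pd.read_pickle('predictions.pkl')
```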

Worldseer avatar Apr 10 '23 03:04 Worldseer

Thank you. I will try it.


simon19891216 avatar Apr 10 '23 07:04 simon19891216


How can I get the txt files from the predictions.pkl file? I uncommented lines 124-132, but no file was generated, so I changed lines 124-132 to:

  print("exporting predictions to CAFA submission format")
  txt_out = []
  txt_out.append('AUTHOR None\n')
  txt_out.append('MODEL 1\n')
  txt_out.append('KEYWORDS natural language processing.\n')
  for i, row in enumerate(test_df.itertuples()):
      prot_id = row.proteins
      for go_id, score in deep_preds[i].items():
          score_str = "{0:.2f}".format(score)
          if score_str != "0.00":
              txt_out.append(str(prot_id) + "\t" + str(go_id) + "\t" + score_str + "\n")
  txt_out.append('END')
  with open(filename, 'w') as f:
      f.writelines(txt_out)

But the file generated each time is the same. How can I get the four files (the mf, bp, cc and all txt files)? Can you give me some guidance?

song2000012138 avatar Dec 04 '23 13:12 song2000012138

Hello, the evaluate_cafa3.py script has a parameter for sub-ontology selection, '--ont'; you need to run it three times with the 'mf', 'bp' and 'cc' arguments. To get the 'all' file, just concatenate the three files.
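That procedure could look like the following. The per-ontology file names, and whether the real output files carry AUTHOR/MODEL/END header lines that would need stripping before concatenation, are assumptions; placeholder files stand in for the script's output so the concatenation step is runnable:

```shell
# Run the export once per sub-ontology (exact command shape is an assumption):
#   python evaluate_cafa3.py --ont mf
#   python evaluate_cafa3.py --ont bp
#   python evaluate_cafa3.py --ont cc

# Placeholder per-ontology prediction files:
printf 'P1\tGO:0003674\t0.91\n' > DeepGOPlus_1_mf.txt
printf 'P1\tGO:0008150\t0.33\n' > DeepGOPlus_1_bp.txt
printf 'P1\tGO:0005575\t0.75\n' > DeepGOPlus_1_cc.txt

# Concatenate the three files into the 'all' file
cat DeepGOPlus_1_mf.txt DeepGOPlus_1_bp.txt DeepGOPlus_1_cc.txt > DeepGOPlus_1_all.txt
```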

coolmaksat avatar Dec 05 '23 05:12 coolmaksat

> Hello, the evaluate_cafa3.py script has a parameter for sub-ontology selection, '--ont'; you need to run it three times with the 'mf', 'bp' and 'cc' arguments. To get the 'all' file, just concatenate the three files.

OK, I'll try it. Thank you so much.

song2000012138 avatar Dec 05 '23 07:12 song2000012138