Questions about data processing

PBordesInstadeep opened this issue 3 years ago.

Hello,

I have a few questions regarding the way you process the data.

In your code, you seem to use nrPDB-GO_2019.06.18_train.txt and nrPDB-GO_2020.06.18_annot.tsv to build the training data, but your data only includes nrPDB-GO_2019.06.18_annot.tsv. Is this expected?
I analyzed your results files (DeepCNN-MERGED_molecular_function_results.pckl, DeepCNN-MERGED_cellular_component_results.pckl), and the test set has the same size for every ontology. However, your Supplementary Table reports different test set sizes for MF, BP and CC. Why is that?
In your Supplementary Table, the train/validation/test sets have different sizes for MF, BP and CC. Shouldn't they be the same size for all three ontologies?
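For the question about the annotation files, here is a minimal sketch of how one might check whether the available nrPDB-GO_2019.06.18_annot.tsv covers every chain listed in nrPDB-GO_2019.06.18_train.txt. It assumes (this is not confirmed by the repository) that the train file lists one PDB-chain ID per line, and that the annotation file's per-chain table starts after a header line beginning with `### PDB-chain`, with the chain ID in the first tab-separated column. The function names are hypothetical.

```python
def read_chain_ids(path):
    """Read one PDB-chain ID per non-empty line (assumed format of *_train.txt)."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def read_annotated_chains(path):
    """Return the set of chain IDs listed in an *_annot.tsv file.

    Assumption: the per-chain table starts after a header line beginning
    with '### PDB-chain', and the chain ID is the first tab-separated column.
    """
    chains = set()
    in_table = False
    with open(path) as fh:
        for line in fh:
            if line.startswith("### PDB-chain"):
                in_table = True
                continue
            if in_table and line.strip():
                chains.add(line.split("\t", 1)[0].strip())
    return chains


def coverage(train_file, annot_file):
    """Return (number of training chains found, list of chains missing) in the annotation file."""
    train = read_chain_ids(train_file)
    annotated = read_annotated_chains(annot_file)
    missing = [chain for chain in train if chain not in annotated]
    return len(train) - len(missing), missing
```

For example, `coverage("nrPDB-GO_2019.06.18_train.txt", "nrPDB-GO_2019.06.18_annot.tsv")` would return how many training chains are covered and which are missing; a non-empty missing list would suggest the 2020 annotation file is actually required.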
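For the question about the results files, a small sketch of how the test set sizes (and whether the test chains are identical across ontologies) could be compared. It assumes each .pckl file unpickles to a dict with a `proteins` list of test chain IDs, falling back to the row count of a `Y_true` label matrix; both key names are assumptions about the file layout, and the helper names are hypothetical.

```python
import pickle


def load_results(path):
    """Unpickle one results file (assumed to contain a dict)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def count_test_proteins(path):
    """Number of test proteins in one results file.

    Assumption: the dict has a 'proteins' list of test chain IDs; if not,
    fall back to the number of rows in an assumed 'Y_true' label matrix.
    """
    results = load_results(path)
    if "proteins" in results:
        return len(results["proteins"])
    return len(results["Y_true"])


def compare_sizes(paths):
    """Map each results file to its test set size."""
    return {path: count_test_proteins(path) for path in paths}


def shared_test_set(paths):
    """True if all files list exactly the same test chains (assumes a 'proteins' key)."""
    chain_sets = [set(load_results(path)["proteins"]) for path in paths]
    return all(s == chain_sets[0] for s in chain_sets)
```

Comparing the chain sets, not just their lengths, distinguishes "same test set reused for every ontology" from "different test sets that happen to have equal size".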
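For the question about per-ontology split sizes, one hypothesis that could be checked (not something the authors have confirmed) is that a chain is only counted in an ontology's split if it has at least one annotation in that ontology. The sketch below counts, for a given list of split chains, how many have a non-empty GO-term column per ontology. It assumes the per-chain rows of the annot.tsv file follow a `### PDB-chain` header and are tab-separated as chain ID, MF terms, BP terms, CC terms; that column mapping and the function name are assumptions.

```python
# Assumed column order of the per-chain table in *_annot.tsv.
ONTOLOGY_COLUMNS = {
    "molecular_function": 1,
    "biological_process": 2,
    "cellular_component": 3,
}


def annotated_per_ontology(annot_file, split_chains):
    """Count chains of a split that have at least one GO term in each ontology."""
    counts = {ont: 0 for ont in ONTOLOGY_COLUMNS}
    wanted = set(split_chains)
    in_table = False
    with open(annot_file) as fh:
        for line in fh:
            if line.startswith("### PDB-chain"):
                in_table = True
                continue
            if not in_table or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if fields[0] not in wanted:
                continue
            # A non-empty column means the chain has annotations in that ontology.
            for ont, col in ONTOLOGY_COLUMNS.items():
                if col < len(fields) and fields[col].strip():
                    counts[ont] += 1
    return counts
```

If these counts match the per-ontology sizes in the Supplementary Table, that would suggest the differing sizes come from filtering unannotated chains rather than from different splits.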

Sep 23 '22 07:09 PBordesInstadeep