
Huge differences between DeepFRI server and local predictions

Open OtimusOne opened this issue 3 years ago • 3 comments

Hello, I am using a local instance of DeepFRI with the Newest Models (CPU), and I've noticed that the predictions I get are completely different from the ones the server returns.

For example, for the 1S3P sequence the server returns a total of 12 predictions with a score over 0.50, while my local instance returns none. I'm running in a Windows Subsystem for Linux environment without a GPU, but I don't think that should be an issue for the CPU models?

$ python ./predict.py --seq 'SMTDLLSAEDIKKAIGAFTAADSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKDGFIDEDELGSILKGFSSDARDLSAKETKTLMAAGDKDGDGKIGVEEFSTLVAES' -ont mf --verbose
2022-03-21 22:37:37.026507: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2022-03-21 22:37:37.026573: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-03-21 22:37:38.620674: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2022-03-21 22:37:38.620760: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2022-03-21 22:37:38.620818: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host: /proc/driver/nvidia/version does not exist
2022-03-21 22:37:38.621137: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-21 22:37:38.635260: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 4001000000 Hz
2022-03-21 22:37:38.637696: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7ffffa70a150 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-03-21 22:37:38.637795: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
### Computing predictions on a single protein...
Protein GO-term/EC-number Score GO-term/EC-number name
query_prot GO:0005509 0.13736 calcium ion binding
query_prot GO:0016788 0.11856 hydrolase activity, acting on ester bonds
### Saving predictions to *.json file...
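To make the "none over 0.50" observation easy to check, here is a minimal sketch that parses the prediction rows printed above and filters them by score. The function names `parse_predictions` and `above_threshold` are hypothetical helpers, not part of DeepFRI, and the sketch only assumes the whitespace-separated `protein term score name` layout visible in the console output.

```python
def parse_predictions(text):
    """Extract (protein, go_term, score, name) rows from predict.py console output.

    Assumes rows look like: 'query_prot GO:0005509 0.13736 calcium ion binding'.
    TensorFlow log lines, '###' banners and the header row are skipped.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split(maxsplit=3)
        # A prediction row has at least protein, term and score, and the term starts with 'GO:'.
        if len(parts) < 3 or not parts[1].startswith("GO:"):
            continue
        try:
            score = float(parts[2])
        except ValueError:
            continue
        name = parts[3] if len(parts) == 4 else ""
        rows.append((parts[0], parts[1], score, name))
    return rows


def above_threshold(rows, threshold=0.5):
    """Keep only rows whose score is at or above the threshold (0.5 is an assumption)."""
    return [row for row in rows if row[2] >= threshold]


console_output = """\
### Computing predictions on a single protein...
Protein GO-term/EC-number Score GO-term/EC-number name
query_prot GO:0005509 0.13736 calcium ion binding
query_prot GO:0016788 0.11856 hydrolase activity, acting on ester bonds
### Saving predictions to *.json file...
"""

rows = parse_predictions(console_output)
confident = above_threshold(rows)
print(f"{len(rows)} predictions parsed, {len(confident)} at or above 0.50")
```

With the output pasted above, both parsed predictions fall well below 0.50, which matches what I described.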

OtimusOne avatar Mar 21 '22 20:03 OtimusOne

Same experience here. Moreover, if I supply the same input PDB structure to both the DeepFRI server and my local install of DeepFRI (using supplied trained models) I get back dramatically different predictions.

This kind of behavior raises concerns over DeepFRI's robustness and reproducibility. Now would be a great time for the authors to chime in and offer some perspective and reassurance...

DaRinker avatar Sep 19 '22 19:09 DaRinker

I believe the DeepFRI web server is still using the original set of weights. The "newer" weights, linked in the README.md, were trained on a more recent version of the SIFTS database (and are usable on both CPU and GPU). The newer models should be able to predict a larger number of functions. There might be some variation in lower-confidence predictions.

dougrenfrew avatar Sep 19 '22 21:09 dougrenfrew

Thanks for weighing in. While that sounds plausible, should the differences be drastic or subtle?

For example, the disparity I'm noticing looks like this:

Website:

Structure-Based Molecular Function - GO Term Predictions
No predictions above threshold.

Local install:

GO:0004518 0.98356 nuclease activity
GO:0016788 0.93346 hydrolase activity, acting on ester bonds

I also have examples that go in exactly the opposite direction (i.e. GO terms scoring >0.9 on the web server that do not appear at all in the local instance).
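To quantify disagreements like these across many structures, a comparison along the following lines might help. This is only a sketch: `compare_predictions` is a hypothetical helper, the 0.5 cutoff is an assumption (the web server's actual display threshold may differ), and the two dictionaries are filled in by hand from the example above rather than fetched from either source.

```python
def compare_predictions(server, local, threshold=0.5):
    """Split GO terms into those called by both sources, only the server, or only the local run.

    `server` and `local` map GO term IDs to scores; a term counts as "called"
    when its score is at or above `threshold` (an assumed cutoff).
    """
    server_hits = {term for term, score in server.items() if score >= threshold}
    local_hits = {term for term, score in local.items() if score >= threshold}
    return {
        "both": sorted(server_hits & local_hits),
        "server_only": sorted(server_hits - local_hits),
        "local_only": sorted(local_hits - server_hits),
    }


# Scores copied by hand from the example above.
server_scores = {}  # website: no predictions above threshold
local_scores = {"GO:0004518": 0.98356, "GO:0016788": 0.93346}

result = compare_predictions(server_scores, local_scores)
for group, terms in result.items():
    print(f"{group}: {', '.join(terms) if terms else '-'}")
```

Running this per structure and counting the `server_only` and `local_only` terms would show whether the disagreement is confined to low-confidence predictions or also affects high-scoring ones.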

Do you have any suggestions for a (well annotated) structural dataset I could benchmark DeepFRI on?

DaRinker avatar Sep 19 '22 21:09 DaRinker