ROBIN It seems that the value of VSA_EState* was not generated well across the entire training data.

Open jsko-arontier opened this issue 1 year ago • 1 comments

Thanks for publishing this work. I ran ROBIN for our own study and noticed something strange.

When I run the analysis with the provided file (Mordred_Test_Compounds_3D.csv), I get the results reported in the paper, but when I generate the descriptors myself from the sdf file and run the same analysis, I get different results.

Comparing the files, I found that the VSA_EState* values differ significantly. In the provided files (Mordred_Test_Compounds_3D.csv, Mordred_ROBIN_RNA_Binder_3D.csv), VSA_EState1 through VSA_EState7 are almost all 0, as shown below, whereas the descriptors I generated myself contain non-zero values in these columns.

Here are the package versions I used:

rdkit : 2022.9.5
mordred : 1.2.0
tensorflow : 2.3.1
scikit-learn : 1.0.2
numpy : 1.18.5
scipy : 1.9.3

$ cat Mordred_Test_Compounds_3D.csv | cut -d ',' -f 1,1561,1562,1563,1564,1565,1566,1567

name,VSA_EState1,VSA_EState2,VSA_EState3,VSA_EState4,VSA_EState5,VSA_EState6,VSA_EState7
ADQ,0.0,0.0,0.0,0.0,0.0,0.0,0.0
HIV TAR compound 4,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Ribocil-A,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Tetracycline,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Imatinib,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Ibrutinib,0.0,0.0,0.0,0.0,7.188619484542558,0.0,0.0
Lovastatin,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Nevirapine,0.0,0.0,0.0,0.0,0.0,0.0,0.0

$ cat Mordred_ROBIN_RNA_Binder_3D.csv | cut -d ',' -f 1,1561,1562,1563,1564,1565,1566,1567 | head -n 5

name,VSA_EState1,VSA_EState2,VSA_EState3,VSA_EState4,VSA_EState5,VSA_EState6,VSA_EState7
0054-0090,0.0,0.0,0.0,0.0,0.0,0.0,0.0
0096-0280,0.0,0.0,0.0,0.0,0.0,0.0,0.0
0109-0002,0.0,0.0,0.0,0.0,0.0,0.0,0.0
0109-0045,0.0,0.0,0.0,0.0,0.0,0.0,0.0
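
The field numbers passed to cut above depend on the exact column order of each file, so a check that selects columns by name may be easier to reproduce. The following is only a minimal sketch using Python's standard csv module. The helper name zero_fraction and the two-row sample are illustrative assumptions, not part of ROBIN or mordred. The sample rows reuse values from the output above.

```python
import csv
import io


def zero_fraction(csv_file, prefix="VSA_EState"):
    """Return {column: fraction of rows equal to 0.0} for columns whose
    name starts with `prefix`. Hypothetical helper, not part of ROBIN."""
    reader = csv.DictReader(csv_file)
    columns = [c for c in (reader.fieldnames or []) if c.startswith(prefix)]
    zeros = {c: 0 for c in columns}
    rows = 0
    for row in reader:
        rows += 1
        for c in columns:
            try:
                if float(row[c]) == 0.0:
                    zeros[c] += 1
            except (TypeError, ValueError):
                # Missing or non-numeric cells are not counted as zero.
                pass
    return {c: (zeros[c] / rows if rows else 0.0) for c in columns}


# Small in-memory sample with two rows taken from the output above.
sample = io.StringIO(
    "name,VSA_EState4,VSA_EState5\n"
    "Imatinib,0.0,0.0\n"
    "Ibrutinib,0.0,7.188619484542558\n"
)
result = zero_fraction(sample)
print(result)
```

To run it on a real file, open the file with `open(path, newline="")` and pass the file object to zero_fraction. Comparing the result for a provided CSV against the result for a self-generated one should show which VSA_EState columns disagree.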

Jun 01 '23 07:06 jsko-arontier
