DDIMDL
event_db
Hello, Deng Yifan. I'm very interested in your article and think it is very good work, so I am now trying to replicate it.
I would like to ask you two questions:
First: the article says that you extracted 74528 DDI pairs, but event.db contains only 37264 DDI pairs. Did your experiment use only these 37264 pairs?
Second: in the drug table of event.db, the SMILES features of the drugs are lists of numbers. Did you use RDKit to convert a SMILES string into an 881-dimensional fingerprint? I am a fourth-year undergraduate student. I have been searching the Internet for a long time, but I still don't know how to do the conversion. If it's convenient, could you share this code?
Looking forward to your reply, thank you very much!
Hi, Shenggeng! For the first question: this is because each drug-drug pair is recorded twice in the data, for example once as (sildenafil, Isosorbide mononitrate) and once as (Isosorbide mononitrate, sildenafil). They are in fact the same pair, so we deleted half of them. For the second question: try learning how to use the RDKit package. For example, for the drug Isosorbide mononitrate, we can collect its SMILES [H][C@]12OC[C@@H](O[N+]([O-])=O)[C@@]1([H])OC[C@@H]2O from DrugBank. Here is the code:
from rdkit import Chem
from rdkit.Chem import AllChem
# SMILES of Isosorbide mononitrate, taken from DrugBank
smile = '[H][C@]12OC[C@@H](O[N+]([O-])=O)[C@@]1([H])OC[C@@H]2O'
mol = Chem.MolFromSmiles(smile)
# Morgan fingerprint with radius 2, hashed into 881 bits
morgan_hashed = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=881)
morgan_hashed.ToBitString()
The result is a bit vector of length 881, returned here as a string of 0s and 1s.
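On the first question, the removal of mirrored pairs can be sketched as follows. This is only a minimal illustration, assuming the pairs are available as (drug_a, drug_b) tuples; the function name `dedupe_unordered_pairs` and the sample data are hypothetical and are not taken from the DDIMDL code.

```python
def dedupe_unordered_pairs(pairs):
    """Keep one record per unordered drug pair, preserving first-seen order."""
    seen = set()
    unique = []
    for drug_a, drug_b in pairs:
        # frozenset ignores order, so (a, b) and (b, a) share the same key
        key = frozenset((drug_a, drug_b))
        if key not in seen:
            seen.add(key)
            unique.append((drug_a, drug_b))
    return unique


# Hypothetical sample: the same interaction recorded in both directions
pairs = [
    ("sildenafil", "Isosorbide mononitrate"),
    ("Isosorbide mononitrate", "sildenafil"),
]
unique_pairs = dedupe_unordered_pairs(pairs)
print(len(pairs), "->", len(unique_pairs))
```

If every pair appears in both directions, this halves the count, which is consistent with 74528 / 2 = 37264.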
Hello, Yifan!
Thank you very much for your reply. I now understand the first question. Thank you very much!
But I still have a question about the second point.
For drug DB01296, the SMILES is 'N[C@H]1C(O)O[C@H](CO)[C@@H](O)[C@@H]1O'. With the code you provided, I did get an 881-dimensional vector. But in event.db, its SMILES features are 9|10|14|18|19|20|178|181|283|284|285|286|299|308|332|338|339|340|341|344|345|346|347|351|352|365|366|367|380|393|405|406|528|563|566|567|571|582|592|614|615|617|637|638|639|643|661|662|663|679|680|681|682|683|689|690|691|701|703. What do these numbers mean? Do they mean that these positions are 1 in the 881-dimensional vector? If so, there is a mismatch: for DB01296, the ninth bit of my vector is 0, but 9 appears in these numbers, and the 16th bit is 1, but 16 does not appear.
Yes, you are right: the numbers are the positions of the 1 bits. The mismatch arises because the fingerprint methods are different. The fingerprint in the current dataset was generated by a former student, who used RDKit in Java. The code I posted uses the Morgan fingerprint, which is the most common method. I have tested the result: there is little difference between the current dataset's fingerprint and the Morgan fingerprint.
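For anyone who wants to work with the stored features directly, here is a minimal sketch of expanding the pipe-delimited index list into an 881-bit vector. The function name `indices_to_bits` is hypothetical, and the thread does not say whether the stored indices are 0-based or 1-based, so that is left as a parameter and should be treated as an assumption.

```python
def indices_to_bits(feature_str, n_bits=881, one_based=False):
    """Expand a string such as '9|10|14' into a list of 0/1 values."""
    bits = [0] * n_bits
    for token in feature_str.split("|"):
        # Shift by one only if the stored indices are assumed to be 1-based
        idx = int(token) - (1 if one_based else 0)
        if not 0 <= idx < n_bits:
            raise ValueError(f"index {token} is outside a {n_bits}-bit vector")
        bits[idx] = 1
    return bits


# First few indices stored for DB01296 in event.db (truncated for brevity)
stored = indices_to_bits("9|10|14|18|19|20")
print(sum(stored), len(stored))
```

Because the stored fingerprint and the Morgan fingerprint come from different methods, a given bit position does not mean the same thing in both, so a position-by-position comparison between them is not expected to agree.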
OK, I see. Thank you very much for your reply!