DeepPurpose
DeepPurpose copied to clipboard
Query Regarding MPNN Drug Encoder
Hello,
Before I ask my query, I would like to congratulate everybody involved in building this great framework. I was trying to train an MPNN-CNN model on the latest BindingDB data and while exploring the existing MPNN Drug encoder implementation, I ran into a couple of doubts/queries. I was hoping to get some clarification from you guys.
From what I understand, the agraph
property associated with a drug is basically an adjacency list where the row_index
represents an atom and each column value is the in_bond
number/index. I deduced this from the following code snippet in the smiles2mpnnfeature
method.
for a in range(n_atoms):
for i,b in enumerate(in_bonds[a]):
agraph[a,i] = b
Then, during the MPNN model training, I can see that the agraphs
associated with all drugs in a given batch are combined into one combined agraph_1st
variable. In this consolidation step, you used two variables N_a
and N_b
to maintain the running sum of atoms and bonds from each drug. Also, to re-index this combined adjacency list, I can see that you are adding the N_a
value at each step (code snippet below).
for i in range(N_atoms_bond.shape[0]):
....
agraph_lst.append(agraph[i,:atom_num,:] + N_a)
However, since the values in these adjacency lists are bond numbers or indices, shouldn't we be adding the N_b
variable value instead? Could you please clarify this doubt of mine?
Thanks in advance!
agraph_lst
save the index of atom. N_a
is the number of total atoms before the current molecule. so we need to add the value. it is not related to the bond number.
Hello,
Thanks for your response. However, the agraph_1st
variable is populated using the agraph
value, which seems to contain the bond number right (based on the first code snippet)??
not bond number. but atom number. thanks