libmolgrid
Taking care of each region when creating a gninatype
Hi,
I'm converting a pdb file to a gninatype file. I have a process similar to the gninatype function in the link
def gninatype(file):
    # creates gninatype file for model input
    f=open(file.replace('.pdb','.types'),'w')
    f.write(file)
    f.close()
    atom_map=molgrid.FileMappedGninaTyper(f'{pathlib.Path(os.path.realpath(__file__)).resolve().parent}/gninamap')
    dataloader=molgrid.ExampleProvider(atom_map,shuffle=False,default_batch_size=1)
    train_types=file.replace('.pdb','.types')
    dataloader.populate(train_types)
    example=dataloader.next()
    coords=example.coord_sets[0].coords.tonumpy()
    types=example.coord_sets[0].type_index.tonumpy()
    types=np.int_(types)
    fout=open(file.replace('.pdb','.gninatypes'),'wb')
    for i in range(coords.shape[0]):
        fout.write(struct.pack('fffi',coords[i][0],coords[i][1],coords[i][2],types[i]))
    fout.close()
    os.remove(train_types)
    return file.replace('.pdb','.gninatypes')
Are the features in gninamap (28 different features) applied to each x, y, z coordinate row (that is, for each pocket)?
To ask my question more clearly: for example, I have a 1a4h.pdb file and I generate 1a4h.gninatypes with the gninatype function above.
I have a data file like the one below:
18.5426 -3.5417 -4.3501 1a4h.gninatypes
16.4473 -2.0545 -9.2645 1a4h.gninatypes
11.5426 -5.5317 -7.3222 1a4h.gninatypes
17.5426 -6.5419 -1.6552 1a4h.gninatypes
...
The characteristics of each region are important to me. Are the individual features of each region (each row in the dataset) retained in a single gninatypes file? Or do I need to set up a structure like the one below?
18.5426 -3.5417 -4.3501 1a4h_pocket1.gninatypes
16.4473 -2.0545 -9.2645 1a4h_pocket2.gninatypes
11.5426 -5.5317 -7.3222 1a4h_pocket3.gninatypes
17.5426 -6.5419 -1.6552 1a4h_pocket4.gninatypes
If you advise me to set up the second dataset structure, how do I do it?
It's up to you what data you put in the gninatypes file. Typically we store the entire structure. If ExampleProvider is populated with a list of PDBs, it will provide all the coordinates that are in each PDB (after all, at no point have you defined the binding site for it to prune around).
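If you do want one file per pocket, one way is to prune the atoms yourself before writing. The following is a minimal sketch, not part of libmolgrid: the helper names (`write_gninatypes`, `read_gninatypes`, `write_pocket_gninatypes`), the `_pocketN` file naming, and the 16 Å default radius are all hypothetical choices for illustration. It assumes you already have the `coords` and `types` arrays from the function in the question, and it reuses the same `'fffi'` record layout. Each record holds x, y, z plus a single type index for one atom.

```python
import struct

import numpy as np

# One record per atom: x, y, z as 32-bit floats, then the type index as a 32-bit int.
# This mirrors the struct.pack('fffi', ...) call in the question.
RECORD = struct.Struct('fffi')


def write_gninatypes(path, coords, types):
    """Write atoms to `path` using the 'fffi' record layout."""
    with open(path, 'wb') as fout:
        for (x, y, z), t in zip(coords, types):
            fout.write(RECORD.pack(float(x), float(y), float(z), int(t)))


def read_gninatypes(path):
    """Read a file written by write_gninatypes back into (coords, types) arrays."""
    with open(path, 'rb') as fin:
        records = list(RECORD.iter_unpack(fin.read()))
    coords = np.array([r[:3] for r in records], dtype=np.float32).reshape(-1, 3)
    types = np.array([r[3] for r in records], dtype=np.int32)
    return coords, types


def write_pocket_gninatypes(prefix, coords, types, centers, radius=16.0):
    """Write one file per pocket center, keeping only atoms within `radius`.

    `radius` is an arbitrary illustrative default, not a libmolgrid setting.
    """
    coords = np.asarray(coords, dtype=np.float32)
    types = np.asarray(types)
    paths = []
    for k, center in enumerate(centers, start=1):
        # Euclidean distance from every atom to this pocket center.
        dist = np.linalg.norm(coords - np.asarray(center, dtype=np.float32), axis=1)
        keep = dist <= radius
        path = f'{prefix}_pocket{k}.gninatypes'
        write_gninatypes(path, coords[keep], types[keep])
        paths.append(path)
    return paths
```

Calling `write_pocket_gninatypes('1a4h', coords, types, centers)` with the pocket centers from your data file would produce `1a4h_pocket1.gninatypes`, `1a4h_pocket2.gninatypes`, and so on, which you can then reference row by row. If you keep a single whole-structure file instead, every row simply points at the same set of atoms.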