libmolgrid

Taking care of each region when creating a gninatype

Open drorhunvural opened this issue 1 year ago • 1 comments

Hi,

I'm converting a PDB file to a gninatypes file. My process is similar to the gninatype function in the link:

import os
import pathlib
import struct

import molgrid
import numpy as np

def gninatype(file):
    # Creates a .gninatypes file for model input from a .pdb file.
    # Temporary .types file that lists the PDB path for ExampleProvider.
    train_types = file.replace('.pdb', '.types')
    with open(train_types, 'w') as f:
        f.write(file)
    # Atom typer built from the gninamap file located next to this script.
    atom_map = molgrid.FileMappedGninaTyper(f'{pathlib.Path(os.path.realpath(__file__)).resolve().parent}/gninamap')
    dataloader = molgrid.ExampleProvider(atom_map, shuffle=False, default_batch_size=1)
    dataloader.populate(train_types)
    example = dataloader.next()
    coords = example.coord_sets[0].coords.tonumpy()
    types = np.int_(example.coord_sets[0].type_index.tonumpy())
    # One record per atom: x, y, z as floats, then the integer type index.
    out_path = file.replace('.pdb', '.gninatypes')
    with open(out_path, 'wb') as fout:
        for i in range(coords.shape[0]):
            fout.write(struct.pack('fffi', coords[i][0], coords[i][1], coords[i][2], types[i]))
    os.remove(train_types)
    return out_path

Are the features in gninamap (28 different features) applied to each x, y, z coordinate row (for each pocket)?

To ask my question more clearly: for example, I have a 1a4h.pdb file and I am generating 1a4h.gninatypes with the gninatype function above.

I have a data file like the one below:

18.5426 -3.5417 -4.3501 1a4h.gninatypes
16.4473 -2.0545 -9.2645 1a4h.gninatypes
11.5426 -5.5317 -7.3222 1a4h.gninatypes
17.5426 -6.5419 -1.6552 1a4h.gninatypes
...

The characteristics of each region are important to me. Are the individual features of every region (each row in the dataset) retained in a single gninatypes file? Or do I need to set up a structure like the one below?

18.5426 -3.5417 -4.3501 1a4h_pocket1.gninatypes
16.4473 -2.0545 -9.2645 1a4h_pocket2.gninatypes
11.5426 -5.5317 -7.3222 1a4h_pocket3.gninatypes
17.5426 -6.5419 -1.6552 1a4h_pocket4.gninatypes

If you advise me to set up the second dataset structure, how do I do it?

drorhunvural, Mar 17 '23 17:03

It's up to you what data you put in the gninatype file. Typically we store the entire structure. If ExampleProvider is being populated with a list of PDBs, it will provide all the coordinates that are in the PDB (after all, at no point have you defined the binding site for it to prune around).

dkoes, Mar 17 '23 18:03
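
For anyone who does want per-pocket files, here is a minimal sketch of one way to prune a whole-structure file around a pocket center. It assumes the file uses the `fffi` record layout written by the function in the question (x, y, z floats followed by an integer type index). The helper names `read_gninatypes`, `atoms_near`, and `write_gninatypes` are hypothetical and not part of molgrid, and the cutoff radius is an arbitrary choice that would need tuning.

```python
import math
import struct

# One record per atom: x, y, z as floats, then the integer type index.
# Assumption: same native 'fffi' layout as the writer in the question.
RECORD = struct.Struct('fffi')

def read_gninatypes(path):
    """Return a list of (x, y, z, type_index) tuples from a .gninatypes file."""
    with open(path, 'rb') as fh:
        data = fh.read()
    return list(RECORD.iter_unpack(data))

def atoms_near(atoms, center, radius):
    """Keep only atoms whose coordinates lie within `radius` of `center`."""
    return [a for a in atoms if math.dist(a[:3], center) <= radius]

def write_gninatypes(path, atoms):
    """Write (x, y, z, type_index) tuples back out in the same layout."""
    with open(path, 'wb') as fh:
        for x, y, z, t in atoms:
            fh.write(RECORD.pack(x, y, z, t))
```

Usage would look something like `write_gninatypes('1a4h_pocket1.gninatypes', atoms_near(read_gninatypes('1a4h.gninatypes'), (18.5426, -3.5417, -4.3501), 10.0))`, repeated for each center in the data file. The type index of each kept atom is unchanged, so the per-atom typing from gninamap is preserved; only the set of atoms is reduced.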