
Any reason why Atom __init__ doesn't enforce coordinates as a NumPy array?

Open pippo1990 opened this issue 2 years ago • 5 comments

It took me a day to realize that:

atom = PDB.Atom.Atom('O', [x, y, z], 0, 100, " ", str(grid_cnt)[:1], grid_cnt, element="H")

resi1.add(atom)

works, but then

atom1 - atom2 fails, because [x, y, z] needs to be a NumPy array rather than a list. (With a structure parsed by PDB.PDBParser, parsedatom - myatom works even if my coords are a list.)

Any reason why an Atom instance accepts init values without checking their type?
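A minimal sketch of the behaviour described above, using NumPy only. The helper name `as_coord` is hypothetical and not part of Biopython. The sketch assumes that subtracting two atoms boils down to subtracting their coordinates, which matches the symptoms reported here.

```python
import numpy as np


def as_coord(xyz):
    """Return xyz as a float NumPy array of shape (3,), or raise ValueError.

    Hypothetical helper, not part of Biopython.
    """
    coord = np.asarray(xyz, dtype=float)
    if coord.shape != (3,):
        raise ValueError(f"expected 3 coordinates, got shape {coord.shape}")
    return coord


list_a = [1.0, 2.0, 2.0]
list_b = [0.0, 0.0, 0.0]

# Plain Python lists do not support element-wise subtraction.
try:
    list_a - list_b
    list_subtraction_works = True
except TypeError:
    list_subtraction_works = False

# If either operand is an array, NumPy converts the other one,
# which would explain why "parsedatom - myatom" works.
mixed = as_coord(list_a) - list_b

# With both operands as arrays, a Euclidean distance is straightforward.
diff = as_coord(list_a) - as_coord(list_b)
distance = float(np.sqrt(np.dot(diff, diff)))
```

When building atoms by hand, passing `as_coord([x, y, z])` (or simply `numpy.array([x, y, z], dtype=float)`) as the coordinate argument of `PDB.Atom.Atom` should avoid the failure.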

pippo1990 avatar Sep 11 '23 09:09 pippo1990

This may be a historic choice for speed on the assumption that very few people would create their own Atom directly.

It is documented in the Atom class docstring, although with fresh eyes "Numeric array" ought to say "NumPy array" to reflect the current name of that project.

We might also want to add type annotation here, building on the work in #4377.

peterjc avatar Sep 11 '23 13:09 peterjc

A quick search finds at least 17 uses of "Numeric array" in the PDB module which could be updated. Would you like to work on that?

peterjc avatar Sep 11 '23 13:09 peterjc

I am sorry Peter, I am not skilled enough with git and type annotations. Right now I am trying to reproduce KVFinder cavity search results (https://lbc-lnbio.github.io/KVFinder-web/ ) using PyMOL and Biopython. I created a grid made of atoms (.cif output from Biopython) to follow the KVFinder algorithm, but looping through the grid atoms to flag points relative to the protein.cif file under investigation takes ages, because I am using Biopython to find neighbours between the grid atoms and the actual protein. Would you be so kind as to point me towards some reading that explains how to load a protein PDB/CIF file into NumPy arrays and how to use them to calculate grid point distances? The KVFinder papers are not that specific and the code is mainly C, which I can't follow. I think they create different grids and do some calculations between them, but it is not really clear to me.

pippo1990 avatar Sep 11 '23 16:09 pippo1990

You're asking something rather different, but my guess would be to look at Bio.PDB.internal_coords if you just want a NumPy array of atom coordinates for a protein:

https://github.com/biopython/biopython/blob/biopython-181/Bio/PDB/internal_coords.py

I have not really used the PDB code myself in years - I'm not the best person to ask.
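On the question of computing grid point distances with NumPy, here is a rough sketch under some assumptions. It treats the grid and the protein as plain (N, 3) coordinate arrays and flags grid points within a cutoff of any atom using broadcasting. The function name `flag_grid_points`, the `chunk_size` parameter, and the toy data are hypothetical. This is not the KVFinder algorithm, only a vectorised replacement for a per-atom Python loop.

```python
import numpy as np


def flag_grid_points(grid, atoms, cutoff, chunk_size=4096):
    """Return a boolean mask, True where a grid point lies within cutoff of any atom.

    Hypothetical helper: grid and atoms are array-likes of shape (N, 3) and (M, 3).
    """
    grid = np.asarray(grid, dtype=float)
    atoms = np.asarray(atoms, dtype=float)
    mask = np.zeros(len(grid), dtype=bool)
    cutoff_sq = cutoff * cutoff
    # Work in chunks so the (chunk, M, 3) temporary stays small.
    for start in range(0, len(grid), chunk_size):
        chunk = grid[start:start + chunk_size]
        # Broadcasting: (chunk, 1, 3) - (1, M, 3) -> (chunk, M, 3)
        diff = chunk[:, None, :] - atoms[None, :, :]
        # Squared distances, shape (chunk, M); no square root needed.
        dist_sq = np.einsum("ijk,ijk->ij", diff, diff)
        mask[start:start + chunk_size] = (dist_sq <= cutoff_sq).any(axis=1)
    return mask


# Toy data: a 3 x 3 x 3 grid with 1.0 spacing and a single atom at the origin.
axis = np.arange(0.0, 3.0, 1.0)
grid = np.array(np.meshgrid(axis, axis, axis, indexing="ij")).reshape(3, -1).T
atoms = np.array([[0.0, 0.0, 0.0]])
mask = flag_grid_points(grid, atoms, cutoff=1.0)
```

For a parsed structure, the atom array can be built with `np.array([atom.coord for atom in structure.get_atoms()])`. For large grids, a spatial index such as `scipy.spatial.cKDTree` or `Bio.PDB.NeighborSearch` is probably a better fit than brute-force broadcasting.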

peterjc avatar Sep 11 '23 17:09 peterjc

Found a Python version: https://github.com/LBC-LNBio/pyKVFinder/blob/master/pyKVFinder/grid.py

pippo1990 avatar Sep 11 '23 18:09 pippo1990