graphein

Keep all hetatms for all-atom graphs

Open EvanKomp opened this issue 1 year ago • 3 comments

Is your feature request related to a problem? Please describe.
Sort of. My use case involves loading structures at all-atom resolution, including non-protein atoms, and eventually passing them to PyG. I effectively cannot use Graphein as it stands because I do not know the names of all hetatms in my dataset ahead of time.

Describe the solution you'd like
In addition to a list of strings, accept the boolean True. Add logic to check that the granularity is compatible, if necessary.

Describe alternatives you've considered
Parsing my whole dataset and passing in an absolutely massive list of hetatms.

Thanks for your work.

EvanKomp • Nov 04 '24 22:11
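
For reference, the alternative described in the issue (scanning the dataset up front to collect every hetatm name) might look roughly like the sketch below. It assumes plain PDB-format text, where the residue name sits in columns 18-20 of each HETATM record. The helper name `collect_hetatm_names` and the example records are hypothetical and are not part of Graphein.

```python
from typing import Iterable, Set


def collect_hetatm_names(pdb_lines: Iterable[str]) -> Set[str]:
    """Return the set of residue names that appear on HETATM records.

    Hypothetical helper for illustration only; not a Graphein function.
    """
    names: Set[str] = set()
    for line in pdb_lines:
        # PDB is fixed-width: record name in columns 1-6,
        # residue name in columns 18-20 (0-based slice 17:20).
        if line.startswith("HETATM"):
            name = line[17:20].strip()
            if name:
                names.add(name)
    return names


# Made-up records, just to exercise the helper.
example = [
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N",
    "HETATM 1001 FE   HEM A 200      15.000  10.000   5.000  1.00 20.00          FE",
    "HETATM 1002  O   HOH A 301      12.000   8.000   3.000  1.00 30.00           O",
    "HETATM 1003  O   HOH A 302      11.000   7.000   2.000  1.00 30.00           O",
]

# Sorted list of unique names, in the list-of-strings form the issue describes.
keep_hets = sorted(collect_hetatm_names(example))
print(keep_hets)
```

Running this over every file in a dataset and merging the sets would give the "absolutely massive list" the issue mentions, which is exactly the step the feature request hopes to avoid.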

Hi @EvanKomp, have you seen this API? https://colab.research.google.com/github/a-r-j/graphein/blob/master/notebooks/protein_tensors.ipynb#scrollTo=mpbEZJ4WmyQZ

It's a bit more suited for your use case. In particular, see protein_to_pyg and the keep_hets arg.

a-r-j • Nov 05 '24 13:11

@a-r-j It looks like that function also requires keep_hets as a list. Is there no way to keep all hetatms without having to specify the names?

EvanKomp • Nov 06 '24 17:11

Apologies, my bad. I think the store_het arg will keep everything, while keep_hets is for specifying a subset.

a-r-j • Nov 06 '24 17:11