New biomol fields

Open d-beltran opened this issue 3 years ago • 3 comments

This is the third option for including biomolecular data in OPTIMADE, as discussed in issue 389.

It introduces two new main fields: biomol_chains and biomol_residues. These fields describe how atoms are grouped in "chains" and "residues", two classifiers widely used in the biomolecular field.

In addition, two more fields are suggested: biomol_sequences and biomol_sequence_types. These fields describe sequences of residues and are useful for queries.

The new fields are placed in the appendix, as @JPBergsma did in the previous options, PR395 and PR396.

d-beltran avatar Feb 23 '22 12:02 d-beltran

I just came across another issue. I am trying to implement the standard we described here to aid our discussion. To have some example data, I downloaded a random trajectory from the internet. This trajectory has a non-standard amino acid in it. How do you suggest we handle this case? It seems a one-letter code is not sufficient to describe all amino acids in a sequence.

JPBergsma avatar May 19 '22 17:05 JPBergsma

Usually, non-standard amino acids are tagged as 'X' in the one-letter code.
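
As a rough illustration of this convention, the sketch below converts three-letter residue names to a one-letter sequence and falls back to 'X' for anything outside the 20 canonical amino acids. The function name `to_one_letter_sequence` and the example residue list are hypothetical and are not part of the proposed fields; how an implementation actually builds biomol_sequences may differ.

```python
# Three-letter to one-letter codes for the 20 canonical amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def to_one_letter_sequence(residue_names):
    """Build a one-letter sequence; unknown residues become 'X'.

    Hypothetical helper for illustration only.
    """
    return "".join(THREE_TO_ONE.get(name.upper(), "X") for name in residue_names)


# "SEP" (phosphoserine) is not one of the 20 canonical residues,
# so it is written as 'X' in the resulting sequence.
sequence = to_one_letter_sequence(["MET", "ALA", "SEP", "GLY"])
print(sequence)  # MAXG
```

With this mapping every non-standard residue collapses to the same letter, so the sequence alone cannot tell which non-standard amino acid is present.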

d-beltran avatar May 20 '22 09:05 d-beltran

Yes, I can do that. It would make it harder to search for sequences with non-standard amino acids, but those are probably quite rare anyway.

JPBergsma avatar May 23 '22 10:05 JPBergsma