OPTIMADE
New biomol fields
This is the third option for including biomolecular data in OPTIMADE. It has been discussed in issue 389.
It introduces two new main fields: biomol_chains and biomol_residues. These fields describe how atoms are grouped into "chains" and "residues", two classifiers widely used in the biomolecular field.
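The grouping can be pictured with a minimal sketch. It assumes, purely for illustration, that every site carries a chain label and a residue index. The lists `site_chain` and `site_residue` and the helper `group_sites` are hypothetical names and do not reflect the actual layout of biomol_chains or biomol_residues in the proposal.

```python
from collections import defaultdict

# Hypothetical per-site annotations (illustrative only, not the proposed field layout).
site_chain = ["A", "A", "A", "A", "B", "B"]
site_residue = [0, 0, 1, 1, 0, 0]


def group_sites(site_chain, site_residue):
    """Group site indices by chain, then by residue index within each chain."""
    chains = defaultdict(lambda: defaultdict(list))
    for site, (chain, residue) in enumerate(zip(site_chain, site_residue)):
        chains[chain][residue].append(site)
    # Convert the nested defaultdicts to plain dicts for readable output.
    return {chain: dict(residues) for chain, residues in chains.items()}


print(group_sites(site_chain, site_residue))
# {'A': {0: [0, 1], 1: [2, 3]}, 'B': {0: [4, 5]}}
```

Each atom then belongs to exactly one residue, and each residue to exactly one chain, which is the hierarchy the two fields are meant to capture.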
In addition, two more fields are suggested: biomol_sequences and biomol_sequence_types. These fields describe sequences of residues, which makes them useful for queries.
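A rough idea of the kind of query these fields enable is a motif search restricted to one sequence type. The sketch below filters client-side for illustration only. The entry dictionaries, the "protein" and "nucleic" type labels, and the helper `find_motif` are all hypothetical. They are not values or functions defined by the proposal, and a real server would answer such a query through its own filter mechanism.

```python
# Hypothetical entries: sequence strings and type labels are illustrative assumptions.
entries = [
    {"id": "e1",
     "biomol_sequences": ["MKTAYIAK", "GSHM"],
     "biomol_sequence_types": ["protein", "protein"]},
    {"id": "e2",
     "biomol_sequences": ["ACGT"],
     "biomol_sequence_types": ["nucleic"]},
]


def find_motif(entries, motif, seq_type="protein"):
    """Return ids of entries having a sequence of seq_type that contains motif."""
    hits = []
    for entry in entries:
        # Pair each sequence with its type, relying on the two lists being aligned.
        pairs = zip(entry["biomol_sequences"], entry["biomol_sequence_types"])
        if any(t == seq_type and motif in s for s, t in pairs):
            hits.append(entry["id"])
    return hits


print(find_motif(entries, "TAY"))  # ['e1']
```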
The new fields are placed in the appendix, as @JPBergsma did in the previous options, PR395 and PR396.
I just came across another issue. I am trying to implement the standard we described here to aid our discussion. To have some example data, I downloaded a random trajectory from the internet. This trajectory contains a non-standard amino acid. How do you suggest we handle this case? It seems a one-letter code is not sufficient to describe all amino acids in a sequence.
Non-standard amino acids are usually tagged as 'X' in the one-letter code.
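The 'X' convention can be sketched as a conversion from three-letter residue names to a one-letter sequence. The table below holds the standard one-letter codes of the 20 common amino acids. The helper name `to_one_letter` is hypothetical, and MSE (selenomethionine) is used only as an example of a non-standard residue.

```python
# Standard one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def to_one_letter(residue_names):
    """Convert three-letter residue names to a one-letter sequence.

    Any residue missing from the standard table (e.g. a modified amino acid)
    is written as 'X', so its specific identity is not kept in the string.
    """
    return "".join(THREE_TO_ONE.get(name.upper(), "X") for name in residue_names)


print(to_one_letter(["MET", "LYS", "MSE", "ALA"]))  # MKXA
```

Because every non-standard residue collapses to the same 'X', a search on the sequence string alone cannot tell which non-standard residue is present.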
Yes, I can do that. It would make it harder to search for sequences with non-standard amino acids, but those are probably quite rare anyway.