model-angelo
Include undetermined amino acids and handle gaps
Hello, thanks for your excellent work!
I have some questions about the residues you excluded and included in the protein sequence. I found this code:
def remove_non_residue(sequence: str) -> str:
    return "".join([s for s in sequence if s in "ARNDCQEGHILKMFPSTWYVU"])
- You included 21 amino acids. Why did you include selenocysteine (U), a rare amino acid, but not pyrrolysine (O), which is also a rare amino acid?
- Why didn't you take into account the letters `B` (aspartic acid | asparagine), `J` (leucine | isoleucine), and `Z` (glutamic acid | glutamine)? If we remove these letters, the filtered sequence length no longer corresponds to the real sequence length.
- Why didn't you take into account the letter `X` for an undetermined residue?
- Besides, we have `-` to denote a gap of indeterminate length. Do you have any plan to process this piece of information in the future?
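
To make the length concern concrete, here is a minimal sketch of what I mean. The 21-letter alphabet is copied from `remove_non_residue`; the `alphabet` parameter, the `EXTENDED` alphabet, the `mask_non_residue` helper, and the example sequence are all hypothetical and only for illustration. They are not part of model-angelo.

```python
STANDARD = "ARNDCQEGHILKMFPSTWYVU"  # alphabet used by remove_non_residue
# Hypothetical extension: pyrrolysine (O), ambiguity codes (B, J, Z), unknown (X)
EXTENDED = STANDARD + "OBJZX"


def remove_non_residue(sequence: str, alphabet: str = STANDARD) -> str:
    """Drop every character that is not in `alphabet` (current behaviour by default)."""
    return "".join(s for s in sequence if s in alphabet)


def mask_non_residue(sequence: str, alphabet: str = STANDARD, unknown: str = "X") -> str:
    """Hypothetical alternative: drop gaps, but replace unsupported letters with `unknown`
    so that residue positions are preserved."""
    return "".join(s if s in alphabet else unknown for s in sequence if s != "-")


example = "MKBZX-AJO"  # made-up sequence: 8 residue letters plus one gap

filtered = remove_non_residue(example)           # only M, K and A survive
kept = remove_non_residue(example, EXTENDED)     # everything except the gap survives
masked = mask_non_residue(example)               # unsupported letters become X

print(len(example), len(filtered), len(kept), len(masked))
```

With the current alphabet, five of the eight residue letters are silently dropped, so any residue index computed on the filtered string is shifted relative to the input. Masking instead of dropping is just one possible option; I am not assuming the model can actually handle `X` as an input token.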
Thank you for your time!