model-angelo

Include undetermined amino acids and handle gaps

Open · hnguyentt opened this issue 1 year ago · 0 comments

Hello, thanks for your excellent work!

I have some questions about which residues you include in and exclude from protein sequences. I found this code:

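```python
def remove_non_residue(sequence: str) -> str:
    # Keep only the 20 standard amino acids plus selenocysteine (U);
    # every other character is silently dropped.
    return "".join([s for s in sequence if s in "ARNDCQEGHILKMFPSTWYVU"])
```
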
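
To make the length problem concrete, here is the function applied to a made-up sequence (`seq` is a hypothetical example, not taken from any real structure). It contains an ambiguity code (B), a gap (-), an undetermined residue (X) and pyrrolysine (O), and all of them are silently dropped:

```python
def remove_non_residue(sequence: str) -> str:
    # Same filter as above: 20 standard amino acids plus selenocysteine (U)
    return "".join([s for s in sequence if s in "ARNDCQEGHILKMFPSTWYVU"])

# Hypothetical input: B (Asx), "-" (gap), X (unknown), O (pyrrolysine)
seq = "MKB-XOLV"
cleaned = remove_non_residue(seq)
print(cleaned, len(seq), len(cleaned))  # MKLV 8 4
```

Four of the eight positions disappear, so residue indices in the cleaned string no longer line up with the original sequence.
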
  1. You include 21 amino acids. Why did you include selenocysteine (U), a rare amino acid, but not pyrrolysine (O), which is also rare?
  2. Why didn't you handle the ambiguity codes B (aspartic acid or asparagine), J (leucine or isoleucine) and Z (glutamic acid or glutamine)? If these letters are removed, the resulting sequence is shorter than the real sequence.
  3. Why didn't you handle X, which denotes an undetermined residue?
  4. In addition, "-" denotes a gap of indeterminate length. Do you plan to make use of this information in the future?
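
One possible alternative, sketched here only for discussion, is to keep one position per residue and treat gaps separately. This is a hypothetical policy, not how model-angelo works: the function name `split_and_normalize` and the choice to map B, J, Z, X and O to the placeholder "X" are assumptions, and whether the downstream model could accept "X" is an open question. Because a "-" gap has unknown length, the sketch splits the sequence into segments at each gap instead of guessing how many residues are missing.

```python
from typing import List

KNOWN = set("ARNDCQEGHILKMFPSTWYVU")  # same alphabet as remove_non_residue
UNKNOWN = set("BJZXO")  # assumption: ambiguity codes, X and O become "X"


def split_and_normalize(sequence: str) -> List[str]:
    """Hypothetical alternative: keep residue count, split at gaps."""
    segments: List[str] = []
    current: List[str] = []
    for s in sequence.upper():
        if s in KNOWN:
            current.append(s)
        elif s in UNKNOWN:
            # Placeholder keeps one position per residue, so lengths match
            current.append("X")
        elif s == "-":
            # Gap of indeterminate length: close the current segment
            # rather than inventing a number of missing residues.
            if current:
                segments.append("".join(current))
            current = []
        # Any other character (whitespace, "*", digits) is ignored
    if current:
        segments.append("".join(current))
    return segments


print(split_and_normalize("MKB-XOLV"))  # ['MKX', 'XXLV']
```

With this policy the two segments together still account for all seven residues in the hypothetical input; only the gap marker itself is consumed.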


Thank you for your time!

hnguyentt, Feb 13 '24 17:02