Question about frameshift annotation in AnnotSV

Open BOBOL513 opened this issue 6 months ago • 3 comments

Hi, I'm a beginner in bioinformatics and I have some confusion about how AnnotSV annotates frameshift events. It seems that all insertions occurring in exon-CDS regions are annotated as frameshift = yes. However, some of these insertions have an SV length that is a multiple of 3, which theoretically should not cause a frameshift mutation.

For insertions in exon-UTR, intron-CDS, and intron-UTR regions, AnnotSV annotates them as frameshift = no. Since these regions are not involved in protein coding, frameshift mutations cannot occur there. In this case, annotating them as NA instead of no might be more appropriate.

Could you please clarify how AnnotSV defines and assigns the frameshift annotation, and whether these observations are expected or might be an issue?

Thank you very much for your help!

BOBOL513 avatar Sep 26 '25 03:09 BOBOL513

Thank you for your interest in AnnotSV and your bug report.

"Could you please clarify how AnnotSV defines and assigns the frameshift annotation, [...]"

Currently, AnnotSV measures the length of the overlapped CDS and indicates whether this length is a multiple of 3.

=> This check is indeed incomplete, and inappropriate in some cases.
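
To make the limitation concrete, here is a minimal Python sketch of a length-only check of the kind described above. It is not AnnotSV's actual code: the function names, the 0-based half-open coordinates, and the example intervals are all assumptions made for illustration.

```python
def overlapped_cds_length(sv_start, sv_end, cds_intervals):
    """Total number of CDS bases covered by the SV.

    Assumption: 0-based, half-open [start, end) coordinates and
    non-overlapping CDS intervals.
    """
    total = 0
    for cds_start, cds_end in cds_intervals:
        overlap = min(sv_end, cds_end) - max(sv_start, cds_start)
        if overlap > 0:
            total += overlap
    return total


def frameshift_by_overlap(sv_start, sv_end, cds_intervals):
    """Length-only check: 'yes' when the overlapped CDS length is not a multiple of 3."""
    length = overlapped_cds_length(sv_start, sv_end, cds_intervals)
    return "no" if length % 3 == 0 else "yes"


cds = [(100, 200), (300, 400)]  # hypothetical CDS segments of one transcript
print(frameshift_by_overlap(150, 156, cds))  # 6 CDS bases overlapped
print(frameshift_by_overlap(190, 310, cds))  # 10 + 10 = 20 CDS bases overlapped
print(frameshift_by_overlap(210, 290, cds))  # 0 CDS bases overlapped (intronic)
```

Because 0 is a multiple of 3, an SV that overlaps no CDS at all comes out as "no" under a check like this, which would be consistent with the "frameshift = no" reported for non-coding regions. The check also never looks at the inserted length, so it cannot distinguish a 3 bp insertion from a 4 bp one.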

I'll add a fix as soon as possible (and add "NA" as you recommended).

lgmgeo avatar Sep 26 '25 09:09 lgmgeo

New rules considered:

  • Deletions (DEL) / Duplications (DUP):

    • Within the CDS
      • If the number of nucleotides overlapped is a multiple of 3 → reading frame preserved (Frameshift = no)
      • If the number of nucleotides overlapped is not a multiple of 3 → reading frame disrupted (Frameshift = yes)
    • Outside the CDS
      • Frameshift = NA
  • Insertions (INS):

    • Within the CDS
      • If the insertion length is unknown → Frameshift = NA
      • If the insertion length is a multiple of 3 → reading frame preserved (Frameshift = no)
        Note: Regardless of length, the inserted sequence itself may introduce additional effects (e.g. a premature stop codon). Should I indicate "Frameshift = NA" instead?
      • If the insertion length is not a multiple of 3 → reading frame disrupted (Frameshift = yes).
    • Outside the CDS
      • Frameshift = NA
  • Translocations (TRA), Inversions (INV), and other SV types:

    • Frameshift status cannot be determined by length alone → Frameshift = NA
      Note: These are simplified rules; in practice, such SVs can still disrupt coding sequences (e.g. by splitting exons, changing orientation, or creating fusion transcripts).
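
The rules above can be sketched as a small decision function. This is only an illustrative Python sketch, not the AnnotSV implementation: the function name and its parameters (`in_cds`, `cds_overlap`, `ins_length`) are hypothetical, and returning "NA" for a DEL/DUP whose overlap is unknown is an added assumption. The multiple-of-3 insertion case follows the rule as written ("no"), leaving the open question about "NA" aside.

```python
from typing import Optional


def proposed_frameshift(sv_type: str,
                        in_cds: bool,
                        cds_overlap: Optional[int] = None,
                        ins_length: Optional[int] = None) -> str:
    """Return 'yes', 'no' or 'NA' following the rules listed above.

    All parameter names are hypothetical:
      in_cds      -- True if the SV falls within / overlaps the CDS
      cds_overlap -- number of CDS nucleotides overlapped (DEL/DUP)
      ins_length  -- length of the inserted sequence (INS), None if unknown
    """
    if not in_cds:
        return "NA"  # outside the CDS: frameshift is not applicable
    if sv_type in ("DEL", "DUP"):
        length = cds_overlap
    elif sv_type == "INS":
        length = ins_length
    else:
        return "NA"  # TRA, INV, others: length alone is not enough
    if length is None:
        return "NA"  # unknown length (assumed to apply to DEL/DUP as well)
    return "no" if length % 3 == 0 else "yes"


print(proposed_frameshift("DEL", True, cds_overlap=6))
print(proposed_frameshift("INS", True, ins_length=10))
print(proposed_frameshift("INS", True))
print(proposed_frameshift("INV", True, cds_overlap=9))
```

Keeping the "outside the CDS" test first means every SV type gets the same "NA" in non-coding regions, which is the behaviour requested in the original report.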

Could you please share your thoughts on all of this?

lgmgeo avatar Sep 26 '25 12:09 lgmgeo

Thank you for sharing the proposed rules for determining frameshift status. I find them clear and biologically consistent, especially for deletions and duplications where the frame depends on whether the affected length is a multiple of three.

For insertions, I agree that using “Frameshift = NA” is the most cautious approach when the inserted sequence is unknown or could introduce premature stop codons. Similarly, assigning “Frameshift = NA” to translocations, inversions, and other complex SVs is appropriate, as their effects cannot be inferred from length alone.

BOBOL513 avatar Oct 11 '25 01:10 BOBOL513