gatk

several genes are reported in "PREDICTED_LOF" for a balanced translocation

Open Nehir291 opened this issue 1 year ago • 3 comments

bug report

I have a balanced translocation (CTX) case with a large number of genes (475) listed under PREDICTED_LOF. However, one CTX breakpoint falls in an intron of a gene, whereas the other breakpoint is intergenic. How is this CTX calculated as PREDICTED_LOF?

Thanks.

Nehir291, May 30 '24 23:05

The gene whose intron the CTX hits must be broken in the middle of its coding frame (exon continuity is disrupted), so PREDICTED_LOF is justified in that case. If both breakpoints were intergenic, a PREDICTED_LOF assessment would be questionable.

gokalpcelik, Jun 03 '24 19:06

Thanks for the response and for clarifying that. The breakpoints do not overlap any exon (one breakpoint is intergenic, the other is intronic). I still don't see how so many genes have been reported under PREDICTED_LOF.

Nehir291, Jun 03 '24 20:06

Hi @Nehir291, please refer to the SVAnnotate tool documentation for definitions of each annotation: https://gatk.broadinstitute.org/hc/en-us/articles/21905053774363-SVAnnotate. For translocations, PREDICTED_LOF is assigned if a breakpoint falls at any point in the transcript, so an intronic breakpoint would still be annotated as PREDICTED_LOF for any impacted genes. This is because only part of the gene exists on each chromosome after the translocation, which is likely to result in a truncated transcript subject to nonsense-mediated decay.

Perhaps in some cases this definition is overly permissive, such as if only the UTR or one shorter exon is removed by the translocation - those cases could be worth revisiting.

I hope this helps explain the behavior you are seeing. If this does not fully explain all the PREDICTED_LOF annotations you are observing, please share the CTX breakpoints, the annotations, the SVAnnotate version, and the GTF used so we can investigate further.

epiercehoffman, Jun 27 '24 15:06

Update: after some offline discussion, we found that this was a two-fold issue that has since been fixed.

  1. END was set to END2 in some older VCFs from GATK-SV, so it represented the breakpoint on the second chromosome rather than the first. This has been fixed: more recent GATK-SV VCFs were found to have the correct END values for CTX events.
  2. Older versions of SVAnnotate annotated the interval CHROM:POS-END for CTX, expecting END to be very close to POS. This produced incorrect intervals when END was set to END2, which could be very large, resulting in long lists of genes under PREDICTED_LOF. This has been fixed in #8693, so SVAnnotate now independently annotates breakpoints at CHROM:POS and CHROM:END for CTX.
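
To make the difference concrete, here is a minimal Python sketch of the two behaviors described above. It is not SVAnnotate's actual code: the `Gene` type, the function names, the gene names, and all coordinates are hypothetical, and the sketch assumes END is greater than POS in the bad record.

```python
# Illustrative sketch only; not taken from SVAnnotate.
# Gene names and coordinates below are made up for the example.
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


def genes_overlapping(genes: List[Gene], chrom: str, start: int, end: int) -> List[str]:
    """Return names of genes on `chrom` that overlap the closed interval [start, end]."""
    return [g.name for g in genes if g.chrom == chrom and g.start <= end and g.end >= start]


def annotate_ctx_as_interval(genes: List[Gene], chrom: str, pos: int, end: int) -> List[str]:
    """Older behavior as described in this thread: treat CHROM:POS-END as one interval."""
    return genes_overlapping(genes, chrom, pos, end)


def annotate_ctx_breakpoints(genes: List[Gene], chrom: str, pos: int, end: int) -> List[str]:
    """Newer behavior as described in this thread: annotate CHROM:POS and CHROM:END separately."""
    hits = genes_overlapping(genes, chrom, pos, pos) + genes_overlapping(genes, chrom, end, end)
    return sorted(set(hits))


genes = [
    Gene("GENE_A", "chr1", 1000, 5000),
    Gene("GENE_B", "chr1", 20000, 30000),
    Gene("GENE_C", "chr1", 80000, 90000),
]

# END mistakenly holds the coordinate from the second chromosome (END2),
# so the interval sweeps up every gene between POS and that coordinate.
old_hits = annotate_ctx_as_interval(genes, "chr1", 3000, 95000)

# With END corrected to sit at POS, only the gene containing the breakpoint is hit.
new_hits = annotate_ctx_breakpoints(genes, "chr1", 3000, 3000)
```

In this toy case the interval approach reports all three genes, while the per-breakpoint approach reports only `GENE_A`, which mirrors how a single intronic breakpoint could end up with hundreds of genes under PREDICTED_LOF.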

For other users encountering this issue in their VCFs produced by older versions of GATK-SV, I recommend rerunning CleanVcf and AnnotateVcf with the latest versions of GATK-SV. A more manual alternative that requires less re-running of workflows would be:

  1. Extract CTX SVs
  2. Set END to POS
  3. Strip out functional consequence annotations
  4. Re-annotate with the latest version of SVAnnotate.
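
Steps 1 to 3 could be sketched in plain Python roughly as follows. This is an untested-on-real-data illustration, not an official GATK-SV script. It assumes that CTX records carry `SVTYPE=CTX` in INFO, that END is stored as an INFO key, and that the functional consequence annotations are the INFO keys starting with `PREDICTED_`; the function names are hypothetical. Step 4 (running SVAnnotate) is not shown.

```python
# Illustrative sketch only; check the assumptions above against your own VCF.
from typing import Iterable, Iterator, Optional


def fix_ctx_record(line: str, annotation_prefix: str = "PREDICTED_") -> Optional[str]:
    """Return a corrected CTX record, or None if the record is not a CTX."""
    fields = line.rstrip("\n").split("\t")
    pos, info = fields[1], fields[7]
    entries = info.split(";")
    if "SVTYPE=CTX" not in entries:
        return None  # step 1: keep CTX records only

    kept = []
    saw_end = False
    for entry in entries:
        key = entry.split("=", 1)[0]
        if key == "END":
            kept.append(f"END={pos}")  # step 2: set END to POS
            saw_end = True
        elif key.startswith(annotation_prefix):
            continue  # step 3: drop existing functional consequence annotations
        else:
            kept.append(entry)
    if not saw_end:
        kept.append(f"END={pos}")

    fields[7] = ";".join(kept)
    return "\t".join(fields)


def fix_ctx_lines(lines: Iterable[str]) -> Iterator[str]:
    """Pass header lines through unchanged and emit corrected CTX records only."""
    for line in lines:
        if line.startswith("#"):
            yield line.rstrip("\n")
        else:
            fixed = fix_ctx_record(line)
            if fixed is not None:
                yield fixed
```

The output of `fix_ctx_lines` would then be written to a new VCF and passed to the latest SVAnnotate. Note that header lines describing the stripped annotations are left in place here; whether they need to be removed before re-annotation is something to verify.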

epiercehoffman, Oct 03 '24 19:10