fusioncatcher
Incorrect fusion point coordinate with BOWTIE+SPOTLIGHT
I am currently working with a series of human samples in which the fusion between BCL2 (chr18, ~63 Mbp) and IGH@ (chr14, ~105 Mbp) is recurrent and relevant. However, every fusion reported by BOWTIE+SPOTLIGHT comes with invalid coordinates for the IGH@ breakpoint, as if the positions of the two genes had been mixed up (typically 14:63187194:-, which corresponds to an intergenic region very far from IGH@). Fusions identified by other tools all come with consistent coordinates. The reported fusion sequence seems legitimate as well, since a manual BLAT maps it back to IGH@; the problem appears to be only with the numeric coordinates reported (which, unfortunately, I rely on for downstream analysis).
An extract of the final-list_candidate-fusion-genes.txt file can be found here: IGH_BCL2_600023.txt
I am using FusionCatcher version 1.00, with the provided human annotation (human_v90).
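For anyone who needs to screen the reported coordinates in the meantime, here is a minimal Python sketch of the sanity check described above. It is not part of FusionCatcher. The `EXPECTED_LOCI` windows and the function names are hypothetical: the windows are rough guesses built from the approximate positions quoted in this report, and should be replaced with real annotation intervals.

```python
# Sketch: flag breakpoints that fall far from the expected gene locus.
# EXPECTED_LOCI holds rough, hypothetical windows derived from the approximate
# positions quoted in this report (BCL2 near chr18:63 Mbp, IGH@ near
# chr14:105 Mbp). Replace them with real annotation intervals before use.
EXPECTED_LOCI = {
    "BCL2": ("18", 62_000_000, 64_000_000),
    "IGH@": ("14", 104_000_000, 107_000_000),
}


def parse_breakpoint(text):
    """Split 'chrom:position:strand' into (chrom, int position, strand)."""
    chrom, pos, strand = text.split(":")
    return chrom, int(pos), strand


def is_plausible(gene, breakpoint):
    """Return True if the breakpoint lies inside the expected window for gene."""
    chrom, pos, _strand = parse_breakpoint(breakpoint)
    exp_chrom, start, end = EXPECTED_LOCI[gene]
    return chrom == exp_chrom and start <= pos <= end


if __name__ == "__main__":
    print(is_plausible("BCL2", "18:63126312:-"))   # True
    print(is_plausible("IGH@", "14:63187194:-"))   # False
    print(is_plausible("IGH@", "14:105863239:-"))  # True
```

With these assumed windows, the BOWTIE+SPOTLIGHT coordinate 14:63187194:- is rejected for IGH@, while 14:105863239:- is accepted.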
Thanks for reporting the bug!
Indeed the IGH@-BCL2 fusions are very interesting.
Looking at the attached file, which looks something like this:
BCL2-IGH@ BOWTIE+SPOTLIGHT 18:63126312:- 14:63187169: GCCCTCCTGCCCTCCTTCCGCTCCAG*GGGGAAACTGGACGTCTGGG
BCL2-IGH@ BOWTIE+STAR 18:63126317:- 14:105863239:-TGGCCTGTTTCAACACAGACCCACCCAGAGCCCTCCTGCCCTCCTTCCGC*NNNNNNNNNNNNNNNNNNNNNTGGACGTCTGGGGCAAAGGGACCACGGTCACCGTCTCCTCAGGTAAGAAT
there are two fusion junctions reported. Is the one reported by BOWTIE+STAR correct?
So I would say that if the same fusion is reported by BOWTIE+STAR (or BOWTIE+BLAT, or BOWTIE) and by BOWTIE+SPOTLIGHT, then I would trust BOWTIE+STAR more, because it was able to recover the whole fusion junction sequence (BOWTIE+SPOTLIGHT is never able to fully recover the fusion junction sequence, and this can be seen from the Ns inserted at the fusion point: ...CCTTCCGC*NNNNNNNNNNNNNNNNNNNNNTGGACG...).
Hi Daniel, thanks for your fast answer.
The BOWTIE+STAR fusion seems correct, however we don't have validation data yet.
Indeed, the two breakpoints seem very similar (BLAT alignment of the two fusion sequences reported by FusionCatcher below), at least for the sample I checked. The Ns are reported by STAR, though, so it seems SPOTLIGHT was the more accurate one here.
I think we will go on with the STAR results, but if you ever find the time to fix the problem, it can only make FusionCatcher better! Very nice tool anyway; I come back to it every time the question of fusion proteins arises.
Hi,
Glad to hear that BOWTIE+STAR might be correct. I will try to fix this. You are also right that the Ns appear when BOWTIE+STAR is used, and that BOWTIE+SPOTLIGHT is much more accurate. I got confused.
A possible solution would be to label the fusions with some kind of flag indicating which one can be trusted more; in this case, for example, BOWTIE+SPOTLIGHT should be trusted more than BOWTIE+STAR_with_Ns for the same fusion gene. I need to think about this.
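As an illustration of the flagging idea, a user-side check could look roughly like the Python sketch below. It is not FusionCatcher code. The function names, the flag strings and the `min_run` threshold are all hypothetical, and the length of the N run in the example is illustrative.

```python
import re


def junction_has_gap(fusion_sequence, min_run=5):
    """Return True if the sequence contains a run of at least min_run Ns."""
    pattern = "N{%d,}" % min_run
    return re.search(pattern, fusion_sequence.upper()) is not None


def trust_flag(fusion_sequence):
    """Return a hypothetical flag describing how complete the junction is."""
    if junction_has_gap(fusion_sequence):
        return "junction_with_Ns"
    return "junction_complete"


if __name__ == "__main__":
    # Junction sequence reported by BOWTIE+SPOTLIGHT in this thread.
    spotlight = "GCCCTCCTGCCCTCCTTCCGCTCCAG*GGGGAAACTGGACGTCTGGG"
    # Excerpt of a BOWTIE+STAR junction; the N-run length is illustrative.
    star = "CCTTCCGC*" + "N" * 20 + "TGGACG"
    print(trust_flag(spotlight))  # junction_complete
    print(trust_flag(star))       # junction_with_Ns
```

When the same gene pair is reported by several methods, a flag like this would allow rows with a complete junction to be preferred over rows padded with Ns.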
@ndaniel I also got confused:
$ grep IGH final-list_candidate-fusion-genes.txt |grep NSD2
IGH@ NSD2 known,oncogene,chimerdb2,cancer,tumor,m4,mitelman 0 7 4 112 BOWTIE+SPOTLIGHT 14:106543230:+4:106530738:+ ENSG09000000017 ENSG00000109685 TTTCTGTGTCTGACTCATTTCACTCAACATAGTGGTGTCCCATTCCATCCATGTTGCTTTAA*AGTTGCTTGATTTTAGTGGCTCAGAACTGCAATTTAATTCTTGTTCTTTGCACCTCTCTCTCCACCCCTTCTTTAACTTTTTGTTAGGGATTCACTCATGTTTCGTGTGTAA ---/intergenic
IGH@ NSD2 known,oncogene,chimerdb2,cancer,tumor,m4,mitelman 0 7 3 36 BOWTIE+BLAT;BOWTIE+STAR 14:106543230:+4:1953237:+ ENSG09000000017 ENSG00000109685 ACTCATTTCACTCAACATAGTGGTGTCCCATTCCATCCATGTTGCTTTAA*AGTTGCTTGATTTTAGTGGCTCAGAACTGCAATTTAATTCTTGTTCTTTG ---/CDS(truncated)
The sequence at the 3' end of the first result (4:106530738:+) is not consistent with the UCSC genome FASTA. Which result should I choose for the IGH@-NSD2 fusion event?
$ bedtools getfasta -s -fi ucsc.hg38.fasta -bed NSD2.bed -fo test.fa
>chr4:106530738-106531748(+)
AAGCTCACATAGAGGCATATATTTTCTTAAAGAAGTCATGTTAGAATCATTTTGCTAAAATATCACATCTGTCTTACAGAAATTACAACAGAAAAGCAACAAGATTACAAAATCATTTCAAAATAAATATTGTACCAACCTAATGGAAAAACATCTGTGCTGTTTTAGCTTATTACAGCAGTTTGTCATGACAATTTATAGATAAGTATGCAATTTTTCTTAGATTTATTATTGGAAACTTCTACTTATAGGTCTGTGGTTTAAGACAGGAAATGTGTTTTGATTGATTATCCTGAAAAGAAATTGGTAGGCCAAATGTTGAAACCATGAAAACATTAGGAGATGAAGTAACAGTTGAAAAAATGAACTGCACTTTGGAAGAACTCACAGGAGTAGTGATGATTGTACCACTAAATCTTGTGCTTGTTTACCAAAGTTGAGAGAGAAAAATTTCAGTTGAACTAGTATGTAGAGTGATACTCTGGTTTCTCTTGAGATGGTATAAATCTTTTGCCTTTTACTAGTATGTAGAGTGATATGAGAATTCCCTCTAATAGGCAAGTGAATGGAGAGTAGATAATATAACAACTGAGAAAATATAATCCTTAACCCTTCAAAAGAAAATATTTCTTCAAGAAGTGGGTACAAGCAAAGCAATTCAATAAATTAGAAATACATTATGATTATAAGGACAGTGCATTTAGAGTTCCCAATGACCTGATTTAACTATTAAAAATATACACAGGCAAACCATACACTAAATCATAATAGCCGATTTTAGAATTCCTCTCAGGAAGAGCACTGTCTTAGCTTGAGTGGCTGTAGATAGTAAGATCCTTAGGATATGGGTTGGTAGGCACAAAGAATAAGTACATCTTCATATAATAACAAACAAGCATGTAACAAACAGGAATTTGGTAATTGAGAGGACAGCAGAAATGAGAGAAAAGAGAAAGTAAGAGAAAGTTGAATTACATAGTGAAATTCTGGAGCCCAAATTGGGGAAAAGA
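The comparison above can also be done programmatically. The Python sketch below checks whether the part of the junction sequence after the `*` occurs at the start of the extracted reference sequence. The function name and the `probe` and `slack` parameters are hypothetical. It assumes that both sequences are in the same orientation and that the reference was extracted starting at the reported breakpoint. The `slack` parameter tolerates a small offset, for example the one between 0-based BED starts and 1-based coordinates.

```python
def three_prime_matches(fusion_sequence, reference, probe=20, slack=2):
    """Check that the 3' part of the junction starts the reference sequence.

    fusion_sequence: junction sequence with '*' marking the fusion point.
    reference: genomic sequence extracted starting at the reported breakpoint.
    probe: number of bases after '*' to look for.
    slack: extra leading reference bases allowed before the match.
    """
    _five_prime, three_prime = fusion_sequence.upper().split("*")
    if len(three_prime) < probe:
        raise ValueError("3' part is shorter than the probe length")
    window = reference.upper()[: probe + slack]
    return three_prime[:probe] in window


if __name__ == "__main__":
    # Truncated excerpts of the sequences quoted in this thread.
    junction = "CATGTTGCTTTAA*AGTTGCTTGATTTTAGTGGCTCAGAACTGC"
    ref_at_reported_position = "AAGCTCACATAGAGGCATATATTTTCTTAAAG"
    print(three_prime_matches(junction, ref_at_reported_position))  # False
```

On these excerpts the check returns False, which agrees with the observation that the sequence at 4:106530738 does not match the reported junction.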