
Incorrect fusion point coordinate with BOWTIE+SPOTLIGHT

Open maressyl opened this issue 5 years ago • 4 comments

I am currently working with a series of human samples in which the fusion between BCL2 (chr18 ~63 Mbp) and IGH@ (chr14 ~105 Mbp) is recurrent and relevant. However, every fusion reported by BOWTIE+SPOTLIGHT comes with invalid coordinates for the breakpoint in IGH@, as if the coordinates of the two genes had been mixed up (typically 14:63187194:-, which corresponds to an intergenic region very far from IGH@). Fusions identified by the other tools all come with consistent coordinates. The fusion sequence provided seems legitimate as well, since a manual BLAT maps it back to IGH@; the problem seems to affect only the numeric coordinates reported (which, unfortunately, I rely on for downstream analysis).
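For anyone wanting to screen their own output for this symptom, here is a minimal Python sketch. It assumes breakpoints are written as `chrom:position:strand` (as in the report) and flags pairs that sit on different chromosomes but at nearly the same position. The function names and the 1 Mbp window are illustrative assumptions, not part of FusionCatcher.

```python
def parse_breakpoint(field):
    """Split a 'chrom:position:strand' string; the strand part may be empty."""
    chrom, pos, strand = field.split(":")
    return chrom, int(pos), strand


def looks_mixed_up(bp1, bp2, window=1_000_000):
    """Heuristic (assumption): different chromosomes, nearly identical positions.

    The 1 Mbp default window is arbitrary and only meant for screening.
    """
    chrom1, pos1, _ = parse_breakpoint(bp1)
    chrom2, pos2, _ = parse_breakpoint(bp2)
    return chrom1 != chrom2 and abs(pos1 - pos2) <= window


# Breakpoint pairs quoted later in this thread
print(looks_mixed_up("18:63126312:-", "14:63187169:"))    # BOWTIE+SPOTLIGHT row
print(looks_mixed_up("18:63126317:-", "14:105863239:-"))  # BOWTIE+STAR row
```

A hit from this check is only a hint that the coordinates deserve a manual BLAT, since two unrelated loci can legitimately share similar positions on different chromosomes.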

An extract of the final-list_candidate-fusion-genes.txt file can be found here: IGH_BCL2_600023.txt

I am using FusionCatcher version 1.00, with the provided human annotation (human_v90).

maressyl avatar Apr 04 '19 13:04 maressyl

Thanks for reporting the bug!

Indeed the IGH@-BCL2 fusions are very interesting.

Looking at the attached file, which looks something like this:

BCL2-IGH@   BOWTIE+SPOTLIGHT   18:63126312:- 14:63187169: GCCCTCCTGCCCTCCTTCCGCTCCAG*GGGGAAACTGGACGTCTGGG
BCL2-IGH@   BOWTIE+STAR   18:63126317:-   14:105863239:-   TGGCCTGTTTCAACACAGACCCACCCAGAGCCCTCCTGCCCTCCTTCCGC*NNNNNNNNNNNNNNNNNNNNNTGGACGTCTGGGGCAAAGGGACCACGGTCACCGTCTCCTCAGGTAAGAAT

there are two fusion junctions reported. Is the one reported by BOWTIE+STAR correct?

So I would say that if the same fusion is reported by both BOWTIE+STAR (or BOWTIE+BLAT, or BOWTIE) and BOWTIE+SPOTLIGHT, then I would trust BOWTIE+STAR more, because it was able to recover the whole fusion junction sequence (BOWTIE+SPOTLIGHT is never able to fully recover the fusion junction sequence, and this can be seen from the Ns inserted at the fusion point: ...CCTTCCGC*NNNNNNNNNNNNNNNNNNNNNTGGACG...).

ndaniel avatar Apr 04 '19 14:04 ndaniel

Hi Daniel, thanks for your fast answer.

The BOWTIE+STAR fusion seems correct, however we don't have validation data yet.

It seems indeed that the two breakpoints are very similar (BLAT alignment of the two fusion sequences reported by FusionCatcher below), at least for the sample I checked. The Ns are reported by STAR though, so it seems SPOTLIGHT was the more accurate here.

[Image: IGH]

I think we will go on with the STAR results, but if you ever find the time to fix the problem, it can only make FusionCatcher better! Very nice tool anyway; I come back to it every time the question of a fusion protein arises.

maressyl avatar Apr 05 '19 10:04 maressyl

Hi,

Glad to hear that BOWTIE+STAR might be correct. I will try to fix this. You are also right that the Ns appear when BOWTIE+STAR is used, and that BOWTIE+SPOTLIGHT is much more accurate here. I got confused.

Probably the fusions should be labeled with some kind of flag indicating which one can be trusted more; for example, in this case BOWTIE+SPOTLIGHT should be trusted (more) and not BOWTIE+STAR_with_Ns for the same fusion gene. I need to think about this.
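Until such a flag exists, a rough stand-in can be computed from the reported junction sequence itself. The sketch below assumes the junction is marked with `*` (as in the rows above) and reports whether a run of Ns touches it. The function name and the `min_run` threshold are hypothetical and not something FusionCatcher provides.

```python
import re


def junction_has_n_run(fusion_seq, min_run=5):
    """Return True if a run of at least min_run Ns is adjacent to the '*' marker."""
    left, sep, right = fusion_seq.partition("*")
    if not sep:
        raise ValueError("no '*' junction marker found in the sequence")
    run = "N{%d,}" % min_run
    # Check for Ns ending the 5' part or starting the 3' part of the junction
    return bool(re.search(run + "$", left) or re.match(run, right))


# Junction sequences quoted earlier in this thread (the second one is shortened)
spotlight = "GCCCTCCTGCCCTCCTTCCGCTCCAG*GGGGAAACTGGACGTCTGGG"
star = "CCTTCCGC*NNNNNNNNNNNNNNNNNNNNNTGGACG"
print(junction_has_n_run(spotlight))  # False
print(junction_has_n_run(star))       # True
```

When the same gene pair is reported by several aligners, rows for which this returns True could be ranked below rows with a fully resolved junction.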

ndaniel avatar Apr 05 '19 10:04 ndaniel

@ndaniel I also got confused:

$ grep IGH final-list_candidate-fusion-genes.txt  |grep NSD2
IGH@    NSD2    known,oncogene,chimerdb2,cancer,tumor,m4,mitelman       0       7       4       112     BOWTIE+SPOTLIGHT        14:106543230:+  4:106530738:+    ENSG09000000017 ENSG00000109685                 TTTCTGTGTCTGACTCATTTCACTCAACATAGTGGTGTCCCATTCCATCCATGTTGCTTTAA*AGTTGCTTGATTTTAGTGGCTCAGAACTGCAATTTAATTCTTGTTCTTTGCACCTCTCTCTCCACCCCTTCTTTAACTTTTTGTTAGGGATTCACTCATGTTTCGTGTGTAA        ---/intergenic
IGH@    NSD2    known,oncogene,chimerdb2,cancer,tumor,m4,mitelman       0       7       3       36      BOWTIE+BLAT;BOWTIE+STAR 14:106543230:+  4:1953237:+      ENSG09000000017 ENSG00000109685                 ACTCATTTCACTCAACATAGTGGTGTCCCATTCCATCCATGTTGCTTTAA*AGTTGCTTGATTTTAGTGGCTCAGAACTGCAATTTAATTCTTGTTCTTTG  ---/CDS(truncated)

The 3' end of the first result (4:106530738:+) is not consistent with the UCSC genome FASTA at that position. Which result should I choose for the IGH@-NSD2 fusion event?

$ bedtools getfasta -s -fi ucsc.hg38.fasta -bed NSD2.bed -fo test.fa
>chr4:106530738-106531748(+)
AAGCTCACATAGAGGCATATATTTTCTTAAAGAAGTCATGTTAGAATCATTTTGCTAAAATATCACATCTGTCTTACAGAAATTACAACAGAAAAGCAACAAGATTACAAAATCATTTCAAAATAAATATTGTACCAACCTAATGGAAAAACATCTGTGCTGTTTTAGCTTATTACAGCAGTTTGTCATGACAATTTATAGATAAGTATGCAATTTTTCTTAGATTTATTATTGGAAACTTCTACTTATAGGTCTGTGGTTTAAGACAGGAAATGTGTTTTGATTGATTATCCTGAAAAGAAATTGGTAGGCCAAATGTTGAAACCATGAAAACATTAGGAGATGAAGTAACAGTTGAAAAAATGAACTGCACTTTGGAAGAACTCACAGGAGTAGTGATGATTGTACCACTAAATCTTGTGCTTGTTTACCAAAGTTGAGAGAGAAAAATTTCAGTTGAACTAGTATGTAGAGTGATACTCTGGTTTCTCTTGAGATGGTATAAATCTTTTGCCTTTTACTAGTATGTAGAGTGATATGAGAATTCCCTCTAATAGGCAAGTGAATGGAGAGTAGATAATATAACAACTGAGAAAATATAATCCTTAACCCTTCAAAAGAAAATATTTCTTCAAGAAGTGGGTACAAGCAAAGCAATTCAATAAATTAGAAATACATTATGATTATAAGGACAGTGCATTTAGAGTTCCCAATGACCTGATTTAACTATTAAAAATATACACAGGCAAACCATACACTAAATCATAATAGCCGATTTTAGAATTCCTCTCAGGAAGAGCACTGTCTTAGCTTGAGTGGCTGTAGATAGTAAGATCCTTAGGATATGGGTTGGTAGGCACAAAGAATAAGTACATCTTCATATAATAACAAACAAGCATGTAACAAACAGGAATTTGGTAATTGAGAGGACAGCAGAAATGAGAGAAAAGAGAAAGTAAGAGAAAGTTGAATTACATAGTGAAATTCTGGAGCCCAAATTGGGGAAAAGA
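The same comparison can be scripted. This is a minimal Python sketch, assuming the reported fusion sequence marks the junction with `*` and the reference sequence has already been extracted (for example with the `bedtools getfasta` call above). The function names and the probe length `k` are illustrative choices.

```python
def revcomp(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def three_prime_matches(fusion_seq, ref_seq, k=20):
    """Check whether the first k bases after '*' occur in ref_seq on either strand."""
    _, sep, right = fusion_seq.partition("*")
    if not sep:
        raise ValueError("no '*' junction marker found in the sequence")
    probe = right[:k].upper()
    ref = ref_seq.upper()
    return probe in ref or revcomp(probe) in ref


# 3' part of the BOWTIE+SPOTLIGHT row and only the first 40 bases of the
# extracted chr4 region, both copied from this thread
fusion = "CATGTTGCTTTAA*AGTTGCTTGATTTTAGTGGCTCAGAACTGCAATT"
ref_prefix = "AAGCTCACATAGAGGCATATATTTTCTTAAAGAAGTCATG"
print(three_prime_matches(fusion, ref_prefix))  # False
```

Exact substring matching is deliberately strict; a real check would pass the full extracted region and might allow mismatches. Keep in mind that BED start coordinates are 0-based, so an off-by-one shift between the BED interval and the reported breakpoint is possible.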

xiucz avatar Dec 25 '19 07:12 xiucz