
Missing 5'- or 3'- genes in fusion prediction

Open asmitagpta opened this issue 2 years ago • 1 comments

Hi all,

I am using the latest fusioncatcher build from GitHub for fusion prediction in a set of solid tumor samples. However, the results in the summary table include many fusions (mostly classified as readthrough) that lack either the 5' gene or the 3' gene. I do not understand how these fusions are being detected or classified at all in such cases. Are all of these fusions putative false positives? How should I interpret the results? This happens across almost all samples. Here is an example of the results obtained:

final_fusion_list.txt

    DTX2P1-UPK3BP1-PMS2P11  pseudogene,lncrna,m2,reciprocal 130     46      12      50      BOWTIE+STAR     7:73077342:+    7:77023677:+    ENSG00000277125 ENSG00000265479               CTTGGAGGACAACGTGATCACTGTATTCAGCTCTGTCAAGAATGGTCCAG*GTTCTTCTAGATGATCTGCACAAATGGTTCCTCTCCTCCTTCCTGATGTC   exonic(no-known-CDS)/exonic(no-known-CDS)

    HOXB9   adjacent,lncrna,1K<gap<10K,readthrough,exon-exon        0       22      6       26      BOWTIE;BOWTIE+BLAT;BOWTIE+BOWTIE2;BOWTIE+STAR   17:48639086:-   17:48623135:-   ENSG00000272763 ENSG00000170689 ENSE00002580757 ENSE00001193282 TATGGAGAACTTCAAGGCCTCCTCCTGGCTGCCCAGGAAGTAG*CCAACCCCTCCGCCAACTGGCTGCACGCTCGCTCTTCCCGGAA exonic(no-known-CDS)/CDS(truncated)

corresponding entries from summary file

--DTX2P1-UPK3BP1-PMS2P11 (reciprocal fusion)

--HOXB9 (readthrough)

I am also attaching the complete final list and summary list for one sample. These kinds of predictions were completely absent from previous fusioncatcher predictions.

Attachments: summary_candidate_fusions.txt, final-list_candidate-fusion-genes.txt

asmitagpta, May 24 '22 04:05

Hi @asmitagpta

As far as I understand, you are using an alpha release of FusionCatcher, which appears to have a bug where gene names are missing. The bug arises because this alpha release uses a very new release of the Ensembl database, from which the gene names are extracted, and for some reason that Ensembl release is missing some gene names. This is something that I have to fix.

Behind the scenes, FusionCatcher does not work with gene names; it works with Ensembl gene IDs (e.g. ENSG00000100330). FusionCatcher therefore finds the fusion genes correctly, and it is only the conversion from Ensembl gene ID to gene name that fails.

This means that, in your case, the file summary_candidate_fusions.txt is not really useful. The file final-list_candidate-fusion-genes.txt is more useful because it contains the Ensembl gene IDs of the genes that form the fusions.
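To work from the Ensembl gene IDs in practice, something along these lines might help. It is only a minimal sketch: the column headers below are assumptions about the final-list header row and should be checked against your own file, and `label_fusions` is a hypothetical helper, not part of FusionCatcher. It reads the tab-separated file, falls back to the Ensembl gene ID whenever a gene symbol is empty, and flags those rows for a closer look.

```python
import csv
import io

# Assumed column headers; check them against the header row of your own file.
SYMBOL_COLS = ("Gene_1_symbol(5end_fusion_partner)", "Gene_2_symbol(3end_fusion_partner)")
ID_COLS = ("Gene_1_id(5end_fusion_partner)", "Gene_2_id(3end_fusion_partner)")


def label_fusions(handle, symbol_cols=SYMBOL_COLS, id_cols=ID_COLS):
    """Yield (gene_1_label, gene_2_label, has_missing_symbol) for each fusion row.

    A label is the gene symbol when present, otherwise the Ensembl gene ID.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        labels = []
        missing = False
        for sym_col, id_col in zip(symbol_cols, id_cols):
            symbol = (row.get(sym_col) or "").strip()
            if not symbol:
                # Symbol lookup failed upstream; use the stable identifier instead.
                missing = True
                symbol = (row.get(id_col) or "").strip()
            labels.append(symbol)
        yield labels[0], labels[1], missing


# Tiny made-up table (placeholder names, not real genes) to show the behaviour.
demo = io.StringIO(
    "\t".join(SYMBOL_COLS + ID_COLS) + "\n"
    "\tGENE_B\tENSG_X\tENSG_Y\n"
    "GENE_C\tGENE_D\tENSG_Z\tENSG_W\n"
)
for gene_1, gene_2, missing in label_fusions(demo):
    print(gene_1, gene_2, "symbol missing" if missing else "ok")
```

For a real run, pass an open file handle instead of the demo table, for example `label_fusions(open("final-list_candidate-fusion-genes.txt", newline=""))`.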

I do not think a fusion should be disqualified just because the gene name is missing (as long as it has the Ensembl gene ID), but a missing name might indicate that the annotation for that gene is not stable and is changing from one year to the next.

ndaniel, Jun 01 '22 09:06