arriba
arriba copied to clipboard
Can one read be used to support multiple fusions
Hi @suhrig ,
I was running Arriba on this simulation dataset :
https://data.broadinstitute.org/Trinity/CTAT_FUSIONTRANS_BENCHMARKING/on_simulated_data/sim_101 (In the case I'm about to mention it's reads from sim3
.) I used the default script of Arriba_v2.4.0 and found these two fusions in fusions.tsv:
ISCA1 SLC30A1 -/- -/- 9:88886922 1:211749631 CDS/splice-site CDS/splice-site translocation 299 289 152 1732 4155 high in-frame . Iron-sulphur_cluster_biosynthesis(58%)|Cation_efflux_family(52%) . . ENSG00000135070.9 ENSG00000170385.9 ENST00000375991.4 ENST00000367001.4 upstream downstream duplicates(11),low_entropy(3),mismatches(209),multimappers(162)
CTD-2561J22.1 SLC30A1 -/- -/- 19:21564171 1:211749631 exon CDS/splice-site translocation 3 247 1 10 4155 high . . |Cation_efflux_family(52%) . . ENSG00000268731.1 ENSG00000170385.9 ENST00000597319.1 ENST00000367001.4 upstream downstream duplicates(6),mismatches(100),multimappers(253)
Between these two fusions, there are 166 identical read identifiers, for example: ISCA1|ENSG00000135070.9--SLC30A1|ENSG00000170385.9:10030480_1_76430_4501_261 ISCA1|ENSG00000135070.9--SLC30A1|ENSG00000170385.9:1010524_1_76430_4597_166 ISCA1|ENSG00000135070.9--SLC30A1|ENSG00000170385.9:10424315_1_76430_4580_164 ISCA1|ENSG00000135070.9--SLC30A1|ENSG00000170385.9:10462442_1_76430_4592_195 ISCA1|ENSG00000135070.9--SLC30A1|ENSG00000170385.9:10479657_1_76430_4532_241
According to Arriba's multimappers filter (I assume these reads are multi-mapping reads), "multi-mapping reads are reduced to a single alignment", each read should be mapped to only one fusion. Thus I wonder if my understanding for this filter is correct or if there's something wrong with the output.
Thanks for your time!