Gene fusions

Open schelhorn opened this issue 8 years ago • 26 comments

Wicked fast indeed! Are there any plans to extend salmon to also detect gene fusion events? There isn't a fast and accurate way to do that yet, only approaches requiring full alignments. Most often a base-perfect breakpoint isn't required, an estimate within a hash length is fine. We are a heavy user of bcbio and are also running the full STAR alignment just for gene fusions, which really sucks. Any ideas would be much appreciated.

schelhorn avatar Mar 28 '16 14:03 schelhorn

Hi @schelhorn,

Yes; we are actively looking at fusion prediction based on quasi-mapping. The initial results are promising, but we're still working on improving and refining the method. I'll be sure to let you know when we have something that is ready to test :).

Best, Rob

rob-p avatar Mar 28 '16 21:03 rob-p

Excellent. May I point out that tools such as Oncofuse https://github.com/mikessh/oncofuse/ and Pegasus https://github.com/RabadanLab/Pegasus have a particular, additional value since they provide functional annotation of fusion events identified by other approaches? Also, these resources may prove helpful wrt validation data: https://github.com/chapmanb/bcbio-nextgen/issues/210 and http://m.genome.cshlp.org/content/early/2015/11/10/gr.186114.114 Adding @roryk here for highlighting this feature request in bcbio.

schelhorn avatar Mar 29 '16 05:03 schelhorn

Awesome; thanks for the pointers! We'll definitely take a look at these.

rob-p avatar Mar 29 '16 15:03 rob-p

Hello @rob-p, may I ask whether there is any news concerning gene fusion detection in Salmon?

schelhorn avatar Oct 20 '16 13:10 schelhorn

Hi @schelhorn,

Yes, we have built a pipeline atop salmon and quasi-mapping. At this point, what we see is that it is very fast with high sensitivity. Our main focus has been on improving the specificity, which is currently better than some, but not all, methods. I realize, of course, that false positives are a very difficult (and key) problem in this domain, so I'd really like to make sure they are well-handled.

rob-p avatar Oct 20 '16 14:10 rob-p

Great; would you like help testing the pipeline, and integrating it into bcbio? We could help with both :)

schelhorn avatar Oct 20 '16 14:10 schelhorn

Also, do you know if the Salmon pseudo-BAM is suitable for fusion calling by standard (alignment-based) fusion calling tools, i.e., does the BAM include information on mate pairs mapped across transcripts, or reads spanning breakpoints?

schelhorn avatar Oct 25 '16 09:10 schelhorn

Hi @schelhorn,

Sorry for the uncharacteristically slow response on this. We're going full steam ahead for the RECOMB deadline, so I've been less responsive than usual. Anyway, I've invited you to the repository for the fusion project (it's currently private). Feel free to poke around, but it's probably not useful until we can send you a short writeup describing the current pipeline (since things are still very "alpha"). Regarding calling fusions from the SAM output of Salmon, one can't do this directly because there are, by default, no encompassing reads (i.e. individual reads split between transcripts) and, to improve abundance estimation, salmon is conservative with its use of spanning reads. However, we can get at this information from quasi-mapping, so I can definitely consider adding some flags to provide this info (this is the type of thing we output in the fusion pipeline currently, and then we have to postprocess it).

rob-p avatar Oct 25 '16 14:10 rob-p

Excellent; thank you. We'll have a look and see what we can contribute.

schelhorn avatar Oct 28 '16 09:10 schelhorn

Hello @rob-p, could you please invite @tetianakh to the repo as well? She'll do the development on our end. Thanks!

schelhorn avatar Oct 28 '16 12:10 schelhorn

Hi @schelhorn,

Sure, I'll add her now. We'll send you a small write-up about the state of the codebase and how to run the current pipeline next week (once my student is back from the current CSHL meeting with all of the cool kids ;P).

rob-p avatar Oct 28 '16 14:10 rob-p

Sweet!

schelhorn avatar Oct 28 '16 18:10 schelhorn

Hi Rob,

Could I get in on this? We have a couple of projects needing to call fusions on a large number of samples, and it would be great to have something speedy to iterate on.

roryk avatar Nov 04 '16 12:11 roryk

FYI, I also asked in the kallisto project: https://github.com/pachterlab/kallisto/issues/122

schelhorn avatar Nov 04 '16 12:11 schelhorn

Hi @rob-p, I haven't received an invitation to the private repo. Could you please invite me? Thanks!

tetianakh avatar Nov 07 '16 13:11 tetianakh

Hi @tetianakh, I've re-sent the invitation. If you don't get it, please send me an e-mail, and I'll reply with the link to join directly.

rob-p avatar Nov 07 '16 14:11 rob-p

Thanks, I've received it now.

tetianakh avatar Nov 07 '16 14:11 tetianakh

Great :). I'll have @hiraksarkar write up a brief overview of the current state of the codebase (including which branch contains the latest stuff) this week. We can either share that information in the issues over at that repo, or we can e-mail you the write-up @schelhorn, @tetianakh and @roryk. Let me know if one method is preferable to the other.

rob-p avatar Nov 07 '16 15:11 rob-p

Great; directly in the repo is preferred.

schelhorn avatar Nov 07 '16 15:11 schelhorn

This sounds cool. Have you looked at submitting your method for the DREAM RNA-Seq analysis challenge ( https://synapse.org/SMC_RNA ) ?

kellrott avatar Feb 17 '17 18:02 kellrott

And any status updates? I'd be interested to test drive a quasi-mapping-based fusion caller!

nellore avatar Feb 17 '17 20:02 nellore

One fast way using pseudo-alignments should be Kallisto+[Manta|Pizzly], but I haven't tried that myself. We decided to go with full transcriptome alignments instead and integrated EricScript into bcbio. We'd still be interested in something more modern, though.

schelhorn avatar Feb 18 '17 07:02 schelhorn

If one has a downstream fusion pipeline that uses transcriptome mappings, one can already get those from the -z=<output.sam> option, which has been available for a while. The real challenge is how to properly control the false positive rate. That's the main thing special-purpose downstream software must solve.

rob-p avatar Feb 18 '17 07:02 rob-p
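
For anyone who wants to experiment with the SAM output mentioned above, here is a minimal sketch of one possible first post-processing step: counting read pairs whose two mates are reported on different transcripts. This is not the fusion pipeline discussed in this thread. It only assumes standard SAM columns (FLAG in column 2, RNAME in column 3, RNEXT in column 7). The function name `discordant_pairs` and the transcript names `TX_A` and `TX_B` are hypothetical, and, as noted earlier in the thread, salmon may report few or no such pairs by default.

```python
from collections import Counter


def discordant_pairs(sam_lines):
    """Count SAM records whose mate is reported on a different reference.

    Each record is counted once, from the first mate (FLAG bit 0x40).
    A multi-mapping read contributes one count per reported record.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete SAM record
            continue
        flag = int(fields[1])
        rname, rnext = fields[2], fields[6]
        # Require a paired read with both the read and its mate mapped.
        if not flag & 0x1 or flag & 0x4 or flag & 0x8:
            continue
        # "=" means the mate is on the same reference; "*" means unknown.
        if rname == "*" or rnext in ("=", "*") or rnext == rname:
            continue
        if flag & 0x40:  # count the pair once, from the first mate
            counts[tuple(sorted((rname, rnext)))] += 1
    return counts


# Synthetic records with hypothetical transcript names (not real salmon output).
EXAMPLE_SAM = [
    "@SQ\tSN:TX_A\tLN:1000",
    "@SQ\tSN:TX_B\tLN:1000",
    "r1\t65\tTX_A\t100\t255\t4M\tTX_B\t200\t0\tACGT\tIIII",
    "r1\t129\tTX_B\t200\t255\t4M\tTX_A\t100\t0\tACGT\tIIII",
    "r2\t99\tTX_A\t10\t255\t4M\t=\t150\t144\tACGT\tIIII",
    "r3\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]

print(dict(discordant_pairs(EXAMPLE_SAM)))  # {('TX_A', 'TX_B'): 1}
```

Raw counts like these say nothing about false positives: transcripts of the same gene, paralogs, and multi-mapping reads would all need to be filtered before any pair could be treated as a fusion candidate, which is the hard part described above.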

Thanks for the tips; I'll experiment.

nellore avatar Feb 18 '17 07:02 nellore

Hi @rob-p, We are working towards creating a fusion-calling pipeline based on Salmon/Pizzly. It would be helpful to see the current state of the repository and try to replicate some of the experiments we have done with it. We seem to be hitting good specificity but falling a bit short on sensitivity. Thanks, Prateek

erprateek avatar Jul 31 '19 20:07 erprateek

Hello @rob-p! I was wondering if there have been any updates on the fusion/detection of spanning reads problem. I'm about to embark on a project to process many bacterial transcriptomes from many different genomes/species and plan to use salmon. I would love to be able to detect polycistronic transcripts through the identification of spanning reads.

taylorreiter avatar Jan 20 '22 17:01 taylorreiter