
Cufflinks SAM format incompatibility when using STAR-aligned SAM/BAM

Open schelhorn opened this issue 8 years ago • 7 comments

There seems to be an acknowledged incompatibility between the most recent cufflinks release (2.2.1) and certain standard-compliant SAM inputs. Specifically, when bias modeling is turned on (as it should be), cufflinks aborts with the critical error Error (GFaSeqGet): subsequence cannot be larger than 16569. This error results from the inability of cufflinks to handle softclipped bases that extend past the end of the chromosome (a circumstance that is not forbidden by the SAM specs).

The RNA-Seq aligner STAR sometimes generates such alignments when a read fits best at the chromosomal end (this can happen in general, and specifically in cancer genomes due to chromosomal rearrangements). There are two ways to post-process STAR BAMs to cut off the overhanging ends so that cufflinks accepts the input file. Still, I would like to avoid generating yet another copy of a BAM file just for cufflinks: other quantitation methods deal with softclipped bases just fine and the error seems to lie with cufflinks, which is why I'd appreciate a fix from your end. The linked mail thread contains additional information and SAM examples. Thank you.
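To make the condition concrete, here is a minimal sketch of how one might flag the affected records. It is only an illustration of the situation described above, not code from cufflinks or STAR. The helper names (parse_cigar, unclipped_span, overhangs) are hypothetical, and it assumes the problem is triggered when the softclipped bases, projected onto the reference, fall before position 1 or after the chromosome length:

```python
import re

# One CIGAR operation: a length followed by an op code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference bases, per the SAM spec.
REF_CONSUMING = set("MDN=X")


def parse_cigar(cigar):
    """Split a CIGAR string such as '70M30S' into [(70, 'M'), (30, 'S')]."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def unclipped_span(pos, cigar):
    """Return the 1-based (start, end) the read would cover on the reference
    if its softclipped bases were projected onto it (hypothetical helper)."""
    ops = parse_cigar(cigar)
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    # Hard clips may sit outside soft clips; skip them to find the soft clips.
    core = [(n, op) for n, op in ops if op != "H"]
    lead = core[0][0] if core and core[0][1] == "S" else 0
    trail = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return pos - lead, pos + ref_len - 1 + trail


def overhangs(pos, cigar, chrom_len):
    """True if the softclipped read would extend past either chromosome end."""
    start, end = unclipped_span(pos, cigar)
    return start < 1 or end > chrom_len


# Assumed example: a 100 bp read whose last 30 bases are softclipped,
# aligned so that the matched part ends exactly at position 16569.
print(overhangs(16500, "70M30S", 16569))
```

The chromosome length would come from the @SQ LN field of the SAM header; 16569 is used here only because it is the number in the error message.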

This issue is co-tracked in bcbio.

schelhorn avatar Sep 14 '15 08:09 schelhorn

Thank you for reporting this, I will look into a possible fix for what looks like a mishandling of soft-clipping. This probably happens because Cufflinks was designed (and tested) to work with TopHat's output, which only provides end-to-end alignments.

gpertea avatar Sep 14 '15 18:09 gpertea

Excellent, thanks for looking into it, especially since cufflinks is now a legacy package. We will probably move over to stringtie once that is sufficiently feature-complete and stable (so hopefully that one will work with STAR output, too), but in the meantime our analyses will continue to rely on cufflinks.

schelhorn avatar Sep 15 '15 08:09 schelhorn

I'm not sure who told you Cufflinks is a "legacy" package. We expect to add new features as the need arises, particularly in support of new single-cell applications.

ctrapnell avatar Sep 15 '15 14:09 ctrapnell

Then that was a misconception of mine; it's just that most of the code base is 1-2 years old and HISAT+stringtie seemed to be well placed as successors to tophat2+cufflinks. I realize that these projects are under different management, but I just wasn't aware whether there was a vision forward for cufflinks. If there is, then I am glad about it and am looking forward to further features and performance gains.

schelhorn avatar Sep 16 '15 08:09 schelhorn

Thank you very much for this timely commit, @gpertea. Would you recommend testing it as a likely fix for this issue, or would you suggest that users wait for an official release?

schelhorn avatar Sep 28 '15 11:09 schelhorn

Yes, this patch should be OK for testing; I am quite confident it fixes the soft-clipping problem reported here. I'm just hoping it doesn't introduce other alignment issues -- admittedly I haven't tested it properly myself, only checked it on the small data set referenced in the rna-star forum which exposed the problem.

gpertea avatar Sep 28 '15 13:09 gpertea

This seems to be an issue with GSNAP as well. Can the fix now be considered well-tested? Are there plans to merge it with master?

alephreish avatar Mar 29 '16 15:03 alephreish