Cufflinks SAM format incompatibility when using STAR-aligned SAM/BAM
There seems to be an acknowledged problem in the most recent cufflinks release, 2.2.1, that makes it incompatible with certain (standard-compliant) SAM inputs. Specifically, cufflinks aborts with the critical error "Error (GFaSeqGet): subsequence cannot be larger than 16569" if bias modeling is turned on (as it should be). The error results from the inability of cufflinks to model soft-clipped bases that extend past the end of the chromosome (a circumstance that the SAM spec does not forbid). The RNA-Seq aligner STAR sometimes generates such output when a read best fits the chromosomal end (this can happen in general, and specifically in cancer genomes due to chromosomal rearrangements). There are two ways to post-process STAR BAMs to cut off the ends and make cufflinks accept the input file. Still, I would like to avoid generating yet another copy of a BAM file just for cufflinks, since other quantitation methods deal with soft-clipped bases just fine and the error seems to be in cufflinks, so I'd appreciate a fix from your end. The linked mail thread contains additional information and SAM examples. Thank you.
This issue is co-tracked in bcbio.
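For anyone who wants to check whether a given SAM file contains the kind of alignment described above, here is a minimal sketch in plain Python. It only illustrates the condition (soft-clipped bases reaching past either end of the reference sequence); it is not how cufflinks or STAR handle it internally, and the function names `clipped_span` and `overhanging_reads` are made up for this example.

```python
import re

# One CIGAR element: a length followed by an operation code from the SAM spec.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def clipped_span(pos, cigar):
    """Return the 1-based (start, end) reference span of an alignment
    when its soft-clipped bases are counted as if they were aligned."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Hard clips carry no sequence, so ignore them when looking for soft clips.
    core = [o for o in ops if o[1] != "H"]
    lead = core[0][0] if core and core[0][1] == "S" else 0
    trail = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    # Only M, D, N, = and X consume reference positions.
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    return pos - lead, pos + ref_len - 1 + trail

def overhanging_reads(sam_lines):
    """Yield (qname, rname, start, end) for mapped reads whose
    soft clips extend beyond the reference given in the @SQ header."""
    lengths = {}
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            if line.startswith("@SQ"):
                tags = dict(f.split(":", 1) for f in line.split("\t")[1:])
                lengths[tags["SN"]] = int(tags["LN"])
            continue
        cols = line.split("\t")
        qname, flag, rname, pos, cigar = (
            cols[0], int(cols[1]), cols[2], int(cols[3]), cols[5])
        # Skip unmapped reads and reads without a usable CIGAR or reference.
        if flag & 4 or cigar == "*" or rname not in lengths:
            continue
        start, end = clipped_span(pos, cigar)
        if start < 1 or end > lengths[rname]:
            yield qname, rname, start, end
```

The generator accepts any iterable of SAM text lines, header included. For a BAM file, one option would be to pipe the output of `samtools view -h` into a small script that passes `sys.stdin` to `overhanging_reads`.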
Thank you for reporting this. I will look into a possible fix for what looks like a mishandling of soft-clipping. This probably happens because Cufflinks was designed (and tested) to work with TopHat's output, which only provides end-to-end alignments.
Excellent, thanks for looking into it, especially since cufflinks is now a legacy package. We will probably move over to stringtie once it is sufficiently feature-complete and stable (so hopefully that one will work with STAR output, too), but in the meantime our analyses will continue to rely on cufflinks.
I'm not sure who told you Cufflinks is a "legacy" package. We expect to add new features as the need arises, particularly in support of new single-cell applications.
Then that was a misconception of mine; it's just that most of the code base is 1-2 years old, and HISAT+stringtie seemed well placed as successors to tophat2+cufflinks. I realize that these projects are under different management, but I wasn't aware whether there was a vision forward for cufflinks. If there is, then I am glad about it and am looking forward to further features and performance gains.
Thank you very much for this timely commit, @gpertea. Would you recommend testing it as a likely fix for this issue, or would you suggest that users wait for an official release?
Yes, this patch should be OK for testing; I am quite confident it is a likely fix for the soft-clipping problem reported here. I'm just hoping it doesn't introduce other alignment issues -- admittedly I haven't tested it properly myself, only checked it on the small data set referenced in the rna-star forum which exposed the problem.
This seems to be an issue with GSNAP as well. Can the fix now be considered well-tested? Are there plans to merge it with master?