cufflinks
Missed transcripts in cuffcompare result
I used cuffcompare (v2.2.1) to compare two sets of transcripts.
cuffcompare -r $ref_gtf -s $genome -o $OUT A.gtf B.gtf
But I found that the number of rows in the resulting XXX.tracking file is smaller than the number of transcripts in $ref_gtf. Were some transcripts dropped for some reason?
This issue has appeared in Cufflinks, too. Our lab solves it by setting --max-bundle-length 4500000 so that it exceeds the largest transcript size in the ENSEMBL GTF file. But we haven't tried this with Cuffcompare yet.
Thanks, but there is no --max-bundle-length option in cuffcompare. Even if I add -C and -G, the output number is still less than the $ref_gtf number.
-C  include the "contained" transcripts in the .combined.gtf file
-F  do not discard intron-redundant transfrags if they share the 5' end (if they differ only at the 3' end)
-G  generic GFF input file(s): do not assume Cufflinks GTF, do not discard any intron-redundant transfrags
The *.tracking file is not supposed to show all the reference transcripts. Its purpose is to track assembled transcripts across samples (i.e. transcripts from A.gtf and B.gtf in your case). Many reference transcripts are not expressed in any of the samples, so there would be no reason to list them in the .tracking file. The manual page at http://cole-trapnell-lab.github.io/cufflinks/cuffcompare/#cuffcompare-output-files explains the purpose of the *.tracking file.

The *.refmap file is the one that should list all the reference transcripts loaded by cuffcompare and how they relate to "query" transcripts (i.e. assembled transfrags).

The only case in which cuffcompare may discard some reference transcripts is when they are duplicates (the same exact intron chain); only the "longer one" is kept in that case. But that should not happen often (or at all) in a reference annotation file, unless it was built by plain concatenation of multiple annotation files from different sources.
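To check whether the duplicate case applies to a given reference file, something like the following rough sketch may help. It groups multi-exon transcripts in a GTF by chromosome, strand, and intron chain, and reports groups that share the same chain. This is not cuffcompare's own code, only an approximation of the "same exact intron chain" criterion; the function names are made up for illustration, and it assumes standard 9-column GTF lines with `exon` features and a `transcript_id "..."` attribute.

```python
import re
from collections import defaultdict

def intron_chains(gtf_lines):
    """Map transcript_id -> (chrom, strand, tuple of (intron_start, intron_end))."""
    exons = defaultdict(list)   # transcript_id -> list of (start, end)
    location = {}               # transcript_id -> (chrom, strand)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if not match:
            continue
        tid = match.group(1)
        exons[tid].append((int(fields[3]), int(fields[4])))
        location[tid] = (fields[0], fields[6])
    chains = {}
    for tid, ex in exons.items():
        ex.sort()
        # An intron spans the gap between two consecutive exons.
        introns = tuple(
            (prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(ex, ex[1:])
        )
        chains[tid] = (location[tid][0], location[tid][1], introns)
    return chains

def duplicate_groups(chains):
    """Return sorted lists of transcript IDs that share an identical intron chain."""
    groups = defaultdict(list)
    for tid, key in chains.items():
        if key[2]:  # single-exon transcripts have no intron chain; skip them
            groups[key].append(tid)
    return [sorted(tids) for tids in groups.values() if len(tids) > 1]
```

Calling `duplicate_groups(intron_chains(open(path)))` on the reference GTF (where `path` is a placeholder for your own file) gives the candidate duplicate sets. How cuffcompare treats single-exon transcripts is not covered here.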
@gpertea Thanks. But actually the number of transcripts in *.refmap is less than that in $ref_gtf.
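For anyone who wants to see exactly which reference transcripts are absent, here is a minimal sketch that compares the transcript IDs in the reference GTF with those in a .refmap file. It assumes the .refmap file is tab-delimited with one header line and the reference transcript ID in the second column; please check this against your own file, since the layout is an assumption here. The function names and parameters are hypothetical.

```python
import re

def gtf_transcript_ids(gtf_lines):
    """Collect the distinct transcript_id values found in GTF lines."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match:
            ids.add(match.group(1))
    return ids

def refmap_transcript_ids(refmap_lines, id_column=1, has_header=True):
    """Collect reference transcript IDs from a tab-delimited .refmap file.

    id_column and has_header encode the assumed layout; adjust if yours differs.
    """
    ids = set()
    for i, line in enumerate(refmap_lines):
        if has_header and i == 0:
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > id_column:
            ids.add(fields[id_column])
    return ids

def missing_from_refmap(gtf_lines, refmap_lines):
    """Return the sorted reference transcript IDs that never appear in the refmap."""
    return sorted(gtf_transcript_ids(gtf_lines) - refmap_transcript_ids(refmap_lines))
```

Passing the open reference GTF and .refmap files to `missing_from_refmap` returns the list of absent IDs, which can then be checked by hand, for example for duplicated intron chains.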