
Missed transcripts in cuffcompare result

Open tianfree opened this issue 8 years ago • 4 comments

I used cuffcompare (v 2.2.1) to compare two sets of transcripts:

cuffcompare -r $ref_gtf -s $genome -o $OUT A.gtf B.gtf

But I found that the number of rows in the resulting XXX.tracking file is less than the number of transcripts in $ref_gtf. Are some transcripts being lost for some reason?

tianfree avatar May 20 '16 02:05 tianfree

This issue has appeared in Cufflinks, too. Our lab solves this problem by setting --max-bundle-length 4500000 so that it exceeds the largest transcript size in the ENSEMBL GTF file. But we haven't tried this with Cuffcompare yet.

brianpenghe avatar May 20 '16 02:05 brianpenghe

Thanks, but there is no --max-bundle-length option in cuffcompare. Even when I add -C and -G, the number of transcripts in the output is still less than the number in $ref_gtf.

-C include the "contained" transcripts in the .combined.gtf file
-F do not discard intron-redundant transfrags if they share the 5' end (if they differ only at the 3' end)
-G generic GFF input file(s): do not assume Cufflinks GTF, do not discard any intron-redundant transfrags

tianfree avatar May 20 '16 03:05 tianfree

The *.tracking file is not supposed to show all the reference transcripts. Its purpose is to track assembled transcripts across samples (i.e. transcripts from A.gtf and B.gtf in your case). Many reference transcripts are not expressed in any of the samples, so there would be no reason to list them in the .tracking file. The manual page at http://cole-trapnell-lab.github.io/cufflinks/cuffcompare/#cuffcompare-output-files explains the purpose of the *.tracking file.

The *.refmap file is the one that should list all the reference transcripts loaded by cuffcompare and how they relate to "query" transcripts (i.e. assembled transfrags).

The only case in which cuffcompare may discard some reference transcripts is when they are duplicates (the same exact intron chain); only the "longer one" is kept in that case. But that should not happen often (or at all) in a reference annotation file, unless it was built by plain concatenation of multiple annotation files from different sources.

gpertea avatar May 20 '16 03:05 gpertea
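One way to check whether the duplicate case described above applies to a given annotation is to group reference transcripts by intron chain. The following is only a rough sketch, not cuffcompare's own logic: it assumes a tab-delimited GTF with `exon` records and a `transcript_id "..."` attribute, and it skips single-exon transcripts because the comment above only mentions intron chains. The function names `intron_chains` and `duplicate_groups` are made up for illustration.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')

def intron_chains(gtf_lines):
    """Map transcript_id -> (chrom, strand, introns) using GTF exon records."""
    exons = defaultdict(list)
    location = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if not match:
            continue
        tid = match.group(1)
        exons[tid].append((int(fields[3]), int(fields[4])))
        location[tid] = (fields[0], fields[6])
    chains = {}
    for tid, ex in exons.items():
        ex.sort()
        # An intron spans the gap between two consecutive exons.
        introns = tuple(
            (a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(ex, ex[1:])
        )
        chains[tid] = location[tid] + (introns,)
    return chains

def duplicate_groups(chains):
    """Return sorted groups of transcript IDs sharing one intron chain."""
    groups = defaultdict(list)
    for tid, key in chains.items():
        if key[2]:  # assumption: ignore single-exon transcripts
            groups[key].append(tid)
    return sorted(sorted(tids) for tids in groups.values() if len(tids) > 1)
```

With a real file, something like `duplicate_groups(intron_chains(open("ref.gtf")))` (the file name is a placeholder) would list the groups of reference transcripts that might be collapsed into one.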

@gpertea Thanks. But actually the number of transcripts in *.refmap is less than that in $ref_gtf.

tianfree avatar May 20 '16 05:05 tianfree
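To see exactly which reference transcripts are absent rather than only comparing counts, the two ID sets can be diffed. This is a minimal sketch under stated assumptions: the GTF carries a `transcript_id "..."` attribute in column 9, and the *.refmap file is tab-delimited with a header row that includes a `ref_id` column (as the cuffcompare manual describes; adjust the column name if your file differs). The helper names are hypothetical.

```python
import csv
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')

def gtf_transcript_ids(gtf_lines):
    """Collect the distinct transcript_id values found in a GTF."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids

def refmap_ref_ids(refmap_lines, column="ref_id"):
    """Collect reference IDs from a refmap (assumed header with `column`)."""
    reader = csv.DictReader(refmap_lines, delimiter="\t")
    return {row[column] for row in reader if row.get(column)}

def missing_from_refmap(gtf_lines, refmap_lines):
    """Return reference transcript IDs present in the GTF but not the refmap."""
    return sorted(gtf_transcript_ids(gtf_lines) - refmap_ref_ids(refmap_lines))
```

Passing open file handles for $ref_gtf and the *.refmap output to `missing_from_refmap` would give the list of missing IDs, which can then be inspected in the GTF to see whether they share an intron chain with a transcript that was kept.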