cufflinks icon indicating copy to clipboard operation
cufflinks copied to clipboard

Cuffdiff bug: FPKM is way above range conf_lo to conf_hi (and is wrong)

Open sammyjava opened this issue 7 years ago • 1 comments

I've run Cuffdiff v2.1.1 (4046M) on DNAnexus for an experiment a couple of times, keeping or removing other conditions, to see if this is reproducible and it is. Here's a relevant genes_fpkm_tracking line:

tracking_id	class_code	nearest_ref_id	gene_id	gene_short_name	tss_id	locus	length	coverage
XLOC_073347	 -	-	XLOC_073347	EPlOSAG00000014290,OS08G0184300	TSS78584,TSS78585	8:4947262-4952194	-	-	
OSH1-0_FPKM	OSH1-0_conf_lo	OSH1-0_conf_hi	OSH1-0_status
150.955		107.801		194.131		OK
OSH1-3_FPKM	OSH1-3_conf_lo	OSH1-3_conf_hi	OSH1-3_status
4587.81		99.1039		291.232		OK
OSH1-6_FPKM	OSH1-6_conf_lo	OSH1-6_conf_hi	OSH1-6_status
105.106		79.2116		130.946		OK
OSH1-24_FPKM	OSH1-24_conf_lo	OSH1-24_conf_hi	OSH1-24_status
71.4966		53.5662		89.4439		OK

Note that OSH1-3 reports FPKM=4587.81 while conf=99.1039 to 291.232. Looking at the OSH1-3 BAMs (2 reps per condition) in a JBrowse (see below), there's no question that the conf range is correct while the FPKM value is errant (and results in highly significant DE for OSH1-3 vs OSH1-0,6,24). There is no DE at this locus.

So clearly this is a bug. Cuffdiff reports a very high FPKM value for this particular condition, and it's reproducible. Scanning over the data, it looks like it happened at about 30 genes out of 35584 (rice). Cufflinks (v2.2.1 linked against Boost version 104700) did NOT report these high FPKM values; only Cuffdiff did for the composites.

I'll note that another run with the same compile of Cuffdiff as Cufflinks mentioned above (same version, though) resulted in the conf values being high as well (fpkm=4588.62 conf_lo=4552.9 conf_hi=4624.35). But, again, there are no reads to back up this large FPKM value. You can see the reads on this locus here:

see it on a JBrowse

Any ideas where I should look to see what happened? This is rather concerting!

sammyjava avatar Nov 23 '17 17:11 sammyjava

Follow-up: running the same Linux binary (cufflinks-2.2.1.Linux_x86_64) downloaded from here, on a totally different system (Carnegie Memex rather than DNAnexus/Amazon EC) this bug does NOT reproduce:

tracking_id	class_code	nearest_ref_id	gene_id	gene_short_name	tss_id	locus	length	coverage
OS08G0184300    -       -       OS08G0184300    -       -       8:4947262-4952194       -       -
OSH1-0_FPKM	OSH1-0_conf_lo	OSH1-0_conf_hi	OSH1-0_status
150.438		112.466		188.409		OK
OSH1-3_FPKM	OSH1-3_conf_lo	OSH1-3_conf_hi	OSH1-3_status
128.515		95.8528		161.177		OK
OSH1-6_FPKM	OSH1-6_conf_lo	OSH1-6_conf_hi	OSH1-6_status
103.632		77.2523		130.012		OK
OSH1-24_FPKM	OSH1-24_conf_lo	OSH1-24_conf_hi	OSH1-24_status
70.2417		51.8599		88.6235		OK

And, therefore, the corresponding DE call is negative:

test_id		gene_id		gene	locus			sample_1	sample_2	
OS08G0184300	OS08G0184300	-       8:4947262-4952194	OSH1-0		OSH1-3
status	value_1	value_2
OK	150.438	128.515
log2(fold_change)	test_stat	p_value	q_value		significant
-0.227229		-0.879438	0.20745	0.717647	no

So it's rather mysterious, depending on the platform!

sammyjava avatar Nov 28 '17 19:11 sammyjava