TPMCalculator icon indicating copy to clipboard operation
TPMCalculator copied to clipboard

Meaning of "UniqueReads" in genes.out file?

Open DaRinker opened this issue 2 years ago • 2 comments

I have been comparing HTSeq (htseq-count) counts to TPMcalcualtor UniqueReads counts.

I find them generally to be in very good agreement. However, there are occasionally some cases where HTSeq and TPMcalculator will be in agreement for the "Reads" column of TPMcalculator output. but will disagree in the last 8 columns (TPMcalculator output will report zeros).

For example:

DTO2_FUN_009269 scaffold_6 3072673 3073284 612 366201 9037.3 612 366201 7579.98 0 0 0 0 0 0 0 0 0 0 0 0

This can happen with genes that receive just a few reads or with genes have 10's of thousands of reads aligning. It can happen with intronless genes or with multi-exon genes.

Since I'm using de novo assembly and annotation, I'm wondering if there could be something about my gff file that is tripping up TPMcalculator?

Here's my gff file for the example case above

$ grep DTO2_FUN_009269 ../../final_assemblies/full_assemblies/DTO2/genome.masked.gff3 
scaffold_6	funannotate	gene	3072674	3073285	.	-	.	ID=DTO2_FUN_009269;
scaffold_6	funannotate	mRNA	3072674	3073285	.	-	.	ID=DTO2_FUN_009269-T1;Parent=DTO2_FUN_009269;product=hypothetical protein;
scaffold_6	funannotate	exon	3072674	3073285	.	-	.	ID=DTO2_FUN_009269-T1.exon1;Parent=DTO2_FUN_009269-T1;
scaffold_6	funannotate	CDS	3072674	3073285	.	-	0	ID=DTO2_FUN_009269-T1.cds;Parent=DTO2_FUN_009269-T1;

DaRinker avatar Mar 30 '23 22:03 DaRinker

I may have figured out what is going on here.

These discrepancies appear to be instances of overlapping genes in the gff (opposite strands) where TPMcalculator can't assign one transcript as the correct strand so there are no unique reads assigned to EITHER transcript.

Not sure if there's a way to fix this within the TPMcalculator? (I don't seen an option to enforce stranded-ness)

DaRinker avatar Mar 31 '23 14:03 DaRinker

See output description here: https://github.com/ncbi/TPMCalculator/wiki/Output-files

r78v10a07 avatar Feb 06 '24 19:02 r78v10a07