
Question about the -c flag and multi-mappers

Open agolicz opened this issue 3 years ago • 3 comments

Hello, I am not sure I understand the meaning of the -c flag "-c Smaller size allowed for an intron created for genes. Default: 16. We recommend to use the reads length". Why do you recommend the reads length?

Also, how does the software treat multi-mapping reads, i.e. reads matching multiple locations across the genome (a value greater than 1 in the NH:i: field)? Is that normally handled by MAPQ filtering?

All the best, Agnieszka

agolicz avatar Dec 18 '20 20:12 agolicz

Hi,

TPMCalculator creates a gene model by overlapping the exons of all isoforms of a gene. The -c option sets the minimum size for creating an intron when overlapping multiple exons. This value does not affect the quantification of the RNA-Seq abundance of the exons, but it can change the quantification of transcripts and genes if intron retention is present.
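To make the effect of -c concrete, here is a minimal Python sketch of one way such a gene model could be built. It is not TPMCalculator's actual code: the function name `build_gene_model`, the 1-based inclusive coordinates, and the choice to absorb gaps shorter than the minimum into the neighbouring exon are all assumptions made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based inclusive (start, end), as in GTF


def build_gene_model(
    exons: List[Interval], min_intron: int = 16
) -> Tuple[List[Interval], List[Interval]]:
    """Merge the exons of all isoforms into one gene model.

    Assumption (not confirmed by the TPMCalculator source): a gap between
    two merged exons becomes an intron only if it is at least `min_intron`
    bases long; shorter gaps are absorbed into the surrounding exon.
    """
    merged: List[Interval] = []
    for start, end in sorted(exons):
        if merged:
            gap = start - merged[-1][1] - 1  # negative when exons overlap
            if gap < min_intron:
                # Overlapping, adjacent, or gap too small: extend the exon.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
                continue
        merged.append((start, end))
    # Introns are the gaps that survived between consecutive merged exons.
    introns = [
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(merged, merged[1:])
    ]
    return merged, introns


# Made-up exons from several isoforms of one hypothetical gene.
exons = [(100, 200), (150, 250), (260, 300), (400, 500)]
print(build_gene_model(exons, min_intron=16))
print(build_gene_model(exons, min_intron=5))
```

In this sketch the 9-base gap between positions 250 and 260 is kept as an intron only when `min_intron` is 9 or less, which is the kind of difference that can shift gene-level and transcript-level counts when reads fall in retained introns.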

Multi-mapping reads are filtered using the MAPQ value, as you said. However, each aligner has its own implementation of MAPQ values, so you need to check how your aligner assigns them.

This blog post may be of more help regarding aligner-specific MAPQ values: https://sequencing.qcfail.com/articles/mapq-values-are-really-useful-but-their-implementation-is-a-mess/
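As a rough illustration of MAPQ-based filtering, the Python sketch below reads SAM-formatted text lines, keeps records whose MAPQ (column 5) meets a threshold, and reports the NH:i tag when present. The helper names `nh_tag` and `filter_by_mapq` are hypothetical, the records are made up, and treating the threshold as inclusive is an assumption; the MAPQ value that marks a multi-mapper depends on your aligner.

```python
from typing import Iterable, Iterator, List, Optional


def nh_tag(fields: List[str]) -> Optional[int]:
    """Return the NH:i value from the optional SAM fields, or None if absent."""
    for field in fields[11:]:  # optional tags start at column 12
        if field.startswith("NH:i:"):
            return int(field[len("NH:i:"):])
    return None


def filter_by_mapq(sam_lines: Iterable[str], min_mapq: int) -> Iterator[str]:
    """Yield alignment lines whose MAPQ is at least `min_mapq`.

    Assumption: the threshold is inclusive (MAPQ >= min_mapq is kept).
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if int(fields[4]) >= min_mapq:  # MAPQ is the fifth SAM column
            yield line


# Two made-up records: one uniquely mapped, one mapped to two loci.
records = [
    "r1\t0\tchr1\t100\t255\t50M\t*\t0\t0\t*\t*\tNH:i:1",
    "r2\t0\tchr1\t500\t3\t50M\t*\t0\t0\t*\t*\tNH:i:2",
]
for rec in filter_by_mapq(records, min_mapq=10):
    fields = rec.split("\t")
    print(fields[0], "MAPQ", fields[4], "NH", nh_tag(fields))
```

Tabulating MAPQ against NH on a sample of your own alignments in this way is one quick check of which -q value separates unique reads from multi-mappers for your aligner.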

r78v10a07 avatar Dec 21 '20 14:12 r78v10a07

Sorry, one more question: does TPMCalculator support stranded libraries, and if not, is there a plan to add that feature? I thought it might have been included in v0.0.4. I just did a fresh installation with Miniconda3 and, according to the installation details, the following packages were installed:

bamtools-2.5.1 | he513fc3_6 1.1 MB bioconda
tpmcalculator-0.0.4 | h7376a40_0 1.4 MB bioconda

But when I run TPMCalculator -version, it still lists 0.0.3, and the options don't mention stranded reads.

Usage: TPMCalculator

TPMCalculator options:

-v Print info
-version Print version
-h Display this usage information.
-g GTF file
-d Directory with the BAM files
-b BAM file
-k Gene key to use from GTF file. Default: gene_id
-t Transcript key to use from GTF file. Default: transcript_id
-c Smaller size allowed for an intron created for genes. Default: 16. We recommend to use the reads length
-p Use only properly paired reads. Default: No. Recommended for paired-end reads.
-q Minimum MAPQ value to filter out reads. Default: 0. This value depends on the aligner MAPQ value.
-o Minimum overlap between a reads and a feature. Default: 8.
-e Extended output. This will include transcript level TPM values. Default: No.
-a Print out all features with read counts equal to zero. Default: No.

agolicz avatar Dec 22 '20 11:12 agolicz

That feature will be included in the next release.

r78v10a07 avatar Dec 22 '20 13:12 r78v10a07