TPMCalculator
TPMCalculator: the output files are empty
My BAM file was generated with STAR. When I run TPMCalculator, the output files are empty. It reports "Chromosome with name: TraesCS4D01G266700.2 does not exist" for all the queries. I am not sure what's wrong. Thank you in advance!
By the way, I have also attached images of the BAM, GTF and genome FASTA files:



Hi Zoe-github2!
Based on your BAM file in the third figure, it seems that the reads were aligned to the transcriptome, not to the genome. To my knowledge, TPMCalculator works with genome-aligned RNA-seq data.
If you use the genome-aligned BAM as input instead of the transcriptome-aligned one, it should be able to quantify the expression.
Best wishes, Lorinc
Thank you Lorinc for the prompt response.
Actually, we aligned the reads against the genome, not the transcriptome. We used STAR version 2.5.3a with the following commands:
- First generate index: STAR --runThreadN ${num_thread} --runMode genomeGenerate --genomeDir ${ref_index_dir} --genomeFastaFiles ${ref_fasta_dir} --sjdbGTFfile ${gtf_dir} --sjdbOverhang 100 --limitGenomeGenerateRAM 100000000000 --genomeSAindexNbases ${myGenomeSAindexNbases}
- Alignment:
CommonPars="--sysShell /bin/bash --runMode alignReads --runThreadN 32 --limitBAMsortRAM 100000000000 --limitIObufferSize 500000000 --limitSjdbInsertNsj 5000000 --outReadsUnmapped Fastx --outSAMtype BAM SortedByCoordinate --outSAMmode Full --outSAMstrandField intronMotif --outFilterIntronMotifs RemoveNoncanonical --chimSegmentMin 20 --quantMode TranscriptomeSAM GeneCounts --outBAMsortingThreadN 0 --outSAMattributes All --outWigType None"
$STAR $CommonPars --genomeDir ${ ref_index_dir } --readFilesIn ${reads1} {reads2} --outFileNamePrefix ${output_dir}
Thanks, Zoe
Thanks for the update. I see you used the TranscriptomeSAM option, which also generates a transcriptome-based BAM file.
Did you run TPMCalculator with the transcriptome-aligned or the genome-aligned BAM? To my understanding, your STAR command should produce two separate BAM files.
Best wishes, Lorinc
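For anyone trying to tell the two STAR outputs apart, one rough check is to look at the @SQ lines in each BAM header: a genome-aligned BAM lists chromosomes or scaffolds there, while a transcriptome-aligned BAM lists one entry per transcript. The sketch below is only an illustration; sq_summary is a hypothetical helper name, not part of STAR, samtools or TPMCalculator, and it assumes the header text arrives on standard input (for example from samtools view -H).

```shell
# Hypothetical helper (not part of any tool): summarise the @SQ lines of a
# SAM/BAM header read from standard input.
# Prints the first N reference names (default 5), then the total @SQ count.
sq_summary() {
    awk -F'\t' -v n="${1:-5}" '
        $1 == "@SQ" {
            count++
            if (count <= n)
                for (i = 2; i <= NF; i++)
                    if ($i ~ /^SN:/) print substr($i, 4)
        }
        END { print "total @SQ entries: " (count + 0) }
    '
}

# Example use (requires samtools; file names taken from this thread):
#   samtools view -H Aligned.sortedByCoord.out.bam   | sq_summary
#   samtools view -H Aligned.toTranscriptome.out.bam | sq_summary
```

If the names printed for the BAM passed to TPMCalculator look like transcript IDs rather than chromosome names, that BAM is probably not the genome-aligned one.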
Yes, two BAM files were generated: Aligned.toTranscriptome.out.bam and Aligned.sortedByCoord.out.bam. I tried using "Aligned.sortedByCoord.out.bam" and still get these messages:
... ...
Chromosome with name: TraesCS3A01G085400.2 does not exist
Chromosome with name: TraesCS3A01G370000.2 does not exist
Chromosome with name: TraesCS3A01G370000.1 does not exist
755317 reads processed in 17.8853 seconds
Printing results
Total time: 379.417 seconds
With three empty files.
Thanks, Zoe
Hi Zoe,
A small follow-up question. Are the missing chromosomes part of the genome FASTA file?
If you run
grep ">" genome.fasta
it should output all FASTA entries (where genome.fasta is the path to the FASTA file).
Best wishes, Lorinc
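As a small extension of the grep suggestion above, the check for one specific name can be scripted. This is a minimal sketch: fasta_refs and has_ref are hypothetical helper names introduced here for illustration, and both read from standard input.

```shell
# Hypothetical helpers (not part of TPMCalculator).

# Print the unique sequence names of a FASTA read from standard input:
# the first word of each ">" header line, without the ">".
fasta_refs() {
    grep '^>' | cut -c 2- | awk '{print $1}' | sort -u
}

# Succeed (exit status 0) if the exact name "$1" appears as a whole line
# in the list of names read from standard input.
has_ref() {
    grep -Fx -- "$1" > /dev/null
}

# Example use (name taken from the error messages in this thread):
#   fasta_refs < genome.fasta | has_ref TraesCS3A01G085400.2 \
#       && echo "present in FASTA" || echo "not in FASTA"
```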
Hi Zoe, as Lorinc commented, the BAM file you showed here uses different reference names than your FASTA and GTF; see the third column in the BAM file, which starts with Tra....
TPMCalculator builds a gene model from the GTF, using the first column of the GTF as the reference name (the chromosome in this case). The tool then uses the reference name of each read (third column in the BAM file) to assign the read to the right position on that chromosome.
You need to make sure the BAM file uses the same reference names as the GTF; otherwise TPMCalculator won't be able to process the reads.
You can run these commands to see the reference names in each file:
GTF:
awk '{print $1}' combined.gtf | sort -u
Fasta:
grep "^>" genome.fasta | awk '{print $1}' | cut -c 2-
BAM:
samtools view bamfile.bam | awk '{print $3}' | sort -u
Please, let us know if you need more help.
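The comparison described above can also be automated so that the mismatching names are listed directly. The following is a sketch under stated assumptions: gtf_refs, header_refs and missing_from are hypothetical helper names, the GTF is tab-separated with optional "#" comment lines, and the BAM reference names are taken from the @SQ header lines (via samtools view -H) rather than from every read.

```shell
# Hypothetical helpers (not part of TPMCalculator).

# Unique reference names (column 1) of a GTF read from standard input,
# skipping "#" comment lines.
gtf_refs() {
    grep -v '^#' | cut -f1 | sort -u
}

# Unique reference names from the @SQ lines (SN: tag) of a SAM/BAM header
# read from standard input.
header_refs() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }' | sort -u
}

# Print names listed in sorted file "$1" that are absent from sorted file "$2".
missing_from() {
    comm -23 "$1" "$2"
}

# Example use (requires samtools; file names as in the commands above):
#   gtf_refs < combined.gtf > gtf.names
#   samtools view -H bamfile.bam | header_refs > bam.names
#   missing_from gtf.names bam.names   # GTF references the BAM does not know
```

An empty result from missing_from means every reference name in the GTF also exists in the BAM header.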
Hello,
I think I'm having a similar issue. What exactly is the expected output of TPMCalculator? Could you specify the number of output files and their formats? Or do I have to specify an output file, like output.txt?
Here's my command. I tried both the RefSeq GTF file and the UCSC GTF file.
TPMCalculator -g ${hg19genome} -d ${dir}/${Sample} -b ${BAM} -c ${ReadLength}
Regards, Sumin
Hello,
I think I'm having a similar issue too:
Chromosome with name: xxx does not exist
Here's my command.
TPMCalculator -g xx.gtf -d ${dir/xx.sorted.bam}
Regards, zhaoshan
Please send a sample of the GTF and BAM, as the original reporter did.
Hello,
I am having the same issue: Chromosome with name: xxx does not exist
Here are my sorted.bam file and GTF file:
SRR8427257.gtf.zip
I did the following to check the reference names of the GTF file and the BAM file, and they are the same