Transcripts with identical coordinates but different abundances
Hello,
Thanks so much for developing this tool; I find it quite easy to use.
I'm wondering what causes duplicated transcripts (identical coordinates, different transcript IDs) to receive different quantifications within the same sample. The duplicates come from the reference GTF (human, GENCODE v37), which contains two transcripts with different transcript_id values but the same genomic coordinates (chr1:154582076-154585043:154582076-154608204:154585217-154585865:154586181-154586363:154588125-154588258:154588551-154588673:154589369-154589462:154589757-154590409:154596805-154596995:154597123-154597267:154597828-154597976:154598402-154598585:154601041-154602626:154607992-154608204:-). Does this mean the reference GTF should be pre-filtered to contain only transcripts with unique genomic coordinates?
The coverage for the region chr1:154582076-154608204 in IGV suggests that the higher quantification (1st screenshot) is the correct one.
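In case it helps anyone check their own annotation, here is a minimal sketch (Python, standard library only) of how transcripts sharing an identical exon structure could be listed. It is not part of StringTie: the function name `find_duplicate_structures` and the toy IDs `T1` to `T3` are made up for illustration, and it assumes a standard 9-column, tab-separated GTF in which `exon` features carry a `transcript_id` attribute.

```python
import re
from collections import defaultdict

# Matches the transcript_id attribute in the 9th GTF column.
_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def find_duplicate_structures(gtf_lines):
    """Group transcript IDs whose chromosome, strand and exon coordinates are identical."""
    exons = defaultdict(list)  # transcript_id -> [(start, end), ...]
    location = {}              # transcript_id -> (chromosome, strand)

    for line in gtf_lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only exon features define the transcript structure compared here.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = _TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        tid = match.group(1)
        exons[tid].append((int(fields[3]), int(fields[4])))
        location[tid] = (fields[0], fields[6])

    # Transcripts with the same (chromosome, strand, sorted exons) key are duplicates.
    by_structure = defaultdict(list)
    for tid, coords in exons.items():
        key = location[tid] + (tuple(sorted(coords)),)
        by_structure[key].append(tid)

    return [sorted(tids) for tids in by_structure.values() if len(tids) > 1]


# Toy input with hypothetical IDs: T1 and T2 share both exons, T3 differs in one.
example = [
    'chr1\tHAVANA\texon\t100\t200\t.\t-\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tHAVANA\texon\t300\t400\t.\t-\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tHAVANA\texon\t100\t200\t.\t-\t.\tgene_id "G1"; transcript_id "T2";',
    'chr1\tHAVANA\texon\t300\t400\t.\t-\t.\tgene_id "G1"; transcript_id "T2";',
    'chr1\tHAVANA\texon\t100\t200\t.\t-\t.\tgene_id "G1"; transcript_id "T3";',
    'chr1\tHAVANA\texon\t300\t450\t.\t-\t.\tgene_id "G1"; transcript_id "T3";',
]
print(find_duplicate_structures(example))  # prints [['T1', 'T2']]
```

For a real annotation, the same function could be called on an open file handle, e.g. `find_duplicate_structures(open("ref_annot.gtf"))`. Whether such duplicates should actually be removed before running StringTie is exactly the open question above; the sketch only reports them.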
The command line call was:
stringtie \
-G ref_annot.gtf \
--ref ref_genome.fa \
-o STRG_transcripts.${sampleID}.gtf \
-A STRG_gene_abundances.${sampleID}.tab \
-B --rf -t -c 1 -f 0.01 -M 0.95 -p 50 -v ${bamdir}/${sampleID}.sortedByCoord.out.bam
Any advice on this is greatly appreciated!
Best regards, Jenny