Multiple transcripts with the same coordinates
Hi,
I'm trying to run StringTie for transcriptome assembly using BAM files generated by STAR. The command line I used is as follows:
stringtie sample1.sorted.bam -f 0.1 -c 2.5 -p 15 -G ref.chr.gtf -o sample1.gtf &> sample1.log
I found that some transcripts share the same start and end coordinates but differ slightly in their exon structure. Here is one example:
9 StringTie transcript 51918953 51931337 1000 + . gene_id "STRG.17234"; transcript_id "STRG.17234.1"; cov "9.697406"; FPKM "1.410810"; TPM "3.128901";
9 StringTie exon 51918953 51919060 1000 + . gene_id "STRG.17234"; transcript_id "STRG.17234.1"; exon_number "1"; cov "6.178105";
9 StringTie exon 51927055 51927125 1000 + . gene_id "STRG.17234"; transcript_id "STRG.17234.1"; exon_number "2"; cov "13.987437";
9 StringTie exon 51928061 51928135 1000 + . gene_id "STRG.17234"; transcript_id "STRG.17234.1"; exon_number "3"; cov "15.125772";
9 StringTie exon 51931154 51931337 1000 + . gene_id "STRG.17234"; transcript_id "STRG.17234.1"; exon_number "4"; cov "7.895041";
9 StringTie transcript 51918953 51931337 1000 + . gene_id "STRG.17234"; transcript_id "STRG.17234.2"; cov "8.106643"; FPKM "1.179381"; TPM "2.615635";
9 StringTie exon 51918953 51919060 1000 + . gene_id "STRG.17234"; transcript_id "STRG.17234.2"; exon_number "1"; cov "5.100532";
9 StringTie exon 51927055 51927125 1000 + . gene_id "STRG.17234"; transcript_id "STRG.17234.2"; exon_number "2"; cov "11.547774";
9 StringTie exon 51928061 51928135 1000 + . gene_id "STRG.17234"; transcript_id "STRG.17234.2"; exon_number "3"; cov "12.487562";
9 StringTie exon 51931136 51931337 1000 + . gene_id "STRG.17234"; transcript_id "STRG.17234.2"; exon_number "4"; cov "6.877785";
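To pin down exactly where the two transcripts differ, I used a small Python sketch along these lines. The helper names (`exon_chains`, `differing_exons`) are my own and not part of StringTie, and the parsing assumes plain whitespace- or tab-separated GTF columns with the attributes in the ninth field; the attribute strings are shortened here for readability.

```python
import re
from collections import defaultdict

# Exon records copied from the StringTie output above (attributes shortened).
GTF_EXONS = """\
9 StringTie exon 51918953 51919060 1000 + . gene_id "STRG.17234"; transcript_id "STRG.17234.1";
9 StringTie exon 51927055 51927125 1000 + . gene_id "STRG.17234"; transcript_id "STRG.17234.1";
9 StringTie exon 51928061 51928135 1000 + . gene_id "STRG.17234"; transcript_id "STRG.17234.1";
9 StringTie exon 51931154 51931337 1000 + . gene_id "STRG.17234"; transcript_id "STRG.17234.1";
9 StringTie exon 51918953 51919060 1000 + . gene_id "STRG.17234"; transcript_id "STRG.17234.2";
9 StringTie exon 51927055 51927125 1000 + . gene_id "STRG.17234"; transcript_id "STRG.17234.2";
9 StringTie exon 51928061 51928135 1000 + . gene_id "STRG.17234"; transcript_id "STRG.17234.2";
9 StringTie exon 51931136 51931337 1000 + . gene_id "STRG.17234"; transcript_id "STRG.17234.2";
"""


def exon_chains(gtf_text):
    """Map each transcript_id to its sorted list of (start, end) exon intervals."""
    chains = defaultdict(list)
    for line in gtf_text.strip().splitlines():
        # Split into the 8 fixed GTF columns plus the attribute string.
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match:
            chains[match.group(1)].append((int(fields[3]), int(fields[4])))
    return {tid: sorted(exons) for tid, exons in chains.items()}


def differing_exons(exons_a, exons_b):
    """Return the exon intervals present in only one of the two transcripts."""
    return sorted(set(exons_a) ^ set(exons_b))


chains = exon_chains(GTF_EXONS)
diff = differing_exons(chains["STRG.17234.1"], chains["STRG.17234.2"])
print(diff)  # [(51931136, 51931337), (51931154, 51931337)]
```

So the two transcripts differ only in the start of exon 4 (51931154 vs. 51931136, an 18 bp difference); the other three exons are identical.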
I have some questions about these transcripts.
- Could you explain why StringTie does not merge these transcripts and report them as a single transcript?
- Some exons have the same coordinates but different coverage values in each transcript (e.g., exon 1 of STRG.17234.1 and exon 1 of STRG.17234.2). Why do these exons have different coverage values?
- Is there a recommended way to handle such transcripts in downstream analyses that use this GTF file?
Thank you!