Multiple transcripts with the same coordinates

Open suyeonwy opened this issue 3 years ago • 0 comments

Hi,

I'm trying to run StringTie for transcriptome assembly using BAM files generated by STAR. The command line I used is as follows:

stringtie sample1.sorted.bam -f 0.1 -c 2.5 -p 15 -G ref.chr.gtf -o sample1.gtf &> sample1.log

I found that some transcripts have the same start and end coordinates but slightly different combinations of exons. Here is one example (the two transcripts differ only in the start of exon 4: 51931154 vs. 51931136):

9   StringTie   transcript  51918953    51931337    1000    +   .   gene_id "STRG.17234"; transcript_id "STRG.17234.1"; cov "9.697406"; FPKM "1.410810"; TPM "3.128901";
9   StringTie   exon    51918953    51919060    1000    +   .   gene_id "STRG.17234"; transcript_id "STRG.17234.1"; exon_number "1"; cov "6.178105";
9   StringTie   exon    51927055    51927125    1000    +   .   gene_id "STRG.17234"; transcript_id "STRG.17234.1"; exon_number "2"; cov "13.987437";
9   StringTie   exon    51928061    51928135    1000    +   .   gene_id "STRG.17234"; transcript_id "STRG.17234.1"; exon_number "3"; cov "15.125772";
9   StringTie   exon    51931154    51931337    1000    +   .   gene_id "STRG.17234"; transcript_id "STRG.17234.1"; exon_number "4"; cov "7.895041";
9   StringTie   transcript  51918953    51931337    1000    +   .   gene_id "STRG.17234"; transcript_id "STRG.17234.2"; cov "8.106643"; FPKM "1.179381"; TPM "2.615635";
9   StringTie   exon    51918953    51919060    1000    +   .   gene_id "STRG.17234"; transcript_id "STRG.17234.2"; exon_number "1"; cov "5.100532";
9   StringTie   exon    51927055    51927125    1000    +   .   gene_id "STRG.17234"; transcript_id "STRG.17234.2"; exon_number "2"; cov "11.547774";
9   StringTie   exon    51928061    51928135    1000    +   .   gene_id "STRG.17234"; transcript_id "STRG.17234.2"; exon_number "3"; cov "12.487562";
9   StringTie   exon    51931136    51931337    1000    +   .   gene_id "STRG.17234"; transcript_id "STRG.17234.2"; exon_number "4"; cov "6.877785";
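
In case it helps to reproduce this, here is a minimal sketch of how such cases can be listed from the output GTF. It is not part of StringTie; the function names `parse_attributes` and `same_span_groups` are hypothetical, and it assumes a standard tab-separated GTF with a `transcript_id` attribute on every exon line.

```python
from collections import defaultdict


def parse_attributes(field):
    """Parse a GTF attribute column, e.g. 'gene_id "X"; transcript_id "Y";'."""
    attrs = {}
    for part in field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def same_span_groups(gtf_lines):
    """Return {(chrom, strand, start, end): {transcript_id: exon_chain}}
    for spans shared by more than one transcript.

    Assumes tab-separated GTF lines (hypothetical helper, not a StringTie API).
    """
    exons = defaultdict(list)   # transcript_id -> [(start, end), ...]
    location = {}               # transcript_id -> (chrom, strand)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        tid = parse_attributes(cols[8])["transcript_id"]
        exons[tid].append((int(cols[3]), int(cols[4])))
        location[tid] = (cols[0], cols[6])

    groups = defaultdict(dict)
    for tid, chain in exons.items():
        chain.sort()
        # Transcript span = start of first exon to end of last exon.
        span = location[tid] + (chain[0][0], chain[-1][1])
        groups[span][tid] = tuple(chain)

    # Keep only spans that more than one transcript shares.
    return {span: members for span, members in groups.items() if len(members) > 1}


# Usage (file name taken from the command above):
# with open("sample1.gtf") as fh:
#     for span, members in same_span_groups(fh).items():
#         print(span, sorted(members))
```

For the example above, this would group STRG.17234.1 and STRG.17234.2 under the same span, with exon chains that differ only in the last exon.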

I have some questions about these transcripts.

  1. Could you explain why StringTie does not merge these transcripts and report them as a single transcript?
  2. Some exons have the same coordinates but different coverage values in each transcript (e.g. exon 1 of STRG.17234.1 and exon 1 of STRG.17234.2). Why do these exons have different coverage values?
  3. Is there a recommended way to handle these transcripts in downstream analyses that use this GTF file?

Thank you!

suyeonwy · Feb 18 '22 09:02