stringtie icon indicating copy to clipboard operation
stringtie copied to clipboard

Issue with Zero Coverage for Some Exons/Transcripts in StringTie Despite Aligned Reads (Short Reads RNA-seq Analysis)

Open fafaris39 opened this issue 11 months ago • 1 comments

I am using StringTie v3.0.0 to discover novel transcripts from short-read RNA-seq data. After quantification, I noticed that some transcripts/exons have a coverage (cov) of 0, even though reads are aligned to these regions according to samtools view. Pipeline Used:

1-Transcript assembly per sample: stringtie -p 8 -c 2 -j 2 "$BAM_FILE" -G $GTF_FILE -o "$OUTPUT_GTF"

2-Merging assembled transcripts: stringtie --merge $INPUT_DIR/*reconstructed.gtf -o $OUT_DIR/stringtie_merged_without_ref_output.gtf

3-Quantification of transcripts: stringtie -p 12 -e -B $BAM_FILE -G $GTF_merged -o $OUTPUT_GTF/*_quantified.gtf


└── ballgown
    ├── X15-WT
    │   ├── X15-WT_quantified.gtf
    │   ├── e2t.ctab
    │   ├── e_data.ctab
    │   ├── i2t.ctab
    │   ├── i_data.ctab
    │   └── t_data.ctab
    ├── X2-WT
    │   ├── X2-WT_quantified.gtf
    │   ├── e2t.ctab
    │   ├── e_data.ctab
    │   ├── i2t.ctab
    │   ├── i_data.ctab
    │   └── t_data.ctab
    ├── X3-PM
    │   ├── X3-PM_quantified.gtf
    │   ├── e2t.ctab
    │   ├── e_data.ctab
    │   ├── i2t.ctab
    │   ├── i_data.ctab
    │   └── t_data.ctab

Issue Observed:

For example, the transcript MSTRG.7.2 has zero coverage in t_data.ctab and _quantified.gtf despite read alignments. Results from grep in the .gtf file:

grep MSTRG.7.2 X2-WT_quantified.gtf

1	StringTie	transcript	4561613	4567577	.	-	.	gene_id "MSTRG.7"; transcript_id "MSTRG.7.2"; cov "0.0"; FPKM "0.000000"; TPM "0.000000";
1	StringTie	exon	4561613	4562891	.	-	.	gene_id "MSTRG.7"; transcript_id "MSTRG.7.2"; exon_number "1"; cov "0.0";
1	StringTie	exon	4563995	4564086	.	-	.	gene_id "MSTRG.7"; transcript_id "MSTRG.7.2"; exon_number "2"; cov "0.0";
1	StringTie	exon	4565359	4566165	.	-	.	gene_id "MSTRG.7"; transcript_id "MSTRG.7.2"; exon_number "3"; cov "0.0";
1	StringTie	exon	4566514	4567577	.	-	.	gene_id "MSTRG.7"; transcript_id "MSTRG.7.2"; exon_number "4"; cov "0.0";

Check in t_data.ctab:


grep MSTRG.7.2 t_data.ctab

t_id	chr	strand	start	end	t_name	num_exons	length	gene_id	gene_name	cov	FPKM
9	1	-	4561613	4567577	MSTRG.7.2	4	3242	MSTRG.7	.	0.000000	0.000000

Check in e_data.ctab (exon coverage):

e_id	chr	strand	start	end	rcount	ucount	mrcount	cov	cov_sd	mcov	mcov_sd
46	1	-	4561613	4562891	56	56	56.00	2.6654	1.8036	2.6654	1.8036
43	1	-	4563995	4564086	1	1	1.00	2.8043	1.5965	2.8043	1.5965
44	1	-	4565359	4566165	18	18	18.00	1.4597	1.3443	1.4597	1.3443
45	1	-	4566514	4567577	6	6	6.00	0.3327	0.5931	0.3327	0.5931

Read counts verification with samtools:

samtools view -c X2-WT_S21Aligned.sortedByCoord.out.bam 1:4561613-4562891
58
samtools view -c X2-WT_S21Aligned.sortedByCoord.out.bam 1:4563995-4564086
6
samtools view -c X2-WT_S21Aligned.sortedByCoord.out.bam 1:4565359-4566165
21
samtools view -c X2-WT_S21Aligned.sortedByCoord.out.bam 1:4566514-4567577
6

We can see that reads are aligned to these exons, but cov remains 0 in t_data.ctab and _quantified.gtf. Questions: Why does StringTie assign a coverage of 0 even though reads are present in samtools view?

Image

fafaris39 avatar Jan 31 '25 14:01 fafaris39

The reads are allocated between the 5 transcript, based on read support. From the igv snapshot, I would guess that the reads are likely allocated between the 3rd and 4th transcript.

ishinder avatar Feb 14 '25 19:02 ishinder