All original genes from the reference annotation get 0 counts after the 2nd round of StringTie run with the merged GTF file?
Dear stringtie team,
I am using stringtie to quantify RNA-Seq data for 104 samples from 7 green algae strains.
The problem is that in the first round of StringTie, run with the reference genome GFF file as the annotation, I got counts for most genes.
But after the second round of StringTie, all of the original genes from the reference annotation had 0 counts, and only the novel transcripts produced by StringTie (MSTRG) had any counts.
I have tried different versions of StringTie (3.0, 2.2.1, and 1.3.6) and merging only the samples from the same strain, but the problem persisted. Another member of my group said they solved this problem by using an older version.
Do you know what could be the reason and how I can resolve it?
I would like to generate a unified transcriptome reference for the 104 samples so that I can do cross-strain comparisons later.
I would also like to generate a gene count matrix for further analysis with DESeq2.
Thank you for your time and help.
Best wishes,
Ming
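To make the symptom concrete, here is a minimal sketch of how the pattern can be tallied from a single abundance table written by `-A`. It assumes the table is tab-separated with one header row, with the gene ID in column 1 and TPM in column 9; the function name `tally_nonzero_tpm` is only illustrative and is not part of StringTie.

```shell
#!/bin/bash
# Sketch: tally genes with non-zero TPM in one StringTie -A table, split by
# whether the Gene ID is a StringTie-assigned MSTRG ID or any other
# (presumably reference) ID.
# Assumption: tab-separated table, header in row 1, Gene ID in column 1,
# TPM in column 9.
tally_nonzero_tpm() {
    awk -F'\t' '
        NR == 1 { next }                      # skip the header row
        {
            kind = ($1 ~ /^MSTRG\./) ? "mstrg" : "ref"
            total[kind]++
            if ($9 + 0 > 0) nonzero[kind]++   # count rows with TPM above zero
        }
        END {
            printf "ref: %d/%d non-zero\n", nonzero["ref"], total["ref"]
            printf "mstrg: %d/%d non-zero\n", nonzero["mstrg"], total["mstrg"]
        }
    ' "$1"
}
```

Calling `tally_nonzero_tpm` on one of the second-round `.tab` files prints two lines, one for reference-style IDs and one for MSTRG IDs, which is the comparison described above.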
Here is the script I used to run StringTie on Linux in a conda environment.
2025.1.10 stringtie workflow
create stringtie environment
conda create -n stringtie stringtie=2.2.1
1st round of stringtie
#!/bin/bash

# Define directories and files
input_dir="/data/scratch/mingtoh/projects/coccomyxa_RNAseq_temp/tmp/umitools_SAG49.84"
output_dir="/data/scratch/mingtoh/projects/coccomyxa_RNAseq_temp/tmp/stringtie_ver2.2.1_104_samples_1st_round_gtf"
output_count_dir="/data/scratch/mingtoh/projects/coccomyxa_RNAseq_temp/tmp/stringtie_ver2.2.1_104_samples_1st_round_count"
log_file="${output_dir}/stringtie_SAG49.84_samples_run.log"
genome_file="/data/scratch/mingtoh/projects/coccomyxa_RNAseq_temp/data/reference_genomes/GCF_000258705.1_Coccomyxa_subellipsoidea_v2.0_genomic.gff"

# Create the output directories if they don't exist
mkdir -p "$output_dir" "$output_count_dir"

# Initialize the log file
echo "StringTie processing log - $(date)" > "$log_file"

# Loop through all *sorted_dedup.bam files in the input directory
for dedup_bam_file in "$input_dir"/*sorted_dedup.bam; do
    # Extract the base filename without extension
    base_name=$(basename "$dedup_bam_file" .bam)

    # Define paths for the output files
    stringtie_gtf="${output_dir}/${base_name}.gtf"
    stringtie_count="${output_count_dir}/${base_name}.tsv"

    # Run StringTie for each BAM file
    {
        echo "Processing $dedup_bam_file"
        stringtie "$dedup_bam_file" -G "$genome_file" -o "$stringtie_gtf" -A "$stringtie_count"
        echo "StringTie outputs created for $dedup_bam_file"
    } >> "$log_file" 2>&1
done
STRINGTIE MERGE
find "$(pwd)" -name "*.gtf" > gtf_list.txt
stringtie --merge -G /data/scratch/mingtoh/projects/coccomyxa_RNAseq_temp/data/reference_genomes/GCF_000258705.1_Coccomyxa_subellipsoidea_v2.0_genomic.gff -o merged_output_file.gtf gtf_list.txt
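One check that might help narrow this down is to see how many transcripts in the merged file are still linked to the reference annotation. The sketch below assumes that `stringtie --merge` marks reference-derived transcripts with a `ref_gene_id` attribute in column 9; the function name `count_ref_vs_novel` is made up for illustration.

```shell
#!/bin/bash
# Sketch: count transcript records in a merged GTF that carry a ref_gene_id
# attribute versus those that do not.
# Assumption: ref_gene_id marks transcripts matched to the reference annotation.
count_ref_vs_novel() {
    awk -F'\t' '
        $3 == "transcript" {                  # only look at transcript records
            if ($9 ~ /ref_gene_id "/) ref++; else novel++
        }
        END { printf "reference=%d novel=%d\n", ref, novel }
    ' "$1"
}
```

For example, `count_ref_vs_novel merged_output_file.gtf` would print a single line with both counts. If the reference count is close to zero, the reference genes may already be lost at the merge step rather than in the second round.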
2nd round of stringtie
#!/bin/bash

# Define directories and files
input_dir="/data/scratch/mingtoh/projects/coccomyxa_RNAseq_temp/tmp/umitools_SAG49.84"
output_dir="/data/scratch/mingtoh/projects/coccomyxa_RNAseq_temp/tmp/stringtie_ver2.2.1_104_samples_2nd_round_gtf"
output_count_dir="/data/scratch/mingtoh/projects/coccomyxa_RNAseq_temp/tmp/stringtie_ver2.2.1_104_samples_2nd_round_abundance"
log_file="${output_dir}/stringtie_2nd_round_SAG49.84_run.log"
merged_gtf="/data/scratch/mingtoh/projects/coccomyxa_RNAseq_temp/tmp/stringtie_ver2.2.1_104_samples_1st_round_gtf/merged_output_file.gtf"

# Create the output directories if they don't exist
mkdir -p "$output_dir" "$output_count_dir"

# Initialize the log file
echo "StringTie processing log - $(date)" > "$log_file"

# Loop through all *sorted_dedup.bam files in the input directory
for input_bam in "$input_dir"/*sorted_dedup.bam; do
    # Extract the base filename without extension
    base_name=$(basename "$input_bam" .bam)

    # Define paths for the output files
    gtf_output="${output_dir}/${base_name}_2nd_stringtie.gtf"
    abundance_output="${output_count_dir}/${base_name}_2nd_stringtie_abundance.tab"

    # Run StringTie for each BAM file
    {
        echo "Processing $input_bam"
        stringtie "$input_bam" -e -G "$merged_gtf" -o "$gtf_output" -A "$abundance_output"
        echo "StringTie 2nd round gtf and transcript abundance created for $base_name using merged_gtf_output as reference"
    } >> "$log_file" 2>&1
done
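For the DESeq2 step, my understanding is that the `prepDE.py` helper script distributed with StringTie takes a text file with one sample ID and GTF path per line. Below is a rough sketch of how that list could be built from the second-round output directory. The function name `make_sample_list` and the output file names are placeholders of mine, and the `_2nd_stringtie.gtf` suffix is the one used in the loop above.

```shell
#!/bin/bash
# Sketch: print "sample_id path" lines for every second-round GTF in a directory.
# Assumption: the GTFs end in _2nd_stringtie.gtf, as in the second-round script.
make_sample_list() {
    gtf_dir=$1
    for gtf in "$gtf_dir"/*_2nd_stringtie.gtf; do
        [ -e "$gtf" ] || continue             # skip if the glob matched nothing
        sample_id=$(basename "$gtf" _2nd_stringtie.gtf)
        printf '%s %s\n' "$sample_id" "$gtf"
    done
}

# Hypothetical usage (output file names are placeholders):
#   make_sample_list "$output_dir" > sample_list.txt
#   prepDE.py -i sample_list.txt -g gene_count_matrix.csv -t transcript_count_matrix.csv
```

The list is written with a single space between the ID and the path, so this only works as long as neither contains whitespace.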