
Got 0 counts for all original genes from the reference GTF after the 2nd StringTie round using the merged GTF file?

Open mt4614 opened this issue 11 months ago • 0 comments

Dear StringTie team,

I am using StringTie to quantify RNA-Seq data for 104 samples from 7 green algae strains.

The problem is that in the first round of StringTie, run with the reference genome GFF file, I got counts for most genes. But after the second round of StringTie, all of the original genes from the reference GTF had 0 counts, and only the novel transcripts produced by StringTie (MSTRG) had any counts. I have tried different versions of StringTie (3.0, 2.2.1, and 1.3.6), merging only the samples from the same strain, etc., but the problem persisted. Another member of my group said they solved this problem by using an older version. Do you know what could be the reason, and how can I resolve it?

I would like to generate a unified transcriptome reference for the 104 samples so that I can do cross-strain comparisons later. I would also like to generate a gene count matrix for further analysis with DESeq2.

Thank you for your time and help.

Best wishes,
Ming

Here is the script I used to run StringTie on Linux in a conda environment.

2025.1.10 stringtie workflow

create stringtie environment

conda create -n stringtie stringtie=2.2.1

1st round of stringtie

    #!/bin/bash

    # Define directories and files
    input_dir="/data/scratch/mingtoh/projects/coccomyxa_RNAseq_temp/tmp/umitools_SAG49.84"
    output_dir="/data/scratch/mingtoh/projects/coccomyxa_RNAseq_temp/tmp/stringtie_ver2.2.1_104_samples_1st_round_gtf"
    output_count_dir="/data/scratch/mingtoh/projects/coccomyxa_RNAseq_temp/tmp/stringtie_ver2.2.1_104_samples_1st_round_count"
    log_file="${output_dir}/stringtie_SAG49.84_samples_run.log"
    genome_file="/data/scratch/mingtoh/projects/coccomyxa_RNAseq_temp/data/reference_genomes/GCF_000258705.1_Coccomyxa_subellipsoidea_v2.0_genomic.gff"

    # Create the output directories if they don't exist
    mkdir -p "$output_dir" "$output_count_dir"

    # Initialize the log file
    echo "StringTie processing log - $(date)" > "$log_file"

    # Loop through all *sorted_dedup.bam files in the input directory
    for dedup_bam_file in "$input_dir"/*sorted_dedup.bam; do
        # Extract the base filename without extension
        base_name=$(basename "$dedup_bam_file" .bam)
        # Define paths for the output files
        stringtie_gtf="${output_dir}/${base_name}.gtf"
        stringtie_count="${output_count_dir}/${base_name}.tsv"

        # Run StringTie for each BAM file
        {
            echo "Processing $dedup_bam_file"
            stringtie "$dedup_bam_file" -G "$genome_file" -o "$stringtie_gtf" -A "$stringtie_count"
            echo "StringTie outputs created for $dedup_bam_file"
        } >> "$log_file" 2>&1
    done

STRINGTIE MERGE

    find "$(pwd)" -name "*.gtf" > gtf_list.txt
    stringtie --merge -G /data/scratch/mingtoh/projects/coccomyxa_RNAseq_temp/data/reference_genomes/GCF_000258705.1_Coccomyxa_subellipsoidea_v2.0_genomic.gff -o merged_output_file.gtf gtf_list.txt
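Before the second round, it may be worth checking how the reference transcripts appear in the merged file. The following is a minimal sketch, not part of the original workflow: it counts `transcript` records with and without a `ref_gene_id` attribute, which `stringtie --merge` is expected to attach to transcripts matched to the `-G` annotation. The function name `count_ref_transcripts` is hypothetical, and the attribute test assumes the usual merged-GTF layout.

```shell
#!/bin/bash
# Hypothetical helper: count transcript records in a StringTie merged GTF,
# split by whether column 9 carries a ref_gene_id attribute
# (assumed to mark transcripts matched to the reference annotation).
count_ref_transcripts() {
    local gtf="$1"
    awk -F'\t' '
        # Skip comment lines; only look at transcript-level records
        $0 !~ /^#/ && $3 == "transcript" {
            if ($9 ~ /ref_gene_id "/) ref++; else novel++
        }
        END { printf "reference=%d novel=%d\n", ref, novel }
    ' "$gtf"
}

# Example call, using the merged file name from the merge step:
# count_ref_transcripts merged_output_file.gtf
```

If the `reference` count is zero, the reference annotation may not have been carried into the merged file at all; if it is non-zero, the reference transcripts are present and the question becomes how they are labelled in the second-round output.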

2nd round of stringtie

    #!/bin/bash

    # Define directories and files
    input_dir="/data/scratch/mingtoh/projects/coccomyxa_RNAseq_temp/tmp/umitools_SAG49.84"
    output_dir="/data/scratch/mingtoh/projects/coccomyxa_RNAseq_temp/tmp/stringtie_ver2.2.1_104_samples_2nd_round_gtf"
    output_count_dir="/data/scratch/mingtoh/projects/coccomyxa_RNAseq_temp/tmp/stringtie_ver2.2.1_104_samples_2nd_round_abundance"
    log_file="${output_dir}/stringtie_2nd_round_SAG49.84_run.log"
    merged_gtf="/data/scratch/mingtoh/projects/coccomyxa_RNAseq_temp/tmp/stringtie_ver2.2.1_104_samples_1st_round_gtf/merged_output_file.gtf"

    # Create the output directories if they don't exist
    mkdir -p "$output_dir" "$output_count_dir"

    # Initialize the log file
    echo "StringTie processing log - $(date)" > "$log_file"

    # Loop through all *sorted_dedup.bam files in the input directory
    for input_bam in "$input_dir"/*sorted_dedup.bam; do
        # Extract the base filename without extension
        base_name=$(basename "$input_bam" .bam)
        # Define paths for the output files
        gtf_output="${output_dir}/${base_name}_2nd_stringtie.gtf"
        abundance_output="${output_count_dir}/${base_name}_2nd_stringtie_abundance.tab"

        # Run StringTie for each BAM file
        {
            echo "Processing $input_bam"
            stringtie "$input_bam" -e -G "$merged_gtf" -o "$gtf_output" -A "$abundance_output"
            echo "StringTie 2nd round gtf and transcript abundance created for $base_name using merged_gtf_output as reference"
        } >> "$log_file" 2>&1
    done
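To make the symptom concrete, the second-round `-A` tables can be summarized per sample. The sketch below is an illustration only, under stated assumptions: it assumes the tab-delimited `-A` layout with a header row and the columns Gene ID, Gene Name, Reference, Strand, Start, End, Coverage, FPKM, TPM, and it treats a Gene Name of `-`, `.` or empty as "no name". The function name `summarize_abundance` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: summarize one StringTie -A table.
# Reports how many genes have TPM > 0, split by whether the Gene ID
# starts with "MSTRG.", plus how many MSTRG rows still carry a gene name.
# Assumed columns: Gene ID, Gene Name, Reference, Strand, Start, End,
# Coverage, FPKM, TPM (tab-delimited, one header row).
summarize_abundance() {
    local tab="$1"
    awk -F'\t' '
        NR == 1 { next }    # skip the header row
        {
            kind = ($1 ~ /^MSTRG\./) ? "MSTRG" : "reference"
            total[kind]++
            if ($9 + 0 > 0) expressed[kind]++
            # MSTRG gene IDs that still have a gene name in column 2
            if (kind == "MSTRG" && $2 != "-" && $2 != "." && $2 != "") named++
        }
        END {
            printf "MSTRG expressed=%d total=%d named=%d\n", expressed["MSTRG"], total["MSTRG"], named
            printf "reference expressed=%d total=%d\n", expressed["reference"], total["reference"]
        }
    ' "$tab"
}
```

It could be run over the second-round outputs with `for f in "$output_count_dir"/*_2nd_stringtie_abundance.tab; do summarize_abundance "$f"; done`. A non-zero `named` count would suggest that some reference genes are reported under MSTRG gene IDs rather than missing; this only separates the two cases and does not by itself establish the cause.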

mt4614 · Jan 15 '25 17:01