Merged genes on a de-novo assembly
Hello @gpertea
I'm trying to build a de novo transcriptome assembly. I used minimap2 to align long, oriented, full-length reads with the following command:
minimap2 -ax splice -uf -t ${task.cpus} -G 2000000 ${index} ${fastq_file}
I set -G to 2 Mb because the species has huge introns, and that value was suggested in the literature. I kept only primary alignments with good quality and then ran StringTie as follows:
stringtie ${bam_file} \\
-p ${task.cpus} \\
-o ${assembly_name}.gtf \\
-l ${params.idPrefix}${sample_info['sampleID']} \\
-L -m ${params.minReadLength} \\
-A ${assembly_name}.gene_abund.tab \\
--conservative
I did this for the reads from multiple samples (tissues), so I used stringtie --merge to build the consensus:
stringtie --merge *.stringtie.gtf \\
-o stringtie.merge.raw.gtf \\
-m ${params.minReadLength} \\
-l ${params.idPrefix}
My issue is that two different genes are merged together, because one transcript spans an intron of >1 Mb.
I checked one read supporting the alignment and it wasn't chimeric. The alignment was supported by multiple reads. Below I show a zoomed-in view of both ends of the transcript:
As my species has huge introns, I want to keep the minimap2 parameters. Is there a parameter in StringTie that sets a distance threshold, so that transcripts are not clustered into a single gene when their start coordinates are very far apart?
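To make the problem easier to reproduce, here is a minimal sketch of how the affected transcripts could be listed from the merged GTF. It is not a StringTie feature: the function names and the 1 Mb cutoff are assumptions chosen for illustration, and it only relies on the standard GTF columns and the transcript_id attribute.

```python
import re
from collections import defaultdict


def max_intron_per_transcript(gtf_lines):
    """Return {transcript_id: longest intron length} from GTF exon records."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only exon records carry the coordinates needed to derive introns.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match:
            exons[match.group(1)].append((int(fields[3]), int(fields[4])))

    longest = {}
    for tid, spans in exons.items():
        spans.sort()
        # GTF coordinates are 1-based and inclusive, so the gap between
        # consecutive exons is next_start - previous_end - 1.
        gaps = [nxt[0] - prev[1] - 1 for prev, nxt in zip(spans, spans[1:])]
        longest[tid] = max(gaps, default=0)
    return longest


def flag_long_intron_transcripts(gtf_lines, max_intron=1_000_000):
    """List transcript IDs whose longest intron exceeds max_intron (assumed cutoff)."""
    longest = max_intron_per_transcript(gtf_lines)
    return sorted(tid for tid, gap in longest.items() if gap > max_intron)
```

Passing an open file handle for stringtie.merge.raw.gtf to flag_long_intron_transcripts would list the candidate transcripts that bridge two genes, which could then be inspected or removed by hand.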
Best, Salvador