stringtie

long read + RNAseq

Open jdmontenegro opened this issue 5 years ago • 9 comments

Hi all, I would like to use both long-read (Iso-Seq) data and paired-end RNA-seq (Illumina) data to produce gene assemblies. What would be the recommended way to do this? I am thinking of doing something like this:

  1. Align the Iso-Seq reads with minimap2
  2. Run StringTie with the -L option to get initial gene models
  3. Index the genome with STAR using the primary long-read GTF file from the previous step
  4. Map the RNA-seq reads with STAR in 2-pass mode, with output compatible with Cufflinks/StringTie
  5. Run StringTie with the new RNA-seq alignments and the primary GTF as guide.
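
The five steps above can be sketched as a small dry-run script. This is only an illustration under stated assumptions, not a recommended or validated pipeline: every file name (genome.fa, isoseq.fastq, rnaseq_R1.fastq.gz and so on), the thread count, and the `--sjdbOverhang` value are hypothetical placeholders, and the `run` helper is introduced here just to print each command before (optionally) executing it.

```shell
# Dry-run sketch of the five-step plan; all file names are hypothetical.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print each command, 0 = execute it
THREADS="${THREADS:-8}"   # assumed thread count

# Print a command, then execute it unless we are in dry-run mode.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

two_stage_assembly() {
  # 1. Spliced alignment of the Iso-Seq reads. splice:hq is minimap2's preset
  #    for high-quality transcript reads; -uf assumes reads are in transcript
  #    orientation.
  run minimap2 -ax splice:hq -uf -t "$THREADS" -o isoseq.sam genome.fa isoseq.fastq
  run samtools sort -@ "$THREADS" -o isoseq.bam isoseq.sam

  # 2. Long-read-only assembly to get the initial gene models.
  run stringtie -L -p "$THREADS" -o isoseq.gtf isoseq.bam

  # 3. STAR index that uses the long-read models as junction annotation.
  #    --sjdbOverhang is normally (read length - 1); 100 is an assumption.
  run mkdir -p star_index
  run STAR --runMode genomeGenerate --runThreadN "$THREADS" \
    --genomeDir star_index --genomeFastaFiles genome.fa \
    --sjdbGTFfile isoseq.gtf --sjdbOverhang 100

  # 4. 2-pass mapping. intronMotif adds the XS strand attribute that
  #    Cufflinks/StringTie expect for unstranded libraries.
  run STAR --runThreadN "$THREADS" --genomeDir star_index \
    --readFilesIn rnaseq_R1.fastq.gz rnaseq_R2.fastq.gz --readFilesCommand zcat \
    --twopassMode Basic --outSAMstrandField intronMotif \
    --outSAMtype BAM SortedByCoordinate --outFileNamePrefix rnaseq_

  # 5. Short-read assembly guided by the long-read gene models.
  run stringtie -G isoseq.gtf -p "$THREADS" -o final.gtf \
    rnaseq_Aligned.sortedByCoord.out.bam
}

two_stage_assembly
```

With the default `DRY_RUN=1` the script only prints the seven commands; setting `DRY_RUN=0` would execute them, assuming minimap2, samtools, STAR and StringTie are on the PATH.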

Would the new GTF be more complete/reliable than the Iso-Seq GTF alone?

I look forward to hearing back from you with any suggestions/opinions.

Regards,

Juan D.

jdmontenegro avatar Nov 06 '20 14:11 jdmontenegro

I have the same question

bbhatt1789 avatar Jun 17 '21 15:06 bbhatt1789

Very well timed bump of this topic! The current release of StringTie has a new --mix option that lets StringTie process both BAM files at once (short read alignments and long read alignments). Both BAM files can be given as input with that option, while the -G option should simply point to a reference annotation to serve as "guides" for the assembly of such mixed alignment data.

Please give the --mix option a try and let us know how it works for you.

gpertea avatar Jun 17 '21 15:06 gpertea

Thank you so much for a quick response. I will try that. Thanks.

bbhatt1789 avatar Jun 17 '21 15:06 bbhatt1789

Sorry, this may seem like a trivial question, but what exactly do the -L and --mix options do, and how are they different?

bbhatt1789 avatar Jun 17 '21 15:06 bbhatt1789

They both allow using long read alignments as input for transcript assembly, but -L allows only long read alignments to be assembled (a single type of input - just long read alignments), while the --mix option allows both short read alignments (e.g. Illumina reads aligned with HISAT2 or STAR) and long read alignments (PacBio or Nanopore RNA-Seq reads aligned with minimap2) to be assembled at the same time.

Before the --mix option was added, StringTie could only assemble either short read alignments (the default mode of operation) or just long read alignments if the -L option was used.

The -L option is not needed when --mix is specified, but --mix expects 2 input alignment files in a specific order (short reads first, long reads second). Example command line:

stringtie -G annotation.gtf --mix short_alns.bam long_alns.bam -o mix_assembly.gtf
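
For context, here is one way the two input BAM files for that command might be prepared. It is a rough dry-run sketch rather than an official recipe: the index name genome_index, the FASTQ file names, and the thread count are hypothetical, and the `run` helper exists only to print each command before (optionally) executing it. StringTie expects coordinate-sorted alignments, hence the `samtools sort` steps.

```shell
# Dry-run sketch: prepare short- and long-read BAMs, then run --mix.
# All input file names and the HISAT2 index name are hypothetical.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print each command, 0 = execute it
THREADS="${THREADS:-8}"   # assumed thread count

# Print a command, then execute it unless we are in dry-run mode.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

mix_assembly() {
  # Short reads: HISAT2 with --dta, which reports alignments tailored
  # for transcript assemblers such as StringTie.
  run hisat2 --dta -p "$THREADS" -x genome_index \
    -1 rnaseq_R1.fastq.gz -2 rnaseq_R2.fastq.gz -S short_alns.sam
  run samtools sort -@ "$THREADS" -o short_alns.bam short_alns.sam

  # Long reads: minimap2 with its generic spliced-alignment preset.
  run minimap2 -ax splice -t "$THREADS" -o long_alns.sam genome.fa long_reads.fastq
  run samtools sort -@ "$THREADS" -o long_alns.bam long_alns.sam

  # Mixed assembly: short-read BAM first, long-read BAM second.
  run stringtie -G annotation.gtf --mix short_alns.bam long_alns.bam -o mix_assembly.gtf
}

mix_assembly
```

The final command is the same as the example above; only the alignment and sorting steps in front of it are assumptions.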

gpertea avatar Jun 17 '21 16:06 gpertea

Okay, thank you. Sorry, I have more to ask, somewhat related to the previous question. From what I understand, StringTie2 corrects alignments for splice sites and implements a pruning algorithm for long reads. So I guess I was trying to ask whether the -L and --mix flags are implemented in this context.

Also, I have another question. If you don't provide a reference genome, what does it use by default? I see it works without one. Not sure if that is the correct way.

Thank you and sorry to ask too many questions.

bbhatt1789 avatar Jun 17 '21 18:06 bbhatt1789

Oh I did not realize you were asking about the underlying implementation. Yes, in both cases long read alignments are processed that way. Obviously the presence of short read alignments (i.e. in case of --mix) makes the splice site correction more accurate.

As for your second question, I am unclear about what you meant by not providing the "reference genome". The reads were aligned to a reference genome, of course. But perhaps you meant without a reference annotation (-G)? In that case StringTie uses the junctions and "exons" as discovered from the alignment data and proposes transcript structures solely based on that, though -G can help with "guiding" the assembly process somewhat and with tagging the output transcript assemblies that match the reference transcripts. Not sure I understood the meaning of the question.
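
To make the difference concrete, the two modes can be sketched side by side. The file names alns.bam, unguided.gtf and guided.gtf are hypothetical, and the `run` helper is only there to print each command before (optionally) executing it.

```shell
# Dry-run sketch: the same alignments assembled without and with -G.
# alns.bam and the output names are hypothetical placeholders.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print each command, 0 = execute it

# Print a command, then execute it unless we are in dry-run mode.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

with_and_without_guide() {
  # No -G: transcript structures come solely from the alignment data.
  run stringtie -o unguided.gtf alns.bam

  # With -G: the reference annotation guides the assembly.
  run stringtie -G annotation.gtf -o guided.gtf alns.bam
}

with_and_without_guide
```

Both runs take the same BAM file; only the second one uses the annotation.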

gpertea avatar Jun 17 '21 20:06 gpertea

Yes, I meant reference annotation. Sorry. Okay, thank you.

bbhatt1789 avatar Jun 17 '21 22:06 bbhatt1789

Hi @bbhatt1789, I'm in the same boat as you right now with long- and short-read data. I was wondering what you used downstream of stringtie? I'm having a really hard time deciding, and could use the advice of someone who's done this before. Thanks!

majdabdul avatar Jul 20 '23 10:07 majdabdul