funannotate

training parameters

Open mictadlo opened this issue 2 years ago • 1 comments

Hi,

  1. I have Illumina RNA-Seq reads (R1 and R2). Do I need to provide --stranded? If so, how can I find out whether the library is stranded?
  2. If I run Trinity on the cluster to speed things up, is it enough to provide the result via the --trinity parameter alone, or do I still have to provide the -l and -r parameters?
  3. Is it possible to use the output from Mikado for the --trinity parameter? Its authors apparently show

..that the accuracy of transcript reconstruction can be boosted by combining multiple methods, and we present a novel algorithm to integrate multiple RNA-seq assemblies into a coherent transcript annotation. Our algorithm can remove redundancies and select the best transcript models according to user-specified metrics while solving common artifacts such as erroneous transcript chimerisms.

Thank you in advance,

Best wishes,

Michal

mictadlo avatar Nov 09 '21 04:11 mictadlo

  1. Whether to use --stranded depends on whether the library prep was strand-specific. You can infer this with a tool like RSeQC (see infer_experiment.py).
  • Also see https://salmon.readthedocs.io/en/latest/salmon.html#what-s-this-libtype although I think this needs an existing transcriptome, so I don't think it will solve your problem.
  2. Currently you still need to provide the raw FASTQ files even if you supply an existing Trinity transcript set, because the selection of genes used in training is also conditioned on expression level/coverage.
  3. I think that if it is just assembled transcripts, then yes, you can provide that assembly in place of the Trinity assembly.

hyphaltip avatar Nov 09 '21 16:11 hyphaltip
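
For anyone landing here later, the answers to points 1 and 2 can be sketched in Python. This is a rough illustration and not part of funannotate or RSeQC: `guess_stranded` and `train_args` are hypothetical helper names, the 0.8 cutoff is an arbitrary assumption, and the mapping of the two RSeQC read patterns to FR and RF follows the usual convention (dUTP-style libraries typically report as RF), so check it against your own library kit. The first function reads the text report printed by infer_experiment.py for paired-end data; the second assembles a funannotate train argument list that passes the raw reads together with an existing Trinity assembly.

```python
import re

# The two summary lines infer_experiment.py prints for paired-end data.
_FR = re.compile(r'"1\+\+,1--,2\+-,2-\+":\s*([0-9.]+)')
_RF = re.compile(r'"1\+-,1-\+,2\+\+,2--":\s*([0-9.]+)')


def guess_stranded(report: str, cutoff: float = 0.8) -> str:
    """Return 'FR', 'RF', or 'no' from an infer_experiment.py text report.

    The cutoff is an illustrative assumption, not an RSeQC recommendation.
    """
    fr = _FR.search(report)
    rf = _RF.search(report)
    if fr is None or rf is None:
        raise ValueError("paired-end fractions not found in report")
    fr_frac = float(fr.group(1))
    rf_frac = float(rf.group(1))
    if fr_frac >= cutoff:
        return "FR"
    if rf_frac >= cutoff:
        return "RF"
    # Roughly balanced fractions suggest an unstranded library.
    return "no"


def train_args(genome, out, left, right, trinity, stranded, species):
    """Build an argument list for `funannotate train` (hypothetical helper).

    The raw reads (-l/-r) are passed alongside --trinity, as advised above.
    """
    return [
        "funannotate", "train",
        "-i", genome,
        "-o", out,
        "-l", left,
        "-r", right,
        "--trinity", trinity,
        "--stranded", stranded,
        "--species", species,
    ]
```

The list returned by `train_args` can be handed to `subprocess.run` once the file paths and species name are filled in. Fractions that fall between the two extremes are reported as "no" here, but in practice an ambiguous result is worth investigating before training.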