right model for Flappie & Guppy documentation

Open dcopetti opened this issue 5 years ago • 0 comments

Hello,

This is my first time basecalling raw ONT data, and I would like to run Flappie on the fast5 files.

I have a few questions. The data (WGS of native plant DNA) were produced in October 2018 on a PromethION and basecalled once with Guppy 1.6 by the sequencing facility. Would basecalling again with Flappie (the cluster has version 1.1.0-73d731a) give a significant improvement in base-call quality? The report is attached: many reads have a quality below 7, and I would like to improve that.

Regarding the model, I see that r941_5mC outputs methylated bases as "Z", and it seems to be the only model available for PromethION:

$ flappie --model help
  r941_native : R9.4.1 model for MinION.  Trained from native DNA library  (default)
  r941_5mC : R9.4.1 model for PromethION; 5mC aware.  Trained from native NA12878 library
  r10c_pcr : R10C model for MinION.  Trained from PCR'd DNA library

Or is it OK to use one of the MinION models on PromethION data?

What output file size (including intermediate files) should I expect for, e.g., a 200 kb fast5 file? The sysadmin of the cluster informed me that the parallel option is not working: is it better to run each folder of fast5 files as a separate job? It looks like I can't specify threads in Flappie. Approximately how much memory will I need for each job? Is it possible to keep only reads above QV7, or should I filter outside Flappie? If so, which tool would you suggest?
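To make the filtering question concrete, this is roughly what I would do outside the basecaller. It is only a minimal sketch, assuming the output is plain four-line FASTQ with Phred+33 qualities and that the read quality is the mean computed from per-base error probabilities (not the arithmetic mean of the Phred scores). The function names are my own and not part of Flappie or Guppy.

```python
import math
from typing import Iterator, TextIO, Tuple


def mean_qscore(quality: str, offset: int = 33) -> float:
    """Mean read quality, averaged over per-base error probabilities."""
    if not quality:
        return 0.0
    # Convert each quality character to an error probability (Phred+33 assumed).
    probs = [10 ** (-(ord(char) - offset) / 10) for char in quality]
    return -10 * math.log10(sum(probs) / len(probs))


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or an empty line) stops the iteration
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def filter_fastq(src: TextIO, dst: TextIO, min_q: float = 7.0) -> int:
    """Write reads with mean quality >= min_q to dst; return how many were kept."""
    kept = 0
    for header, sequence, quality in read_fastq(src):
        if mean_qscore(quality) >= min_q:
            dst.write(f"{header}\n{sequence}\n+\n{quality}\n")
            kept += 1
    return kept
```

It would be called with two open file handles, for example `filter_fastq(open("basecalls.fq"), open("filtered.fq", "w"))`, where both file names are placeholders. If there is an established tool that already does this, I would rather use that.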

Flappie seems to work (except for the Z bases), but the alternative I have is Guppy (version 2.3.1+1b9405b). Is there a place where I can find good documentation on how to use it from the command line? Thanks, Dario

ONT_PromethION_batch1_181026.pdf

dcopetti avatar May 26 '19 09:05 dcopetti