Future of ONT polishing?
Hi,
I have a pipeline for medaka polishing of ONT genomes. I was considering upgrading Medaka to the latest version, but now I see dorado has come out with a dorado polish module.
Which tool is the future of ONT polishing: Medaka or Dorado? Please don't say both. :-)
I understand dorado polishing is bleeding edge but would like to implement something from summer 2025.
Thanks, Colin
The situation is similar to bonito and Dorado.
Medaka was originally a research platform. I'd estimate 80 percent of its code is not strictly necessary for its typical operation. It does some things in overly pedantic ways. Indeed, you can reimplement the inference program in its basic form in a couple of hundred lines.
That it lasted so long as a stand-in production tool is testament to the programming philosophy, testing, packaging, and dog-fooding that went into its development. It's time, though, to have a more formally supported production piece of software.
So medaka will continue to exist in its current form as long as it's reasonable to maintain it as such. Dorado will supplant medaka in time as the recommended tool for routine use.
Hi Colin,
To add a few more specifics to Chris's answer, this alpha release of Dorado polish only supports a few experimental models aimed primarily at human genome polishing. If you are working with bacterial species in particular, we do still recommend updating to medaka v2.0 for now to obtain the most accurate results.
In the short to medium term, releases of Dorado polish and medaka will both support a similar suite of models for most use cases, although Dorado polish may offer advantages such as increased speed. Finally, as Chris mentioned, in the long term we anticipate that Dorado polish will be the more stable and supported tool.
Excellent, thanks for the clear answers. I look forward to increased speed on the polishing since Medaka can be prohibitively slow for genomes over 4 GB in size.
My data was basecalled with Dorado version 0.9.1 using the super accurate (sup) model, on a FLO-MIN114 (R10.4.1) flow cell. What command should I use to run medaka? Or can I no longer use it in 2025? My genome is 40 Mb and it's a fungus.
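Going by the replies above, medaka is still maintained, so it should remain usable for R10.4.1 sup data. As a rough sketch only, a typical `medaka_consensus` call takes the reads (`-i`), the draft assembly (`-d`), an output directory (`-o`), a thread count (`-t`), and optionally a model (`-m`). The small Python helper below just assembles that command line. The file names are hypothetical placeholders, and the model name `r1041_e82_400bps_sup_v5.0.0` is an assumption about what matches a Dorado 0.9.1 sup basecall; please check it against the output of `medaka tools list_models` for your installed version before relying on it.

```python
import shlex


def build_medaka_command(reads, draft, outdir, threads=4, model=None):
    """Assemble a medaka_consensus argument list for one polishing run.

    reads  : basecalled reads (FASTQ/BAM), hypothetical path
    draft  : draft assembly to polish (FASTA), hypothetical path
    outdir : directory medaka should write its results to
    model  : optional medaka model name; if omitted, no -m flag is passed
    """
    cmd = [
        "medaka_consensus",
        "-i", reads,
        "-d", draft,
        "-o", outdir,
        "-t", str(threads),
    ]
    if model is not None:
        # Assumption: the caller has verified this name with `medaka tools list_models`.
        cmd += ["-m", model]
    return cmd


# Example for a ~40 Mb fungal draft; all names below are placeholders.
cmd = build_medaka_command(
    "reads.fastq.gz",
    "draft_assembly.fasta",
    "medaka_out",
    threads=8,
    model="r1041_e82_400bps_sup_v5.0.0",  # assumed model name, verify locally
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` on a machine where medaka is installed. Leaving `model=None` drops the `-m` flag, which may be preferable if your medaka version can pick the model up from the basecaller information in the reads.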