Choosing the right model

Open · pbelmann opened this issue 1 year ago · 3 comments

Hi,

Thank you for developing this tool! I would like to use medaka to polish a dataset for which I only have information about the sequencing device. Would it still make sense to use medaka with a model chosen somewhat arbitrarily, based only on the device used?

pbelmann · Jul 14 '22 08:07

Hi @pbelmann,

From where has your data come? It might be the case that the fastq headers contain useful information that can help us advise you. Failing that, if you know an approximate time period when sequencing/basecalling was performed, we can take a best guess, assuming the most recent Guppy version was used at the time.

cjw85 · Jul 14 '22 08:07

Hi @cjw85,

Thank you for your fast reply!

From where has your data come?

The idea is actually to process datasets that are available on SRA. So my question is not about a specific dataset; it is a more general question about datasets for which I do not have the information needed to run medaka. In most cases, the only relevant information I get from the SRA metadata is the "instrument" used. Example dataset: http://ftp.era.ebi.ac.uk/vol1/fastq/ERR499/008/ERR4994318/ERR4994318.fastq.gz

It might be the case that the fastq headers contain useful information that can help us advise you.

Do you mean that the device or the basecaller version is encoded in the header?

Failing that, if you know an approximate time period when sequencing/basecalling was performed we can take a best guess assuming the most recent Guppy version was used at the time.

Based on the SRA 'release date' attribute, I could indeed get the rough time period. So you would suggest checking which Guppy version was the most recent one at that time? Where would I get this information?

pbelmann · Jul 18 '22 11:07

The basecaller version and model are encoded in the fastq headers of more recent data; older datasets will not have this information.
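A minimal sketch for checking whether a file carries this information: it reads the first few headers of a (possibly gzipped) FASTQ and splits each into its `key=value` tokens. It assumes standard four-line FASTQ records and whitespace-separated `key=value` fields after the read ID. The exact key names (for example `model_version_id` or `start_time`) are assumptions here and vary between basecaller versions, and archives such as SRA/ENA may rewrite headers, so the fields can be missing entirely. The helper names `header_fields` and `headers` are hypothetical.

```python
import gzip
import sys
from itertools import islice
from typing import Dict, Iterator


def header_fields(header: str) -> Dict[str, str]:
    """Split a FASTQ header line into its key=value tokens.

    The first token (the read ID) is skipped; tokens without '=' are ignored.
    """
    fields: Dict[str, str] = {}
    for token in header.lstrip("@").split()[1:]:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def headers(path: str, limit: int = 10) -> Iterator[str]:
    """Yield the first `limit` header lines of a plain or gzipped FASTQ.

    Assumes four-line records, so every fourth line is a header.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in islice(handle, 0, limit * 4, 4):
            yield line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print the parsed fields of the first five reads of the given file.
    for line in headers(sys.argv[1], limit=5):
        print(header_fields(line))
```

For a hypothetical header such as `@read1 runid=abc start_time=2021-03-01T12:00:00Z model_version_id=dna_r9.4.1_450bps_hac`, `header_fields` returns a dict containing `start_time` and `model_version_id`; for a header rewritten by an archive, it may return nothing useful, which is the "older datasets" case above.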

The Guppy CHANGELOG contains release dates. I don't know whether it is bundled in the Guppy distribution. It's available in the Nanopore Community (sorry, it needs a sign-in): https://community.nanoporetech.com/downloads/guppy/release_notes
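The "most recent Guppy version at the time" heuristic can be sketched as a sorted lookup: given (release date, version) pairs transcribed from the release notes, pick the newest release dated on or before the sequencing date. The `RELEASES` table below is a hypothetical placeholder, not real Guppy release data, and `latest_release_before` is a hypothetical helper name.

```python
import bisect
from datetime import date
from typing import List, Optional, Tuple

# HYPOTHETICAL placeholder table: replace with real (release date, version)
# pairs from the Guppy release notes. These entries are NOT real releases.
RELEASES: List[Tuple[date, str]] = [
    (date(2020, 1, 1), "version-A"),
    (date(2021, 1, 1), "version-B"),
    (date(2022, 1, 1), "version-C"),
]


def latest_release_before(when: date, releases: List[Tuple[date, str]]) -> Optional[str]:
    """Return the newest version released on or before `when`, or None."""
    ordered = sorted(releases)
    dates = [d for d, _ in ordered]
    # bisect_right counts the releases dated <= when; the last of them is the guess.
    idx = bisect.bisect_right(dates, when)
    return ordered[idx - 1][1] if idx else None
```

With the placeholder table, `latest_release_before(date(2021, 6, 30), RELEASES)` gives `"version-B"`, and a date before the first entry gives `None`. Note that an SRA release date is when the run became public, which is usually later than sequencing, so using it as `when` can overshoot; a `start_time`-style field from the read headers, when present, is likely a better input.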

cjw85 · Aug 12 '22 09:08