Applying Shark to nanopore reads
Hello,
Thank you very much for the Shark tool, which has been very useful to us when working with exome sequencing data (see this article where we used it for typing).
We are now working with Nanopore data, which means much longer reads (between 1,000 and 10,000 base pairs), and we were wondering how we should adapt the parameters to maintain a high recall on those reads. Would you have any tips?
Sorry for the late reply, and thanks for your feedback: I'm glad to hear that Shark has been useful in your project. Shark was designed with the Illumina short-read error model in mind, but I did a quick test with some simulated ONT reads to check whether it is suitable for them too. As expected from a theoretical standpoint, I got the best recall with a small k-mer size (like 15 or 17), low confidence (around 0.1), and no filtering on base quality (-q 0). Please treat these results as a hint only: I performed just a very quick exploratory check, and it shouldn't be considered a proper validation of the tool on long reads. I hope this helps!
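To give a rough intuition for why a small k-mer size is expected to help on noisy reads, here is a minimal Python sketch. It assumes independent per-base errors, which is a simplification of real ONT error profiles. The function name and the error rates are hypothetical, chosen only for illustration, and are not taken from Shark or from the test above.

```python
def error_free_kmer_fraction(k: int, per_base_error: float) -> float:
    """Probability that a k-mer contains no sequencing error,
    assuming each base is wrong independently with the same probability."""
    return (1.0 - per_base_error) ** k


# Illustrative error rates only (assumptions, not measurements):
# a low rate typical of short reads versus a much higher, long-read-like rate.
for error_rate in (0.01, 0.10):
    for k in (15, 17, 31):
        fraction = error_free_kmer_fraction(k, error_rate)
        print(f"error rate {error_rate:.2f}, k={k}: "
              f"error-free k-mer fraction {fraction:.3f}")
```

Under this simplified model, the share of error-free k-mers drops quickly as k grows when the error rate is high, so fewer k-mers of a noisy read can match the reference. That is consistent with a smaller k and a lower confidence threshold giving better recall.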
Thank you for your answer, those indications are already helpful. I will try it out with your recommendations in mind.