
Subset a bigger sketch for bootstrapping

Open • CBorreda opened this issue 4 years ago • 0 comments

Hello! I'm trying to use Mash to assess the genetic distance between several distantly related plant species. They were sequenced on Illumina, and I've used Mash to sketch the read files and make alignment-free distance estimates. So far the tree looks good, but I'd like to bootstrap it, as discussed here: https://github.com/marbl/Mash/issues/111#. Redoing all the sketches with different seeds, say 100 times, would take too long, so I was wondering whether there is a way to make a "master" sketch (with a larger sketch size) and then subsample from it instead of reading through all the FASTQ files again.
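One way to picture the "master sketch" idea is the Python sketch below, written under stated assumptions; it is not a Mash feature or API. Mash keeps the s smallest hash values of each sample's k-mers (a bottom-s MinHash sketch), so truncating a larger sketch made with the same seed simply reproduces the smaller sketch rather than giving a new replicate. As a swapped-in technique, the sketch instead treats hashes below a threshold shared by all samples as "columns" and resamples them with replacement, analogous to bootstrapping alignment sites. The BLAKE2 hash (Mash uses MurmurHash3), the helper names (`bottom_s_sketch`, `shared_columns`, `bootstrap_replicate`, `mutate`), the threshold rule and the toy sequences are all hypothetical.

```python
import hashlib
import math
import random
from typing import List, Set, Tuple

K = 21  # k-mer size (assumed; must be the same for every sample)
COMP = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    # Treat a k-mer and its reverse complement as the same k-mer.
    return min(kmer, kmer.translate(COMP)[::-1])


def kmer_hash(kmer: str) -> int:
    # 64-bit stand-in hash; Mash itself uses MurmurHash3, so values differ.
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "little")


def bottom_s_sketch(seq: str, k: int, size: int) -> List[int]:
    # Hypothetical "master" sketch: the `size` smallest distinct hashes, sorted.
    hashes = {kmer_hash(canonical(seq[i:i + k])) for i in range(len(seq) - k + 1)}
    return sorted(hashes)[:size]


def shared_columns(masters: List[List[int]]) -> Tuple[List[int], List[Set[int]]]:
    # Each master holds all of its sample's hashes up to its own largest kept
    # hash, so below the smallest such maximum, presence/absence is fully known.
    t = min(m[-1] for m in masters)
    cols = sorted({h for m in masters for h in m if h <= t})
    return cols, [set(m) for m in masters]


def mash_distance(jaccard: float, k: int) -> float:
    # Mash distance D = -(1/k) * ln(2j / (1 + j)); returns 1.0 when j == 0.
    if jaccard == 0.0:
        return 1.0
    return max(0.0, -math.log(2 * jaccard / (1 + jaccard)) / k)


def bootstrap_replicate(cols: List[int], sets: List[Set[int]], k: int,
                        rng: random.Random) -> List[List[float]]:
    # Resample hash columns with replacement, then recompute every pairwise
    # Jaccard estimate and Mash distance on the same resampled columns.
    picked = rng.choices(cols, k=len(cols))
    n = len(sets)
    dist = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            both = sum(1 for h in picked if h in sets[a] and h in sets[b])
            either = sum(1 for h in picked if h in sets[a] or h in sets[b])
            jac = both / either if either else 0.0
            dist[a][b] = dist[b][a] = mash_distance(jac, k)
    return dist


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    # Toy data only: replace each base with a random base with probability `rate`.
    return "".join(rng.choice("ACGT") if rng.random() < rate else c for c in seq)


if __name__ == "__main__":
    rng = random.Random(0)
    base = "".join(rng.choice("ACGT") for _ in range(5000))
    seqs = [base, mutate(base, 0.01, rng), mutate(base, 0.05, rng)]
    masters = [bottom_s_sketch(s, K, 2000) for s in seqs]  # larger "master" size
    cols, sets = shared_columns(masters)
    # Each replicate is a full distance matrix that could feed one bootstrap tree.
    replicates = [bootstrap_replicate(cols, sets, K, rng) for _ in range(100)]
    print(len(cols), replicates[0][0][1], replicates[0][0][2])
```

This resamples hashes that were already read once, so no FASTQ file is revisited; whether support values obtained this way behave like those from re-sketching with different seeds is not settled by this toy example.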

I also thought of first counting all k-mers (with a different tool), keeping those with at least N occurrences, and converting them into pseudo-FASTA records (each of length k) to use as input for Mash. I'll have to test whether this is faster to bootstrap than re-reading the FASTQ files each time, but before that I'd like to know whether this approach would give reliable results, considering that I'm artificially creating reads the size of a single k-mer.
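The k-mer-counting route can also be sketched in Python. The toy `count_kmers` function stands in for whatever external k-mer counter is used, and the two-column "KMER COUNT" text input parsed by `parse_kmer_table`, the helper names and the FASTA header scheme are all hypothetical. One property worth keeping in mind: a record of length exactly k contains exactly one k-mer, so if Mash is run with the same k as the counter, the k-mer set Mash sees from these pseudo-reads is exactly the filtered set; with a smaller Mash k each record would contribute several shorter k-mers instead, and with a larger one it would contribute none.

```python
import sys
from collections import Counter
from typing import Iterable, Iterator, Tuple

ACGT = frozenset("ACGT")
COMP = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    # Collapse a k-mer and its reverse complement onto one representative.
    return min(kmer, kmer.translate(COMP)[::-1])


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    # Toy stand-in for an external k-mer counter; skips windows with non-ACGT bases.
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= ACGT:
                counts[canonical(kmer)] += 1
    return counts


def parse_kmer_table(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # Assumed input format: one whitespace-separated "KMER COUNT" pair per line.
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            yield fields[0].upper(), int(fields[1])


def pseudo_fasta_records(kmer_counts: Iterable[Tuple[str, int]], k: int,
                         min_count: int) -> Iterator[str]:
    # Keep k-mers seen at least `min_count` times and emit each one as its own
    # FASTA record (hypothetical header: running index plus the count).
    index = 0
    for kmer, count in kmer_counts:
        if count < min_count:
            continue
        if len(kmer) != k:
            raise ValueError(f"expected a {k}-mer, got {kmer!r}")
        yield f">kmer{index}_count{count}\n{kmer}\n"
        index += 1


if __name__ == "__main__":
    reads = ["ACGTACGTTGCA", "ACGTACGTTGCC", "TTTTGGGGCCCC"]
    counts = count_kmers(reads, 5)  # or parse_kmer_table(...) on a counter's text output
    sys.stdout.writelines(pseudo_fasta_records(counts.items(), k=5, min_count=2))
```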

CBorreda, Feb 06 '20 19:02