
Update RefSeq?

Open asaldivar93 opened this issue 2 years ago • 2 comments

Hi, I'm using Mash to detect contamination in de novo genome assemblies, together with other tools that work on the latest release of the RefSeq database. Is it possible to build a sketch file for the genomes in the latest release on a PC with 16 GB of RAM?

If it is, could you share the workflow necessary to do it? If it is not, is someone willing to do the work and share the file?

Any help will be greatly appreciated.

asaldivar93, Jul 14 '21 09:07

Yes. Is refseq.genomes.k21.s1000.msh the latest version?

Caiyulu-818, Mar 04 '22 15:03

No, it is quite old. I would advise creating a new sketch. NCBI RefSeq now has 330,648 reference genome assemblies, while the sketch has 91,282. I also sometimes hit deprecated accession numbers in the old sketch that have been removed from the newer metadata (assembly_file_manifest.txt).
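For anyone who wants to try rebuilding it, here is a rough sketch of one way to do it in batches, not a tested recipe. It assumes the genomes are already downloaded as one FASTA file per assembly, and it takes k=21 and s=1000 from the name of the existing file. The function names, the batch size of 5000, and the `batch_NNNN` file naming are made up for illustration. Batching mainly keeps each `mash sketch` run small and restartable; whether it is actually needed to stay within 16 GB is not something this thread establishes.

```python
import subprocess
from pathlib import Path


def plan_batches(fasta_paths, work_dir, batch_size=5000,
                 kmer=21, sketch_size=1000, threads=4):
    """Split genome FASTA paths into batches and build one command per batch.

    Returns a list of (list_file, paths, prefix, cmd) tuples. Nothing is run
    or written here, so the plan can be inspected first.
    """
    work_dir = Path(work_dir)
    batches = []
    for start in range(0, len(fasta_paths), batch_size):
        n = start // batch_size
        list_file = work_dir / f"batch_{n:04d}.txt"   # hypothetical naming
        prefix = work_dir / f"batch_{n:04d}"          # mash appends .msh
        cmd = [
            "mash", "sketch",
            "-k", str(kmer),          # k-mer size
            "-s", str(sketch_size),   # hashes kept per genome
            "-p", str(threads),       # worker threads
            "-l",                     # the input is a list of file names
            "-o", str(prefix),        # output prefix
            str(list_file),
        ]
        batches.append((list_file, fasta_paths[start:start + batch_size],
                        prefix, cmd))
    return batches


def paste_command(batches, out_prefix):
    """Build the `mash paste` command that merges all batch sketches."""
    sketches = [str(prefix) + ".msh" for _, _, prefix, _ in batches]
    return ["mash", "paste", str(out_prefix)] + sketches


def run(fasta_paths, work_dir, out_prefix, **kwargs):
    """Write the list files, sketch each batch, then merge the results."""
    Path(work_dir).mkdir(parents=True, exist_ok=True)
    batches = plan_batches(fasta_paths, work_dir, **kwargs)
    for list_file, paths, _, cmd in batches:
        list_file.write_text("\n".join(paths) + "\n")
        subprocess.run(cmd, check=True)
    subprocess.run(paste_command(batches, out_prefix), check=True)
```

Calling `run(paths, "work", "refseq.new.k21.s1000")` would leave one `.msh` per batch in `work/` plus the merged sketch. If a batch fails partway, the finished batch files can be kept and only the remaining ones re-run before pasting.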

kbessonov1984, Sep 19 '23 19:09