Mash
Fast genome and metagenome distance estimation using MinHash
Hi, I like Mash very much because of its speed. However, if I want to screen multiple FASTQ samples for similarity with the bacterial [refseq-genomes](http://mash.readthedocs.io/en/latest/tutorials.html#screening-a-read-set-for-containment-of-refseq-genomes) (+500mb), it takes about 1m for...
Parsing mash screen output (in Pandas at least) is slowed considerably by the use of decimal fractions, which are initially parsed as strings and require time-consuming string wrangling/evaluation in order to...
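To illustrate the parsing cost described above, here is a minimal sketch that splits the fraction column into two integers while reading the file. It assumes the tab-separated `mash screen` column order identity, shared-hashes, median-multiplicity, p-value, query-ID, query-comment, and that shared-hashes looks like `950/1000`. The function name `parse_screen` and the sample row are hypothetical, not real Mash output.

```python
import csv
import io

# Assumed column order of `mash screen` output (tab-separated, no header row).
COLUMNS = ["identity", "shared_hashes", "median_multiplicity",
           "p_value", "query_id", "query_comment"]


def parse_screen(handle):
    """Yield one dict per row, with the 'x/y' column split into two integers."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        rec = dict(zip(COLUMNS, row))
        shared, total = rec.pop("shared_hashes").split("/")
        rec["shared"] = int(shared)          # numerator of the fraction
        rec["sketch_size"] = int(total)      # denominator of the fraction
        rec["identity"] = float(rec["identity"])
        rec["median_multiplicity"] = float(rec["median_multiplicity"])
        rec["p_value"] = float(rec["p_value"])
        yield rec


# Made-up example row for illustration only.
sample = "0.99\t950/1000\t3\t0\tGCF_000000000.1\texample comment\n"
rows = list(parse_screen(io.StringIO(sample)))
print(rows[0]["shared"], rows[0]["sketch_size"])
```

Splitting once at read time means later numeric work never touches strings; the resulting dicts can be passed to a DataFrame constructor if Pandas is preferred.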
Hi, there are duplicated genomes in the triangle output. Mash version 2.2.2 $ mash triangle *.fna *.fna 6 genome1.fna genome2.fna 0.0222766 genome3.fna 0 0.0222766 genome1.fna 0 0.0222766 0 genome2.fna 0.0222766 0...
In order to compile Mash on Ubuntu 20.04 LTS I had to make the following changes: In **configure**: Line 2009 - Change the path because apt installs capnp to `/usr/bin/`...
I am new to Linux. When I run mash dist, I run into the following problem: mash dist refseq.genomes.k21.s1000.msh 19-40.random.fq.gz.msh > 19-40.distances.tab ERROR: could not open "refseq.genomes.k21.s1000.msh" for reading....
I have put together a patch for 2.2.2 to make it compile with x86_64-w64-mingw32-g++/gcc and target Windows. Currently using capnp 0.8.0. Patch is restricted to mmap/munmap and memcpy wrapper. Some...
Hello, I was wondering if you would recommend filtering mash distances to keep only those with a significant p-value? I read in your paper that a high p-value could mean...
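For readers who do want to try such a filter, the sketch below shows one way it could look. It assumes the tab-separated `mash dist` column order reference-ID, query-ID, distance, p-value, shared-hashes. The function name `filter_by_pvalue`, the `max_p=0.05` cutoff, and the sample rows are all hypothetical; the cutoff is an arbitrary illustration, not a recommendation from the Mash authors.

```python
import io


def filter_by_pvalue(handle, max_p=0.05):
    """Keep only rows whose p-value is at or below max_p (arbitrary cutoff)."""
    kept = []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        ref, query, dist, p_value, shared = fields[:5]
        if float(p_value) <= max_p:
            kept.append((ref, query, float(dist), float(p_value), shared))
    return kept


# Made-up rows for illustration only, not real Mash output.
sample = (
    "ref1.fna\tquery.fna\t0.02\t0\t450/1000\n"
    "ref2.fna\tquery.fna\t0.30\t0.4\t2/1000\n"
)
hits = filter_by_pvalue(io.StringIO(sample))
print(hits)
```

Whether filtering on the p-value is appropriate at all is the open question in this issue; the code only shows the mechanics.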
I use Mash as one of the dependencies of PanACoTA. I got the following error while sketching: `error: mash sketch -o Acetobacter_orleanensis/mash_files/all-genomes-Acetobacter_orleanensis -p 1 -l Acetobacter_orleanensis/mash_files/list-to-sketch-Acetobacter_orleanensis.txt -s 1e4 does not...
I'm using Mash 2.0 and the file refseq.genomes.k21.s1000.msh. When I run "mash dist" on an individual FASTA file and query it against refseq.genomes.k21.s1000.msh, I noticed that the "query-comment" does not...
Hi, I have a database, uhgp-100.faa, which is a database of human gut genomes (Nature Biotechnology 2020). It is a FASTA file containing 170,000,000 protein sequences (68G). I also...