Daniel Baker

Results 122 comments of Daniel Baker

Hi - I've found the issue. In the last line for Dashing2, you need to specify the k-mer length and sketch size again. Dashing2 was defaulting to 31, while Dashing1...

It's in the number of registers, not bytes. Each register is 4 bytes. Soon I might add [KMG] notation so that this can be done with fewer characters.
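Because the size is given in registers and each register is 4 bytes, the memory for a sketch is simply 4 × (number of registers). The snippet below is a minimal illustration of that arithmetic. The `parse_kmg` helper is hypothetical, since the [KMG] notation doesn't exist yet, and treating K/M/G as powers of 1024 is an assumption.

```python
# Each register is 4 bytes, so sketch memory = 4 * number_of_registers.
BYTES_PER_REGISTER = 4


def sketch_bytes(num_registers: int) -> int:
    """Memory used by a sketch with `num_registers` 4-byte registers."""
    return num_registers * BYTES_PER_REGISTER


def registers_for_budget(num_bytes: int) -> int:
    """Largest register count that fits in `num_bytes` (rounds down)."""
    return num_bytes // BYTES_PER_REGISTER


# Hypothetical [KMG] suffix parser; binary (1024-based) multipliers are an assumption.
_SUFFIXES = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_kmg(text: str) -> int:
    """Parse '1024', '1K', '2M', ... into an integer register count."""
    text = text.strip().upper()
    if text and text[-1] in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[text[-1]]
    return int(text)


if __name__ == "__main__":
    print(sketch_bytes(1024))          # 4096
    print(registers_for_budget(4096))  # 1024
    print(parse_kmg("1K"))             # 1024
```

So a sketch size of 1024 means 1024 registers, i.e. 4 KiB, not 1024 bytes.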

Found the problem: a missing "L" in the getopt string. I've prepared a fix, along with a CLI sanity check (making sure there are no unexpected flags).
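To illustrate why a letter missing from the getopt string matters, here is a minimal sketch using Python's standard `getopt` module, whose short-option string follows the C `getopt` convention (a letter followed by `:` takes an argument). The flags `-k` and `-S` and the helper `parse_cli` are hypothetical; Dashing2's actual option string isn't shown here. The sanity check turns an unrecognized flag into a hard error instead of letting it pass silently.

```python
import getopt


def parse_cli(argv, shortopts):
    """Parse short flags; getopt raises GetoptError for any flag not in shortopts."""
    try:
        opts, positional = getopt.getopt(argv, shortopts)
    except getopt.GetoptError as err:
        # Sanity check: reject unexpected flags rather than ignoring them.
        raise SystemExit(f"Unexpected command-line flag: {err}")
    return opts, positional


# Hypothetical option strings: "-k" and "-S" stand in for other flags,
# "L:" is the letter that was missing.
BROKEN = "k:S:"
FIXED = "k:S:L:"

if __name__ == "__main__":
    print(parse_cli(["-L", "3", "genome.fa"], FIXED))  # ([('-L', '3')], ['genome.fa'])
    parse_cli(["-L", "3", "genome.fa"], BROKEN)        # exits: option -L not recognized
```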

Hi Jianshu, You should be able to use the same equation that converts the k-mer similarity fraction to ANI for AAI as well, substituting the relevant statistics. Specifically: ``` 1 + log(2*J/(1+J)) /...
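As a hedged illustration, the widely used Mash-distance conversion has this form: D = -(1/k) · ln(2J/(1+J)), so the estimated identity is 1 - D = 1 + ln(2J/(1+J))/k, where J is the k-mer Jaccard estimate and k the k-mer length. The sketch below assumes that conversion (the comment above is truncated, so this is not a quote of it), and the function name `mash_identity` is hypothetical. For AAI, k would be the amino-acid k-mer length.

```python
import math


def mash_identity(jaccard: float, k: int) -> float:
    """Approximate sequence identity from a k-mer Jaccard estimate.

    Assumes the Mash distance D = -(1/k) * ln(2J / (1 + J)) and returns
    1 - D = 1 + ln(2J / (1 + J)) / k. For AAI, pass the protein k-mer length.
    """
    if not 0.0 < jaccard <= 1.0:
        raise ValueError("jaccard must be in (0, 1]")
    return 1.0 + math.log(2.0 * jaccard / (1.0 + jaccard)) / k


if __name__ == "__main__":
    print(mash_identity(1.0, 31))  # 1.0 (identical k-mer sets)
    print(mash_identity(0.5, 21))
```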

Hi Jianshu, Sorry for the wait! It's been a busy couple of weeks. I've added in cmp usage (https://github.com/dnbaker/dashing2/commit/3b71c9cdb925aa582921e89a7cc66f62f773f9d4), so thank you for pointing that out. You can pass sketched...

Hi Jianshu, You're right - we can probably support all targets using simde. I'll get to work on it over the next few days, since it would be useful for...

Hi Jianshu, Thanks for the issue! I think that what you're running into is out-of-memory errors when computing the k-mer count map before building the ProbMinHash sketch. You can add...
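The count map is the memory-hungry step because it holds one entry per distinct k-mer, which can approach the number of k-mer positions in the input. A minimal sketch of such an exact count map, using Python's `collections.Counter` (an illustration of the data structure, not Dashing2's implementation):

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Exact k-mer count map: one entry per distinct k-mer in `seq`.

    Memory grows with the number of distinct k-mers, which is at most
    min(len(seq) - k + 1, alphabet_size ** k).
    """
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    counts = kmer_counts("ACGTACGT", 4)
    print(counts)       # ACGT appears twice, the other 4-mers once
    print(len(counts))  # 4 distinct k-mers
```

A weighted sketch such as ProbMinHash needs these multiplicities, which is why the whole map is built before sketching.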

Hi Jianshu, Thanks for your question! Yes -- you can apply this to amino sequence collections! You enable this with the flag `--protein`, which uses a 20-letter amino acid alphabet....
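One practical consequence of a 20-letter alphabet is that fewer residues fit in a machine word than with 4-letter DNA. The sketch below packs a protein k-mer into an integer in base 20 and finds the largest k whose k-mers all fit in an unsigned 64-bit integer. The alphabet ordering and the base-20 packing are assumptions for illustration; they are not necessarily how `--protein` encodes k-mers.

```python
# Assumption: the 20 standard amino acids, in an arbitrary fixed order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_CODE = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_protein_kmer(kmer: str) -> int:
    """Pack a protein k-mer into an integer, base 20 (hypothetical encoding).

    Raises KeyError for non-standard residues such as 'X'.
    """
    value = 0
    for aa in kmer.upper():
        value = value * 20 + _CODE[aa]
    return value


def max_k_for_64_bits() -> int:
    """Largest k such that every 20-letter k-mer fits in an unsigned 64-bit integer."""
    k = 0
    while 20 ** (k + 1) <= 2 ** 64:
        k += 1
    return k


if __name__ == "__main__":
    print(encode_protein_kmer("CA"))  # 20
    print(max_k_for_64_bits())        # 14 (for DNA at 2 bits per base it would be 32)
```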

The SetSketch doesn't support multiplicities, but we have two ways to deal with them. If you add the `--probminhash` flag, you will sketch with ProbMinHash rather than the SetSketch, generating...
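To see what ignoring multiplicities changes, the snippet below computes two exact similarities from k-mer count maps: the plain set Jaccard (presence only, the kind of similarity a set sketch targets) and the probability Jaccard J_P, the weighted similarity ProbMinHash is designed to estimate. These are exact reference values, not sketches, and the function names are hypothetical.

```python
from collections import Counter


def set_jaccard(a: Counter, b: Counter) -> float:
    """Jaccard on k-mer presence only; multiplicities are ignored."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def probability_jaccard(a: Counter, b: Counter) -> float:
    """Probability Jaccard J_P (the similarity ProbMinHash estimates).

    J_P = sum over shared keys i of 1 / sum_j max(a_j / a_i, b_j / b_i).
    Missing keys count as 0 because Counter returns 0 for them.
    """
    keys = set(a) | set(b)
    total = 0.0
    for i in set(a) & set(b):
        total += 1.0 / sum(max(a[j] / a[i], b[j] / b[i]) for j in keys)
    return total


if __name__ == "__main__":
    a = Counter({"ACGT": 3, "CGTA": 1})
    b = Counter({"ACGT": 1, "CGTA": 1})
    print(set_jaccard(a, b))          # 1.0: same k-mers, counts ignored
    print(probability_jaccard(a, b))  # 0.75: counts differ, so similarity drops
```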

I'd lean toward using BagMinHash, then. On the genome pairs I've compared with fastANI, it's been very slightly better at estimating ANI than ProbMinHash, but sometimes better than exact k-mer...
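BagMinHash, unlike ProbMinHash, targets the weighted Jaccard similarity: the sum of per-k-mer minimum counts divided by the sum of maximum counts. A minimal sketch computing that exact value from count maps (a reference value that the sketch approximates, not BagMinHash itself; the function name is hypothetical):

```python
from collections import Counter


def weighted_jaccard(a: Counter, b: Counter) -> float:
    """Weighted Jaccard: sum_k min(a_k, b_k) / sum_k max(a_k, b_k)."""
    keys = set(a) | set(b)
    den = sum(max(a[k], b[k]) for k in keys)
    if den == 0:
        raise ValueError("weighted Jaccard is undefined for two empty count maps")
    num = sum(min(a[k], b[k]) for k in keys)
    return num / den


if __name__ == "__main__":
    a = Counter({"ACGT": 3, "CGTA": 1})
    b = Counter({"ACGT": 1, "CGTA": 1})
    print(weighted_jaccard(a, b))  # 0.5 = (1 + 1) / (3 + 1)
```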