
unique exact matches hll

Open • lutfia95 opened this issue 4 years ago • 1 comment

Hi,...

I have a question about unique exact matches: can I use ./dashing hll for something other than unique exact matches? I need the total number of matches, not just the unique ones. For measuring sensitivity in my experiment, the total number of matches is what matters; unique exact matches alone are not very useful.

Thanks!

Cheers Ahmad

lutfia95, Oct 26 '20 21:10

Hi Ahmad,

I'm happy to help, but I'm not quite sure exactly what you're looking for.

Are you looking for multiset similarity, where multiple instances of the same k-mer are counted multiple times? You can compute this exactly with dashing <dist/sketch> --wj-exact [input files], or approximately, using a count-min sketch, via dashing <cmd> --wj. See the Streaming Weighted Jaccard portion of the usage.
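
To make the distinction concrete, here is a minimal Python sketch of exact weighted (multiset) Jaccard next to ordinary set Jaccard on k-mers. It is only an illustration of the two quantities, not dashing's implementation; the function names and the toy sequences are made up for this example.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every k-mer occurrence in seq (a multiset of k-mers)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def weighted_jaccard(a, b):
    """Sum of per-k-mer minimum counts divided by sum of maximum counts."""
    keys = a.keys() | b.keys()
    num = sum(min(a[x], b[x]) for x in keys)
    den = sum(max(a[x], b[x]) for x in keys)
    return num / den if den else 0.0

def set_jaccard(a, b):
    """Ordinary Jaccard: each distinct k-mer counts once, however often it occurs."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

# Hypothetical toy sequences; ACG occurs three times in the first one.
x = kmer_counts("ACGACGACGT", 3)
y = kmer_counts("ACGTTT", 3)

print(weighted_jaccard(x, y))  # repeats of ACG, CGA, GAC enlarge the denominator
print(set_jaccard(x, y))       # repeats are ignored
```

With these inputs the weighted value is lower than the set value, because the repeated k-mers in the first sequence are counted each time they occur.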

On the other hand, you might be looking for exact k-mer counts/matches; in that case, you can replace the HLL with sorted hash sets via the --use-full-khash-sets option.
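
As a rough sketch of the idea behind exact sets (again, not dashing's actual code), each k-mer can be hashed to a 64-bit value, the distinct hashes kept in sorted order, and two sets compared with a linear merge. The hash function here (BLAKE2b truncated to 8 bytes) and the function names are assumptions chosen only for illustration.

```python
import hashlib

def sorted_hash_set(seq, k):
    """Hash every k-mer to 64 bits and return the distinct hashes, sorted."""
    hashes = {
        int.from_bytes(
            hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest(), "big"
        )
        for i in range(len(seq) - k + 1)
    }
    return sorted(hashes)

def intersection_size(a, b):
    """Count shared elements of two sorted lists with a single merge pass."""
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared

# Hypothetical toy sequences.
a = sorted_hash_set("ACGACGACGT", 3)
b = sorted_hash_set("ACGTTT", 3)
shared = intersection_size(a, b)
jaccard = shared / (len(a) + len(b) - shared)
print(shared, jaccard)
```

Unlike an HLL, this gives exact cardinalities and intersections of the distinct k-mers, at the cost of memory proportional to the number of distinct k-mers.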

Thanks for asking, and I'm happy to help further.

Best,

Daniel

dnbaker, Oct 27 '20 21:10