pp-sketchlib

Library of sketching functions used by PopPUNK

Results: 11 pp-sketchlib issues

See https://eigen.tuxfamily.org/dox/classEigen_1_1LLT.html https://eigen.tuxfamily.org/dox/classEigen_1_1LDLT.html https://eigen.tuxfamily.org/dox/TopicUsingBlasLapack.html

See https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btac564/6674501 This should be faster.

enhancement

When there are more than about 500k sketches in the sketch group, performance becomes very slow. This appears to be because the metadata cache size isn't large enough: https://forum.hdfgroup.org/t/limit-on-the-number-of-datasets-in-one-group/5892 I...

enhancement
long-term

For duplication checks, it would be useful to keep a hash value of each sequence in the database. This should be easy, as we read through all of the sequence anyway.

enhancement
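The hash could be accumulated while the sequence is streamed in, so no second pass is needed. Below is a minimal sketch, assuming a 64-bit FNV-1a hash (chosen here only for brevity; any stable 64-bit hash would do) and hypothetical names `SequenceHasher` and `find_duplicates` that are not part of pp-sketchlib. Bases are upper-cased so soft-masked input hashes the same, and chunk boundaries (e.g. FASTA line breaks) do not affect the result. Note that contig boundaries are also ignored here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: 64-bit FNV-1a, updated incrementally so each chunk of
// sequence can be fed in as it is read.
class SequenceHasher {
public:
  void update(const std::string &chunk) {
    for (unsigned char c : chunk) {
      // Upper-case so soft-masked (lower-case) bases hash identically
      if (c >= 'a' && c <= 'z') c = static_cast<unsigned char>(c - 'a' + 'A');
      hash_ ^= c;
      hash_ *= 1099511628211ULL; // FNV-1a 64-bit prime
    }
  }
  uint64_t digest() const { return hash_; }

private:
  uint64_t hash_ = 14695981039346656037ULL; // FNV-1a 64-bit offset basis
};

// Hypothetical helper: groups sample names whose sequence hashes are equal.
// Equal hashes strongly suggest, but do not prove, identical sequences.
std::vector<std::vector<std::string>>
find_duplicates(const std::vector<std::string> &names,
                const std::vector<uint64_t> &hashes) {
  assert(names.size() == hashes.size());
  std::unordered_map<uint64_t, std::vector<std::string>> by_hash;
  for (size_t i = 0; i < names.size(); ++i) {
    by_hash[hashes[i]].push_back(names[i]);
  }
  std::vector<std::vector<std::string>> dups;
  for (const auto &kv : by_hash) {
    if (kv.second.size() > 1) dups.push_back(kv.second);
  }
  return dups;
}
```

If a hash match needs to be confirmed, the two sequences could be compared directly, which is only required for the (rare) candidate pairs.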

Used in random matches, but can now use the standalone dust library. See https://github.com/mrc-ide/dust/pull/333 and https://mrc-ide.github.io/dust/articles/rng.html#reusing-the-random-random-number-generator-in-other-projects-1

enhancement
long-term

Looks like there's a nice solution to memory mapping in eigen here: https://stackoverflow.com/a/51256597 _Originally posted by @johnlees in https://github.com/johnlees/pp-sketchlib/issues/53#issuecomment-773368230_
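The linked answer is not reproduced here. As a minimal sketch of the general idea, assuming a POSIX system and a hypothetical file layout of raw `float` values with no header, the file can be mapped read-only with `mmap`; the returned pointer could then be wrapped without copying, e.g. `Eigen::Map<const Eigen::MatrixXf> m(mf.data(), rows, cols);`. The class name `MappedFloats` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical read-only mapping of a file holding raw floats (no header).
class MappedFloats {
public:
  explicit MappedFloats(const std::string &path) {
    int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("cannot open " + path);
    struct stat st;
    if (fstat(fd, &st) != 0) {
      close(fd);
      throw std::runtime_error("fstat failed");
    }
    bytes_ = static_cast<size_t>(st.st_size);
    void *p = mmap(nullptr, bytes_, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); // the mapping remains valid after the descriptor is closed
    if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
    data_ = static_cast<const float *>(p);
  }
  ~MappedFloats() { munmap(const_cast<float *>(data_), bytes_); }
  MappedFloats(const MappedFloats &) = delete;
  MappedFloats &operator=(const MappedFloats &) = delete;

  const float *data() const { return data_; }
  size_t size() const { return bytes_ / sizeof(float); }
  float at(size_t i) const {
    assert(i < size()); // bounds check in debug builds
    return data_[i];
  }

private:
  const float *data_ = nullptr;
  size_t bytes_ = 0;
};
```

Pages are only read from disk when touched, so a large distance matrix need not be loaded into memory up front.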

Some form of serialisation of databases, and/or a JSON representation, would be useful for web interfaces.

enhancement

Useful for repeated queries, as otherwise they would have to be loaded from HDF5 each time.

enhancement
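One way to avoid reloading is a load-on-first-use cache. The sketch below assumes a hypothetical `Sketch` struct and a `Loader` callback standing in for whatever reads a sample from the HDF5 file; neither is pp-sketchlib's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a sketch as read from the database.
struct Sketch {
  std::string name;
  std::vector<uint64_t> bins;
};

// Keeps sketches in memory after the first load, so repeated queries
// do not go back to HDF5. `Loader` wraps the real file access.
class SketchCache {
public:
  using Loader = std::function<Sketch(const std::string &)>;
  explicit SketchCache(Loader loader) : loader_(std::move(loader)) {}

  const Sketch &get(const std::string &name) {
    auto it = cache_.find(name);
    if (it == cache_.end()) {
      Sketch loaded = loader_(name); // only hit the file on a cache miss
      assert(loaded.name == name);   // loader should return the requested sample
      it = cache_.emplace(name, std::move(loaded)).first;
    }
    // References to unordered_map elements stay valid across rehashing
    return it->second;
  }
  size_t size() const { return cache_.size(); }

private:
  Loader loader_;
  std::unordered_map<std::string, Sketch> cache_;
};
```

This cache never evicts; for very large databases a size limit (e.g. LRU) might be needed.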

See lines 75-78 of `sketch.cu`. We just need to get a valid first hash in the read.

enhancement
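Assuming "valid" means a first k-mer window containing only A/C/G/T (so no `N` or other ambiguity code), the start of that window can be found in one pass by tracking the length of the current run of valid bases. This is a host-side sketch with a hypothetical function name, not the code in `sketch.cu`; a rolling hash would be initialised on the returned window and then rolled forward.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True for the four unambiguous bases, in either case.
inline bool is_acgt(char c) {
  switch (c) {
    case 'A': case 'C': case 'G': case 'T':
    case 'a': case 'c': case 'g': case 't':
      return true;
    default:
      return false;
  }
}

// Hypothetical helper: start of the first window of k consecutive valid
// bases at or after `from`, or std::string::npos if there is none.
size_t first_valid_kmer(const std::string &read, size_t k, size_t from = 0) {
  assert(k > 0);
  size_t run = 0; // length of the current run of valid bases
  for (size_t i = from; i < read.size(); ++i) {
    if (is_acgt(read[i])) {
      if (++run == k) return i + 1 - k;
    } else {
      run = 0; // an ambiguous base breaks the window
    }
  }
  return std::string::npos;
}
```

The same helper can be called again with `from` set just past a later ambiguous base to restart hashing mid-read.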

- Sort the columns of bins (N log N), keeping track of index.
- Scan through column. Where there is a block of the same value, add one to numerator...

enhancement
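The two listed steps can be sketched as follows. This assumes the bins are stored as one row per sample (a hypothetical layout) and that "add one to numerator" means incrementing the shared-bin count of every pair of samples in a block of equal values, which is presumably the numerator of a Jaccard-style estimate. The rest of the issue is truncated, so the denominator and any later steps are not shown; `shared_bin_counts` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// bins[s][c] is the value of bin c in sample s (hypothetical layout).
// Returns an n x n matrix counting the bins each pair of samples shares.
std::vector<std::vector<size_t>>
shared_bin_counts(const std::vector<std::vector<uint64_t>> &bins) {
  const size_t n = bins.size();
  std::vector<std::vector<size_t>> matches(n, std::vector<size_t>(n, 0));
  if (n == 0) return matches;
  const size_t n_bins = bins[0].size();
  for (const auto &row : bins) assert(row.size() == n_bins);

  std::vector<size_t> order(n);
  for (size_t c = 0; c < n_bins; ++c) {
    // Sort sample indices by their value in this column: O(N log N)
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(), [&](size_t a, size_t b) {
      return bins[a][c] < bins[b][c];
    });
    // Scan for blocks of equal values; every pair inside a block matches
    size_t start = 0;
    while (start < n) {
      size_t end = start + 1;
      while (end < n && bins[order[end]][c] == bins[order[start]][c]) ++end;
      for (size_t i = start; i < end; ++i) {
        for (size_t j = i + 1; j < end; ++j) {
          ++matches[order[i]][order[j]];
          ++matches[order[j]][order[i]];
        }
      }
      start = end;
    }
  }
  return matches;
}
```

Each column costs O(N log N) for the sort plus O(m^2) for each block of m equal values, so the saving over comparing all pairs directly comes from most bin values being distinct across samples.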