John Lees
For duplication checks, it would be useful to keep a hash value of each sequence in the database, which should be easy as we read through all the sequences anyway.
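A minimal sketch of the hash-per-sequence idea, in Python for brevity. The function names, the choice of BLAKE2b, and the case-folding are all assumptions made for illustration, not anything already in the codebase; the point is only that the digest can be updated chunk by chunk while the sequence is being read.

```python
import hashlib


def sequence_hash(chunks):
    """Hash one sequence supplied as an iterable of string chunks (e.g. lines)."""
    digest = hashlib.blake2b(digest_size=16)
    for chunk in chunks:
        # Assumption: 'acgt' and 'ACGT' should count as the same sequence
        digest.update(chunk.upper().encode("ascii"))
    return digest.hexdigest()


def find_duplicates(named_sequences):
    """Return (name, first_seen_name) pairs for sequences with a repeated hash."""
    seen = {}
    duplicates = []
    for name, chunks in named_sequences:
        key = sequence_hash(chunks)
        if key in seen:
            duplicates.append((name, seen[key]))
        else:
            seen[key] = name
    return duplicates
```

Storing the hex digest alongside each sample would then make the duplicate check a dictionary lookup rather than a sequence comparison.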
Used in random matches, but can now use the standalone dust library. See https://github.com/mrc-ide/dust/pull/333 and https://mrc-ide.github.io/dust/articles/rng.html#reusing-the-random-random-number-generator-in-other-projects-1
Looks like there's a nice solution to memory mapping in Eigen here: https://stackoverflow.com/a/51256597 _Originally posted by @johnlees in https://github.com/johnlees/pp-sketchlib/issues/53#issuecomment-773368230_
Some form of database serialisation, and/or a JSON representation, would be useful for web interfaces.
Useful for repeated queries, as otherwise they would have to be loaded from HDF5 each time
See lines 75-78 of `sketch.cu`. Just need to get a valid first hash in the read
Using xoshiro128+ somewhat manually, but can now use the standalone dust library. See mrc-ide/dust#333 and https://mrc-ide.github.io/dust/articles/rng.html#reusing-the-random-random-number-generator-in-other-projects-1
This is a bit of a pain because to do this in Python we need to use `process_map`, but this requires that all arguments to the mapped function can be pickled.
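One way to see which arguments are the problem before handing them to `process_map` is to try pickling each one. This is only a sketch: `unpicklable_arguments` is a hypothetical helper, and the argument names in the usage line are made up.

```python
import pickle


def unpicklable_arguments(**kwargs):
    """Return the names of keyword arguments that pickle cannot serialise."""
    failed = []
    for name, value in kwargs.items():
        try:
            pickle.dumps(value)
        except (pickle.PicklingError, AttributeError, TypeError):
            # Lambdas, open handles and many extension objects end up here
            failed.append(name)
    return failed


# Hypothetical arguments: a plain list pickles, a lambda does not
print(unpicklable_arguments(kmers=[15, 17, 19], callback=lambda x: x))
```

Anything reported here would need to be replaced with something picklable (a path, a plain array, a module-level function) or re-created inside the worker.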
The rfile/qfile should probably be checked for `,`, `;`, `(`, `)` and whitespace, all of which will cause issues.
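A possible check, sketched in Python. The helper name is hypothetical, and whether it should be applied to every field or only to the sample names is left open here.

```python
import re

# The characters listed above: comma, semicolon, parentheses, any whitespace
_FORBIDDEN = re.compile(r"[,;()\s]")


def forbidden_characters(text):
    """Return the sorted, de-duplicated forbidden characters found in text."""
    return sorted(set(_FORBIDDEN.findall(text)))


def check_name(name):
    """Raise ValueError if name contains any forbidden character."""
    bad = forbidden_characters(name)
    if bad:
        raise ValueError(f"{name!r} contains forbidden characters: {bad}")
    return name
```

Calling `check_name` on each entry as the file is read would fail early with the offending entry in the message, rather than causing issues downstream.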