datasketches-cpp
Determinism
Hi, I am interested in using this amazing project for a distributed machine learning application. My only blocker is that I cannot reproduce results for a given seed. I am wondering if there is any interest in adding such a feature, or whether there is a way to use the existing library to get a deterministic result. It looks to me like it would require adding a random engine member to the sketches themselves and then modifying the appropriate serialisation code.
I also see that I could compile with the KLL_VALIDATION option to get a deterministic KLL sketch, although this is not thread-safe.
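To make the idea of a per-sketch random engine concrete, here is a minimal sketch of what I have in mind. It is a toy and not datasketches-cpp code: the class `seeded_compactor` and the function `run_compactions` are hypothetical names, and the "compaction" step (keep every other item, starting at a random offset of 0 or 1) is only my assumption of the kind of random choice that would need to be seeded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical toy class for illustration only; not part of datasketches-cpp.
// Each instance owns its own engine, so no random state is shared globally.
class seeded_compactor {
public:
  explicit seeded_compactor(std::uint64_t seed) : rng_(seed) {}

  // Keep every other item, starting at a random offset of 0 or 1.
  // The lowest bit of the raw engine output is used directly, because the
  // output sequence of std::mt19937_64 is fixed by the C++ standard, while
  // the std::*_distribution classes may differ between implementations.
  std::vector<int> compact(const std::vector<int>& items) {
    const std::size_t offset = static_cast<std::size_t>(rng_() & 1u);
    std::vector<int> kept;
    for (std::size_t i = offset; i < items.size(); i += 2) {
      kept.push_back(items[i]);
    }
    return kept;
  }

private:
  std::mt19937_64 rng_;  // per-instance random engine
};

// Starts from the values 0..63 and applies `rounds` compactions in a row,
// using a compactor constructed from `seed`.
std::vector<int> run_compactions(std::uint64_t seed, int rounds) {
  seeded_compactor compactor(seed);
  std::vector<int> items(64);
  for (std::size_t i = 0; i < items.size(); ++i) {
    items[i] = static_cast<int>(i);
  }
  for (int r = 0; r < rounds; ++r) {
    items = compactor.compact(items);
  }
  return items;
}
```

With this layout, `run_compactions(42, 4)` returns the same four items on every call, since the only source of randomness is the engine seeded in the constructor. For serialisation, the standard engines support `operator<<` and `operator>>` for saving and restoring their state, so the engine state could in principle be written out alongside the sketch. A single instance would still need external synchronisation if it were updated from several threads.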
Any suggestions?