datasketches-cpp

Determinism

Open RAMitchell opened this issue 1 year ago • 11 comments

Hi, I am interested in using this amazing project for a distributed machine learning application. My only blocker is that I cannot reproduce results for a given seed. I am wondering if there is any interest in adding such a feature, or any way to use the existing library to get a deterministic result. It looks to me like it would require adding a random engine member to the sketches themselves and then modifying the appropriate serialisation code.

I also see that I could compile with the KLL_VALIDATION option to get a deterministic KLL sketch, although this is not thread safe.
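To make the random engine member idea concrete, here is a rough sketch of what I have in mind. It is a toy and not the library's code: `seeded_halver`, `halve` and `run_with_seed` are hypothetical names I made up for illustration, and the "keep every other item from a random offset" step only stands in for whatever randomised choice a real sketch makes. The point is that the engine lives in the object and is seeded by the caller, so each instance is reproducible and no global random state is shared between threads.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical toy class, NOT part of datasketches-cpp.
// It owns its random engine, so results depend only on the seed and the input.
class seeded_halver {
public:
  explicit seeded_halver(std::uint64_t seed) : rng_(seed) {}

  // Keep every other item, starting at a random offset of 0 or 1.
  std::vector<int> halve(const std::vector<int>& items) {
    // Use the raw engine output: std::mt19937_64 has a standard-specified
    // sequence, whereas the std:: distribution classes may differ by platform.
    const std::size_t offset = static_cast<std::size_t>(rng_() & 1u);
    std::vector<int> kept;
    for (std::size_t i = offset; i < items.size(); i += 2) {
      kept.push_back(items[i]);
    }
    return kept;
  }

private:
  std::mt19937_64 rng_;  // per-instance state, no shared global engine
};

// Hypothetical helper: run 32 halvings with a given seed and record the
// first surviving item of each round (0 or 1, i.e. the chosen offset).
std::vector<int> run_with_seed(std::uint64_t seed) {
  seeded_halver h(seed);
  const std::vector<int> items = {0, 1, 2, 3, 4, 5, 6, 7};
  std::vector<int> firsts;
  for (int round = 0; round < 32; ++round) {
    firsts.push_back(h.halve(items).front());
  }
  return firsts;
}
```

Calling `run_with_seed(42)` twice should give the same sequence of offsets. For serialisation, the standard engines can be written to and read from a stream with `operator<<` and `operator>>`, so the engine state could in principle be stored alongside the sketch, although how that would fit into the existing binary format is something I have not looked into.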

Any suggestions?

RAMitchell, Apr 17 '23 08:04