tdigest
[Suggestion] Small exports for distributed programs
Hi,
This is a fork used in large distributed programs where I work. It adds a Distributable class that inherits from Digest. The purpose of that class is to minimize the size of the exported state (toArray) so that a node wanting to read a percentile value can fetch lots of small internal states from each node and recompute the percentile quickly.
It implements toList(), which is a more compact version of toArray(). It uses arrays to save space on the countless mean: ..., n: ... keys. The centroids can be pushed back into a new Distributable instance using .push(centroid[0], centroid[1]).
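To make the idea concrete, here is a rough sketch of the compaction and the rebuild step. It is not the fork's actual code: `toCompactList`, `StubDigest` and `mergeLists` are hypothetical names made up for this example, and the stub only records what is pushed into it instead of computing percentiles. It assumes the verbose export is a list of objects with `mean` and `n` fields.

```javascript
// Hypothetical sketch, not the fork's implementation.

// Turn the verbose export [{mean: ..., n: ...}, ...] into compact
// [mean, n] pairs so the keys are not repeated for every centroid.
function toCompactList(centroids) {
  return centroids.map((c) => [c.mean, c.n]);
}

// Stand-in for a digest instance: it only records pushed centroids.
// A real digest would fold them into its internal structure.
class StubDigest {
  constructor() {
    this.centroids = [];
  }
  push(mean, n) {
    this.centroids.push({ mean, n });
  }
}

// Push the compact lists fetched from several nodes into one digest,
// using the same call shape as .push(centroid[0], centroid[1]).
function mergeLists(lists, digest) {
  for (const list of lists) {
    for (const centroid of list) {
      digest.push(centroid[0], centroid[1]);
    }
  }
  return digest;
}

// Made-up centroid data for two nodes.
const nodeA = [{ mean: 1.5, n: 2 }, { mean: 10, n: 1 }];
const nodeB = [{ mean: 3, n: 4 }];

const listA = toCompactList(nodeA);
const listB = toCompactList(nodeB);
const merged = mergeLists([listA, listB], new StubDigest());

// Compare the serialized sizes of the verbose and compact forms.
console.log(JSON.stringify(nodeA).length, JSON.stringify(listA).length);
console.log(merged.centroids.length);
```

The reading node only ever sees the compact lists, so the saving applies to every transfer between nodes.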
I have no idea if this would be useful to you or anyone else, but I'm opening this PR in case you find it interesting and/or want to merge it.
Thanks! I'll give it a look later today.
To be honest, I think it's a very narrow use case, the settings are hardcoded, and there are no tests. I would be surprised if you merged it as is. I opened the PR because when I do work on top of open source software, I like to show the author how it's being used in case it gives them ideas :)
As you say, it's not mergeable code (I'd hit you up for unit tests at least). But it's a pretty classy way to submit a feature request :smile:
I'll take this on so I can get you back to running the main line.
Thanks!