
Consider removal of t-digest computation

Open zblz opened this issue 8 years ago • 0 comments

Right now the t-digest computation (done using a Python t-digest implementation) accounts for most of the time spent generating a summary. It was originally included to hold an approximation of the histogram information, but since we also compute a fixed-bin-width histogram, it adds limited value. In the explorer, the t-digest information is used for:

  • arbitrary bin width histograms.
  • building percentile functions in lens.Summary that get used to plot a CDF in lens.Explorer.

We have to consider whether these two features are important enough to justify the cost, and whether other approaches could replace this information.
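As one possible substitute for the percentile functions, here is a minimal sketch that approximates percentiles from the fixed-bin-width histogram alone, by linearly interpolating the cumulative counts within each bin. The function name `make_percentile_function` and its `edges`/`counts` arguments are hypothetical and not part of lens; the accuracy is limited by the bin width.

```python
from bisect import bisect_left
from itertools import accumulate


def make_percentile_function(edges, counts):
    """Return f(q) approximating the q-th percentile (0-100) of a histogram.

    Hypothetical helper, not part of lens. `edges` has one more entry
    than `counts`; samples are assumed uniform within each bin.
    """
    if not counts or len(edges) != len(counts) + 1:
        raise ValueError("need len(edges) == len(counts) + 1 and >= 1 bin")
    cumulative = list(accumulate(counts))
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("histogram is empty")
    # Cumulative fraction of samples at each bin edge: 0 at the first
    # edge, 1 at the last.
    cdf = [0.0] + [c / total for c in cumulative]

    def percentile(q):
        if not 0 <= q <= 100:
            raise ValueError("q must be between 0 and 100")
        target = q / 100.0
        # First edge whose cumulative fraction reaches the target.
        i = bisect_left(cdf, target)
        if i == 0:
            return float(edges[0])
        lo, hi = cdf[i - 1], cdf[i]
        frac = (target - lo) / (hi - lo)
        # Interpolate linearly inside the bin that contains the target.
        return edges[i - 1] + frac * (edges[i] - edges[i - 1])

    return percentile


median = make_percentile_function([0, 1, 2, 3, 4], [1, 1, 1, 1])(50)
```

Evaluating such a function over a grid of `q` values would give the points needed to plot an approximate CDF, at the cost of resolution compared with the t-digest.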

Having an adaptively binned histogram (through, e.g., Bayesian blocks) would go a long way toward replacing the t-digest for our exploration needs.
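To illustrate what adaptive binning means here, the sketch below uses equal-frequency (quantile) bin edges. This is a much simpler technique than Bayesian blocks and is swapped in only because it fits in a few lines; it is not what the issue proposes to implement. The name `equal_frequency_edges` is hypothetical.

```python
def equal_frequency_edges(values, n_bins):
    """Return bin edges that put roughly equal numbers of samples per bin.

    Hypothetical helper: a simple stand-in for adaptive binning, NOT
    Bayesian blocks. Duplicate edges (from repeated values) are dropped,
    so fewer than n_bins bins may be returned.
    """
    data = sorted(values)
    if not data or n_bins < 1:
        raise ValueError("need at least one value and one bin")
    last = len(data) - 1
    edges = []
    for k in range(n_bins + 1):
        # Fractional position of the k-th quantile in the sorted data.
        pos = k * last / n_bins
        lo = int(pos)
        hi = min(lo + 1, last)
        # Linear interpolation between the two neighbouring samples.
        edge = data[lo] + (pos - lo) * (data[hi] - data[lo])
        if not edges or edge > edges[-1]:
            edges.append(edge)
    return edges


edges = equal_frequency_edges([0, 1, 2, 3, 4, 5, 6, 7, 8], 4)
```

Bins come out narrow where the data are dense and wide where they are sparse, which is the property that makes adaptive binning attractive for exploration.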

A significant advantage of a t-digest is that it can be updated in a chunked manner, but we are not currently using that.

zblz, Aug 15 '17 14:08