Consider removal of t-digest computation
Right now the t-digest computation (done using a Python t-digest implementation) takes most of the time in generating a summary. The initial motivation for including it was to have an approximation of the histogram information, but since we also compute a fixed-bin-width histogram, the t-digest adds limited value. The t-digest information is used in the explorer for:
- arbitrary bin width histograms.
- building percentile functions in `lens.Summary` that get used to plot a CDF in `lens.Explorer`.
We have to consider whether these two features are important enough to justify the cost, and whether other approaches could provide the same information.
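As one possible substitute for the percentile functions, a percentile can be approximated directly from the fixed-bin-width histogram we already compute. The sketch below is only an illustration: `histogram_percentile` is a hypothetical helper, not part of lens, and it assumes values are spread uniformly within each bin, so its accuracy is limited by the bin width.

```python
from bisect import bisect_left
from itertools import accumulate


def histogram_percentile(edges, counts, q):
    """Approximate the q-th percentile (0-100) from histogram counts.

    Hypothetical helper, not part of lens. Assumes values are spread
    uniformly within each bin, so the error is bounded by the bin width.
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("edges must have exactly one more entry than counts")
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100")
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")

    cumulative = list(accumulate(counts))
    target = total * q / 100.0
    # Index of the first bin whose cumulative count reaches the target.
    i = bisect_left(cumulative, target)
    if counts[i] == 0:
        # Only reachable for q == 0 with an empty first bin.
        return edges[i]
    below = cumulative[i - 1] if i > 0 else 0
    # Linear interpolation inside the selected bin.
    fraction = (target - below) / counts[i]
    return edges[i] + fraction * (edges[i + 1] - edges[i])


edges = [0, 10, 20, 30]
counts = [10, 20, 10]
median = histogram_percentile(edges, counts, 50)
```

Evaluating this helper over a grid of `q` values gives an approximate CDF for plotting. Whether the resolution is good enough for the explorer depends on how many bins the summary stores.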
Having an adaptively binned histogram (through, e.g., Bayesian blocks) would go a long way toward replacing the t-digest for our exploration needs.
A significant advantage of a t-digest is that it can be updated in a chunked manner, but we are not currently using that.
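For comparison, here is a minimal sketch of what chunked updating could look like with a plain histogram, should we ever want it without the t-digest. It relies on the fact that counts over a fixed set of bin edges are additive. The helpers `bin_counts` and `merge_counts` are hypothetical and not part of lens, and the approach assumes the bin edges are known before the first chunk is seen, which a t-digest does not require.

```python
from bisect import bisect_right


def bin_counts(values, edges):
    """Count values per bin for fixed, sorted edges (hypothetical helper).

    Bins are half-open [lo, hi), except the last one, which also includes
    its right edge. Values outside the edges are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v <= edges[-1]:
            i = min(bisect_right(edges, v) - 1, len(counts) - 1)
            counts[i] += 1
    return counts


def merge_counts(a, b):
    """Merge two count vectors computed over the same edges."""
    if len(a) != len(b):
        raise ValueError("count vectors must use the same bin edges")
    return [x + y for x, y in zip(a, b)]


edges = [0, 5, 10]
chunk_1 = [1, 2, 6]
chunk_2 = [5, 10, 11]

# Processing chunk by chunk gives the same counts as a single pass.
merged = merge_counts(bin_counts(chunk_1, edges), bin_counts(chunk_2, edges))
single_pass = bin_counts(chunk_1 + chunk_2, edges)
```

This does not carry over to adaptive binning, where the edges depend on the full data set.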