Use Kahan summation for float aggregations to reduce errors

Open stefanvodita opened this issue 1 year ago • 3 comments

Description

With facet associations, we have the option of summing floats into an accumulator. This type of operation is prone to accumulated rounding error, which can be reduced by using summation algorithms designed specifically for floating-point values, such as Kahan summation.
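For readers unfamiliar with the technique, here is a minimal sketch of Kahan (compensated) summation next to a naive float loop. The class and method names are hypothetical and chosen for illustration only; this is not code from Lucene.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of Lucene.
class KahanSumDemo {

    /** Naive summation: every addition rounds to the nearest float. */
    static float naiveSum(float[] values) {
        float sum = 0f;
        for (float v : values) {
            sum += v;
        }
        return sum;
    }

    /** Kahan summation: carries the rounding error of each addition forward. */
    static float kahanSum(float[] values) {
        float sum = 0f;
        float c = 0f; // compensation: low-order bits lost by earlier additions
        for (float v : values) {
            float y = v - c;     // apply the correction to the incoming value
            float t = sum + y;   // low-order bits of y may be lost here
            c = (t - sum) - y;   // recover what was lost (algebraically zero)
            sum = t;
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] values = new float[10_000_000];
        Arrays.fill(values, 0.1f);
        System.out.println("naive: " + naiveSum(values));
        System.out.println("kahan: " + kahanSum(values));
    }
}
```

Summing ten million copies of `0.1f` this way, the naive total drifts far from the expected 1,000,000, while the compensated total stays close to it. The cost is one extra float of state per running sum and a few extra floating-point operations per value.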

stefanvodita avatar Jan 13 '24 11:01 stefanvodita

Neat -- I had never heard of Kahan summation. Here is its Wikipedia page.

mikemccand avatar Jan 13 '24 13:01 mikemccand

Are we not already summing into an intermediate double and then truncating to float at the end? That might be the easiest win, without performance hassles.

rmuir avatar Jan 13 '24 17:01 rmuir

Right now, the accumulators in FloatTaxonomyFacets are floats. We keep a value for each ordinal, so we can end up using a lot of memory. With doubles, we would increase memory usage even more. Still, it's worth considering if that's better than the extra overhead from a fancy summation algorithm.
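To make the trade-off concrete, here is a rough sketch of the two alternatives with one accumulator per ordinal. The class, method names, and parameters (`ordinals`, `weights`, `numOrdinals`) are assumptions made for illustration and do not reflect how FloatTaxonomyFacets is actually written.

```java
// Hypothetical sketch, not Lucene code: one running sum per ordinal.
class PerOrdinalAccumulators {

    /** Double accumulators, narrowed to float at the end: 8 bytes per ordinal. */
    static float[] sumWithDoubles(int[] ordinals, float[] weights, int numOrdinals) {
        double[] sums = new double[numOrdinals];
        for (int i = 0; i < ordinals.length; i++) {
            sums[ordinals[i]] += weights[i];
        }
        float[] result = new float[numOrdinals];
        for (int ord = 0; ord < numOrdinals; ord++) {
            result[ord] = (float) sums[ord]; // single rounding step per ordinal
        }
        return result;
    }

    /** Kahan accumulators: a float sum plus a float compensation per ordinal. */
    static float[] sumWithKahan(int[] ordinals, float[] weights, int numOrdinals) {
        float[] sums = new float[numOrdinals];
        float[] comps = new float[numOrdinals]; // second array: 4 + 4 bytes per ordinal
        for (int i = 0; i < ordinals.length; i++) {
            int ord = ordinals[i];
            float y = weights[i] - comps[ord];
            float t = sums[ord] + y;
            comps[ord] = (t - sums[ord]) - y;
            sums[ord] = t;
        }
        return sums;
    }
}
```

One thing the sketch suggests: a per-ordinal Kahan sum needs its own compensation term for each ordinal, so in this form it also uses 8 bytes per ordinal during aggregation, the same as a `double[]`, and does more arithmetic per value.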

stefanvodita avatar Jan 14 '24 06:01 stefanvodita