lance
flaky test: potentially flaky distance::dot::tests::test_dot_f32
https://github.com/lancedb/lance/actions/runs/8794212752/job/24133303595
// Accuracy of dot product depends on the size of the components
// of the vector.
// Imagine that each `x_i` can vary by `є * |x_i|`. Similarly for `y_i`.
// (Basically, it's accurate to ±(1 + є) * |x_i|).
// Error for `sum(x, y)` is `є_x + є_y`. Error for multiple is `є_x * x + є_y * y`.
// See: https://www.geol.lsu.edu/jlorenzo/geophysics/uncertainties/Uncertaintiespart2.html
// The multiplication of `x_i` and `y_i` can vary by `(є * |x_i|) * |y_i| + (є * |y_i|) * |x_i|`.
// This simplifies to `2 * є * (|x_i| + |y_i|)`.
// So the error for the sum of all the multiplications is `є * sum(|x_i| + |y_i|)`.
fn max_error<T: Float + AsPrimitive<f64>>(x: &[f64], y: &[f64]) -> f32 {
    let dot = x
        .iter()
        .cloned()
        .zip(y.iter().cloned())
        .map(|(x, y)| x.abs() * y.abs())
        .sum::<f64>();
    (2.0 * T::epsilon().as_() * dot) as f32
}
Actually, `T::epsilon()` is the machine epsilon: "the difference between 1.0 and the next larger representable number."
https://doc.rust-lang.org/std/f64/constant.EPSILON.html
However, in IEEE 754 the spacing between adjacent representable floating-point numbers is not constant, and it is not linear in the value either. For values close to 1 the spacing is small (high precision); for very large values the difference between consecutive representable numbers can be quite large (lower precision). The reason is that a change in the fraction part is amplified by the exponent part, and a larger value has a larger exponent.
So `T::epsilon().as_() * dot` might not be appropriate here.
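To make the spacing argument concrete, here is a minimal sketch that measures the gap to the next representable `f32` at a given value. `ulp_gap` and `relative_gap` are hypothetical helpers written for this illustration, not part of lance, and they assume a finite, positive input below `f32::MAX`.

```rust
/// Gap between `x` and the next larger representable `f32`.
/// Hypothetical helper for illustration only.
/// Assumes `x` is finite, positive, and below `f32::MAX`.
fn ulp_gap(x: f32) -> f32 {
    // For positive finite floats, incrementing the bit pattern
    // yields the next representable value above `x`.
    f32::from_bits(x.to_bits() + 1) - x
}

/// The same gap expressed as a fraction of `x`.
fn relative_gap(x: f32) -> f32 {
    ulp_gap(x) / x
}
```

With these definitions, `ulp_gap(1.0)` equals `f32::EPSILON` and `ulp_gap(1024.0)` is 1024 times larger, so the absolute gap does grow with the exponent. For normal values, `relative_gap` stays at or below `f32::EPSILON`, which is worth keeping in mind when judging whether a bound scaled by `|x_i| * |y_i|` is adequate.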
// The multiplication of `x_i` and `y_i` can vary by `(є * |x_i|) * |y_i| + (є * |y_i|) * |x_i|`.
This may also have implications: the expression above keeps only the first-order terms and drops the `є * є` term, which may accumulate in vectors of large dimension.
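As a rough way to size the dropped term, the sketch below compares the first-order bound used by `max_error` with the `є * є` contribution. It assumes the simple model in which each `x_i` and `y_i` is perturbed by a relative `є`, so each product is off by at most `(2 * є + є * є) * |x_i| * |y_i|`. The function names are hypothetical, and rounding in the additions is not modelled.

```rust
/// Sum of `|x_i| * |y_i|`, the quantity that `max_error` scales by epsilon.
fn abs_dot(x: &[f64], y: &[f64]) -> f64 {
    x.iter().zip(y.iter()).map(|(a, b)| a.abs() * b.abs()).sum()
}

/// First-order bound, matching `2.0 * eps * dot` in the quoted `max_error`.
fn first_order_bound(eps: f64, x: &[f64], y: &[f64]) -> f64 {
    2.0 * eps * abs_dot(x, y)
}

/// Second-order contribution that the first-order bound leaves out
/// (under the assumed relative-perturbation model).
fn second_order_term(eps: f64, x: &[f64], y: &[f64]) -> f64 {
    eps * eps * abs_dot(x, y)
}
```

Under this model the two quantities scale with the same sum, so their ratio is `eps / 2` whatever the dimension. Any growth with dimension would have to come from effects the sketch leaves out, such as rounding in the summation.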