materialize icon indicating copy to clipboard operation
materialize copied to clipboard

Temporal filters have an off-by-one error

Open frankmcsherry opened this issue 1 year ago • 1 comments

What version of Materialize are you using?

main

What is the issue?

We introduced in https://github.com/MaterializeInc/materialize/pull/14463 an off-by-one error, recorded in temporal.slt:

# We don't currently support specifying the max timestamp due to a limitation in linear.rs and the step_mz_timestamp internal function.
# That big number there should be u64::MAX.
statement ok
CREATE VIEW intervals_max (a, b) AS VALUES (0, 18446744073709551615)

The issue is that rather than perform a += 1 in the space of numerics and then determine if the result fits in a u64, we do the increment in a u64 natively and produce a EvalError::MzTimestampStepOverflow if it overflows. This means that the behavior of selecting from a index and a view can be different if the data contains u64::MAX (the index will error, the view will not).

I think in principle one should be able to guard evaluation in linear.rs with the logic looks for this error specifically and when observed performs the pre-#14463 behavior (which depends on whether it is lower or upper, but largely "ignoring the bound" rather than erroring).

Edit: to reframe, the issue is that there is different behavior with and without an index. Our two evaluation strategies, 1. replace mz_now() by a known timestamp and 2. build a dataflow widget, produce different outputs on some inputs. Those inputs affected are few, as off-by-one errors tend to be, but the difference between the two means has the potential to introduce unknown bad behavior.

frankmcsherry avatar May 13 '24 14:05 frankmcsherry

We'll get back to this later. Or maybe it could be considered as part of https://github.com/MaterializeInc/materialize/pull/27001, where more general occurrences of this problem are being discussed.

ggevay avatar May 15 '24 15:05 ggevay