System time can race ahead of real time

Open frankmcsherry opened this issue 2 years ago • 1 comments

At the moment, we allow the "system time" to race ahead of real time, if we need fresh timestamps to accomplish some behavior. We have several flavors of weird behavior related to this, from ramping up timestamps with contending writes, to anxiety about allowing STRICT SERIALIZABLE queries against data whose since frontier is ahead of the current real time.

We could instead make a hard commitment to never do this, and instead delay the execution of these operations rather than advance system time to match the requirements. For example, a read against data with a since ahead of our current system time can be initiated at since, in the future, as long as we do not respond until the system time has passed the since frontier. This introduces apparent latency but removes "permanent damage" to the system time; others relying on strict serializability may have to wait for the rest of the system to catch up to the fast-forwarded system time, which can take unboundedly long.
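A minimal sketch of the delayed-read idea above, assuming integer millisecond timestamps. The names `ReadPlan` and `plan_read` are hypothetical and do not come from the Materialize codebase; the point is only that the read timestamp can sit in the future while the reply is held back, so system time itself is never advanced.

```python
from dataclasses import dataclass


@dataclass
class ReadPlan:
    read_ts: int      # timestamp the read is evaluated at (ms)
    reply_delay: int  # how long to hold the response (ms)


def plan_read(since: int, system_time: int) -> ReadPlan:
    """Pick a read timestamp without advancing system time (hypothetical helper)."""
    # The read cannot be served at a time earlier than the since frontier.
    read_ts = max(since, system_time)
    # If that timestamp is in the future, hold the reply until the clock reaches it.
    reply_delay = max(0, read_ts - system_time)
    return ReadPlan(read_ts, reply_delay)
```

With `since` 50 ms ahead of the clock, the read is planned at `since` and the reply is held for 50 ms; with `since` behind the clock, the read runs at the current system time with no delay.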

Candidate acceptance criteria

Prevent any timestamp oracle from advancing beyond the system clock, or whatever proxy we use for it. NB: this does not mean that the system clock must advance the timestamp oracle; we could still have it lag behind the system clock to support more interactive reads.
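As an illustration of this criterion, here is a small sketch of an oracle that refuses to hand out a write timestamp beyond the clock, but is free to lag behind it. `CappedOracle`, `read_ts`, and `try_write_ts` are hypothetical names, and the clock is injected as a plain callable so the behavior can be exercised without real time passing.

```python
from typing import Callable, Optional


class CappedOracle:
    """Toy oracle whose timestamps never exceed the supplied clock (assumed design)."""

    def __init__(self, now_ms: Callable[[], int]):
        self._now_ms = now_ms
        self._ts = now_ms()

    def read_ts(self) -> int:
        # Reads use the last assigned timestamp, which may lag behind the clock.
        return self._ts

    def try_write_ts(self) -> Optional[int]:
        # Writes need a timestamp strictly greater than the previous one.
        candidate = self._ts + 1
        if candidate > self._now_ms():
            # Handing this out would put the oracle ahead of the clock,
            # so the caller has to wait and retry instead.
            return None
        self._ts = candidate
        return candidate
```

Returning `None` rather than bumping the timestamp is what pushes the cost onto the caller as latency instead of onto the oracle as a fast-forwarded time.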

frankmcsherry avatar Sep 20 '22 21:09 frankmcsherry

Proposed guardrail: times served by the timestamp oracle should never exceed the system clock, or should never exceed it by more than some interval.

Challenges: internal metrics written to tables always get a new timestamp and shouldn't be blocked. What if the system clock jumps significantly?

We might be able to assume that AWS clocks will never jump beyond some threshold?

ggnall avatar Sep 21 '22 18:09 ggnall

I think this issue can be further broken down into 3 smaller tasks:

  • One known violation of this is when trying to acquire a read hold on an object whose since is ahead of the system clock. #15644 fixes this.
  • Another known violation is when executing writes to system tables. We increase the oracle's timestamp regardless of the system clock, and we should prevent this from happening. This will be a bit tricky, because writes to system tables happen in the middle of executing DDL, when the Coordinator is in a partially inconsistent state. So either we need to block the whole system while we wait for the system clock to catch up, or we need to reorganize DDL execution so that it's safe to wait asynchronously on writes to system tables.
  • There are no other currently known violations. However, it would be prudent to add some guard rails to the timestamp oracle API to identify and prevent this scenario. That way we don't accidentally advance a timestamp oracle beyond the system time through some bug or unknown path.
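The guard rail in the last item could look roughly like the check below. This is a sketch, not the actual timestamp oracle API: `checked_advance`, `OracleAheadOfClock`, and the `max_ahead` tolerance are hypothetical, with `max_ahead=0` meaning the oracle may never pass the clock at all.

```python
class OracleAheadOfClock(Exception):
    """Raised when an advance would push the oracle past the allowed bound."""


def checked_advance(current_ts: int, new_ts: int, clock_now: int, max_ahead: int = 0) -> int:
    """Validate a proposed oracle advance and return the new timestamp (hypothetical)."""
    if new_ts < current_ts:
        raise ValueError("oracle timestamps must not go backwards")
    if new_ts > clock_now + max_ahead:
        # Surface the violation loudly instead of silently fast-forwarding.
        raise OracleAheadOfClock(
            f"{new_ts} exceeds clock {clock_now} by more than {max_ahead} ms"
        )
    return new_ts
```

Routing every advance through one checked entry point is what would catch an unknown code path, since any caller that tries to run ahead of the clock fails at the check rather than corrupting the oracle.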

jkosh44 avatar Oct 26 '22 15:10 jkosh44

The DDL issue is probably not worth solving right now. In order for it to manifest, we'd have to execute DDL at a rate faster than 1 DDL statement per millisecond, which we can't currently handle. Asynchronous DDL may be beneficial in its own right, but not for timestamp-related reasons.
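To make the rate argument concrete, here is a toy model, under the assumption that each DDL statement takes one fresh millisecond timestamp equal to the larger of the current clock and the previous timestamp plus one. The function `oracle_lead_ms` is hypothetical and spreads the statements evenly over the elapsed interval.

```python
def oracle_lead_ms(ddl_count: int, elapsed_ms: int) -> int:
    """How far the oracle ends up ahead of the clock in this toy model."""
    oracle = 0  # last timestamp handed out, in ms
    for i in range(ddl_count):
        # Clock reading when the i-th statement runs (evenly spaced).
        clock = i * elapsed_ms // ddl_count
        # Assumed rule: a fresh timestamp, never below the clock.
        oracle = max(clock, oracle + 1)
    return max(0, oracle - elapsed_ms)
```

In this model, 500 or 1,000 statements over one second leave the oracle at or behind the clock, while 2,000 statements over one second leave it a full second ahead, which matches the 1-statement-per-millisecond threshold described above.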

jkosh44 avatar Oct 26 '22 18:10 jkosh44

We're not worried about the DDL problem for now, closing as done.

maddyblue avatar Nov 23 '22 19:11 maddyblue