cockroach
cockroach copied to clipboard
sql: emit point deletes during delete fastpath
Closes #53939.
Previously, the "deleteRange" SQL operator, which is meant to be a fast-path for cases in which an entire range of keys can be deleted, always did what it said: emitted DeleteRange KV operations. This precludes a crucial optimization: sending point deletes when the list of deleted keys is exactly known.
For example, a query like DELETE FROM kv WHERE k = 10000 uses the
"fast path" delete, since it has a contiguous set of keys to delete, and
it doesn't need to know the values that were deleted. But, in this case,
the performance is actually worse if we use a DeleteRange KV operation
for various reasons (see #53939), because:
- ranged KV writes (DeleteRangeRequest) cannot be pipelined because an enumeration of the intents that they will leave cannot be known ahead of time. They must therefore perform evaluation and replication synchronously.
- ranged KV writes (DeleteRangeRequest) result in ranged intent resolution, which is less efficient, especially until we re-enable time-bound iterators.
The reason we couldn't previously emit point deletes in this case is that SQL needs to know whether it deleted something or not. This means we can't do a "blind put" of a deletion: we need to actually understand whether there was something that we were "overwriting" with our delete.
This commit adds an optional "return key" parameter to DelRequest, which returns true if a value was actually deleted.
Additionally, the deleteRange SQL operator detects when it can emit single-key deletes, and does so.
Release note (performance improvement): point deletes in SQL are now more efficient during concurrent workloads.
Thanks for the review; I really want to come back to this and figure out how to get it over the finish line.
Thanks @yuzefovich, this is great. Is this related to the TPC-E experiment from @nvanbenschoten?
Is this related to the TPC-E experiment
No - I looked over the old open issues from the execution dashboard and found that this PR was still open.
Drive-by bike shed: could we have MVCCDelete return this instead of adding a new function, and unconditionally return it in the response without adding a new request parameter to enable it? Just less surface area to maintain.
I'm hoping to land it before the stability period. @nvanbenschoten will you have time to review this this week? Maybe @erikgrinaker or @sumeerbhola could give this a proper review?
@cockroachdb/sql-queries can someone volunteer to review this too?
I'm hoping to land it before the stability period. @nvanbenschoten will you have time to review this this week? Maybe @erikgrinaker or @sumeerbhola could give this a proper review?
I'm completely swamped before stability, so would appreciate it if someone else had the bandwidth.
I can give this a review in the next day or so. One thing I'll focus on is whether the behavior change of a DELETE on a non-existent row is acceptable. This PR will turn such a statement from a read to a read-write. Related to @sumeerbhola's question, it's not straightforward to avoid this change in behavior. A DeleteRequest can be pipelined and participate in a parallel commit precisely because it unconditionally writes an intent. https://github.com/cockroachdb/cockroach/issues/64723#issuecomment-832894427 could recover the ability to pipeline if we made this intent write conditional, but not the ability to parallel commit.
@andreimatei and I talked through https://github.com/cockroachdb/cockroach/issues/64723 + https://github.com/cockroachdb/cockroach/issues/53939 extensively at some point and I don't recall there being a single solution that was optimal in all cases.