colmem: improve memory-limiting behavior of the accounting helpers
colmem: introduce a helper method when no memory limit should be applied
This commit is a purely mechanical change.
Release note: None
colmem: move some logic of capacity-limiting into the accounting helper
This commit moves the logic that was duplicated across each user of the SetAccountingHelper into the helper itself. This de-duplicates some code, and it also makes it easier to refactor the code in the following commit.
Additionally, this commit makes a tiny change to make the resetting behavior in the hash aggregator more precise.
Release note: None
colmem: improve memory-limiting behavior of the accounting helpers
This commit fixes an oversight in how we allocate batches of "dynamic" capacity. We have two related ways of reallocating batches, and both work by growing the capacity of the batch until the memory limit is exceeded, after which the batch is reused until the end of the query execution. This is a reasonable heuristic under the assumption that all tuples in the data stream are roughly equal in size, but this might not be the case.
In particular, consider an example when 10k small rows of 1KiB are
followed by 10k large rows of 1MiB. According to our heuristic, we
happily grow the batch until 1024 in capacity, and then we do not shrink
the capacity of that batch, so once the large rows start appearing, we
put 1GiB worth of data into a single batch, significantly exceeding our
memory limit (usually 64MiB with the default workmem setting).
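To make the arithmetic concrete, here is a toy Go simulation of the old behavior (illustrative only; the constants mirror the example above, not the actual allocator code):

```go
package main

import "fmt"

// Toy simulation (not CockroachDB code) of the old heuristic: the batch
// capacity doubles while the footprint stays under the limit and is never
// shrunk, so a later run of large rows blows well past the limit.
func main() {
	const (
		memLimit = 64 << 20 // 64MiB, the default distsql_workmem
		smallRow = 1 << 10  // 1KiB
		largeRow = 1 << 20  // 1MiB
		maxCap   = 1024     // coldata.BatchSize()
	)
	capacity := 1
	// Small rows: the footprint never comes close to the limit, so the
	// capacity grows all the way to 1024.
	for capacity < maxCap && capacity*smallRow <= memLimit {
		capacity *= 2
	}
	fmt.Printf("capacity after small rows: %d\n", capacity) // 1024
	// Large rows reuse the same capacity: 1024 * 1MiB = 1GiB in one batch.
	fmt.Printf("footprint with large rows: %dMiB (limit %dMiB)\n",
		capacity*largeRow>>20, memLimit>>20)
}
```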
This commit introduces a new heuristic as follows:
- the first time a batch exceeds the memory limit, its capacity is memorized, and from now on that capacity will determine the upper bound on the capacities of the batches allocated through the helper;
- if at any point in time a batch exceeds the memory limit by at least a factor of two, then that batch is discarded, and the capacity will never exceed half of the capacity of the discarded batch;
- if the memory limit is not reached, then the behavior of the dynamic growth of the capacity provided by Allocator.ResetMaybeReallocate is still applicable (i.e. the capacities will grow exponentially until coldata.BatchSize()).
Note that this heuristic does not have the ability to grow the maximum capacity once it has been set, although it might make sense to do so (say, if after shrinking the capacity, the next five times we see that the batch is using less than half of the memory limit). This is a conscious omission since I want this change to be backported, and never growing seems like a safer choice. Thus, this improvement is left as a TODO.
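For illustration, here is a minimal Go sketch of the capacity-limiting idea described above (hypothetical names, not the actual SetAccountingHelper code); matching the behavior described above, the memorized bound only ever decreases:

```go
package colmemsketch

// capacityLimiter is a hypothetical illustration of the heuristic above, not
// the actual SetAccountingHelper implementation.
type capacityLimiter struct {
	memLimit    int64
	maxCapacity int // 0 means no upper bound has been memorized yet
}

// nextCapacity picks the capacity for the next batch given the footprint and
// capacity of the batch that was just filled. discardOld indicates that the
// old batch should not be reused.
func (l *capacityLimiter) nextCapacity(footprint int64, capacity int) (newCap int, discardOld bool) {
	if footprint >= 2*l.memLimit {
		// Exceeded the limit by at least 2x: discard the batch and never go
		// above half of its capacity again.
		l.maxCapacity = capacity / 2
		if l.maxCapacity < 1 {
			l.maxCapacity = 1
		}
		return l.maxCapacity, true
	}
	if footprint > l.memLimit && l.maxCapacity == 0 {
		// First time over the limit: memorize the capacity as the upper bound.
		l.maxCapacity = capacity
	}
	// Under the limit the usual exponential growth still applies (doubling up
	// to coldata.BatchSize()); here we only clamp it to the memorized bound.
	newCap = 2 * capacity
	if l.maxCapacity > 0 && newCap > l.maxCapacity {
		newCap = l.maxCapacity
	}
	return newCap, false
}
```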
Also, we still might create batches that are too large in memory footprint in those places that don't use the SetAccountingHelper (e.g. in the columnarizer) since we perform the memory limit check at the batch granularity. However, this commit improves things there so that we don't reuse such a batch on the next iteration and instead allocate one with half of its capacity.
Fixes: #76464.
Release note (bug fix): CockroachDB now more precisely respects the
distsql_workmem setting which improves the stability of each node and
makes OOMs less likely.
colmem: unexport Allocator.ResetMaybeReallocate
This commit is a mechanical change to unexport Allocator.ResetMaybeReallocate so that users are forced to use the method with the same name from the helpers. This required splitting off the tests into two files.
Release note: None
I ran microbenchmarks of the colexec package, and they mostly show a minor speedup. The only concerning-looking result is for OrderedSynchronizer, but when running it in isolation I see no difference, so I think that outlier can be ignored:
Switching to HEAD^^^^
+ make bench PKG=./pkg/sql/colexec BENCHTIMEOUT=24h BENCHES=BenchmarkOrderedSynchronizer 'TESTFLAGS=-count 10 -benchmem'
Switching to HEAD
+ make bench PKG=./pkg/sql/colexec BENCHTIMEOUT=24h BENCHES=BenchmarkOrderedSynchronizer 'TESTFLAGS=-count 10 -benchmem'
name old time/op new time/op delta
OrderedSynchronizer-24 64.6µs ± 8% 61.9µs ± 6% ~ (p=0.089 n=10+10)
name old speed new speed delta
OrderedSynchronizer-24 381MB/s ± 8% 398MB/s ± 6% ~ (p=0.089 n=10+10)
name old alloc/op new alloc/op delta
OrderedSynchronizer-24 1.00B ± 0% 1.00B ± 0% ~ (all equal)
name old allocs/op new allocs/op delta
OrderedSynchronizer-24 0.00 0.00 ~ (all equal)
This PR will benefit from an extra set of eyes since I want to backport this to 22.1, so I'll wait for @michae2 to take a look too.
@michae2 will you have time to take a look at this this week (it's ok if not)? I'd like to merge it before stability, and I'm off on Th-F.
I feel like the 'right' way to solve this in the long term might be to make the various Set, Copy and Append methods deterministic in terms of their effect on the memory footprint of the batch. That way, we could check before performing the operation whether it would cause us to exceed the limit, and emit immediately if necessary. I guess for Bytes columns that would come down to using make and copy to grow the slice instead of append, and other types should be easier.
It's an interesting idea that'd be worth exploring. We'd need to be conscious about the performance implications though - I'm afraid that introducing those checks would have non-negligible overhead. At the same time, those conditionals are likely to be predicted correctly most of the time (once the batch has grown to its maximum capacity), so maybe the impact would be relatively small. Still, we would probably lose the ability to inline most of the Set calls.
I think only bytes columns would require any changes in Set, right? Value types like int would have no change, while pointer types like decimal or datum would just add the size of the object, so should also be deterministic.
Yeah, fair point.
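For concreteness, here is a rough Go sketch of the "check before the operation" idea for a Bytes-like column (hypothetical types and names, not the coldata API): the backing slice is grown with make and copy under an explicit doubling policy, so the footprint after an append is known before anything is mutated and the caller can emit the current batch first if the limit would be exceeded.

```go
package colmemsketch

// bytesColumn is a hypothetical, simplified stand-in for a Bytes vector.
type bytesColumn struct {
	buf []byte // concatenated values; offsets omitted for brevity
}

// footprintAfterAppend reports what cap(buf) would become if v were appended,
// without mutating the column.
func (c *bytesColumn) footprintAfterAppend(v []byte) int {
	needed := len(c.buf) + len(v)
	if needed <= cap(c.buf) {
		return cap(c.buf)
	}
	newCap := 2 * cap(c.buf)
	if newCap < needed {
		newCap = needed
	}
	return newCap
}

// appendValue grows buf with make+copy using the same policy as
// footprintAfterAppend, keeping the two in agreement.
func (c *bytesColumn) appendValue(v []byte) {
	if len(c.buf)+len(v) > cap(c.buf) {
		grown := make([]byte, len(c.buf), c.footprintAfterAppend(v))
		copy(grown, c.buf)
		c.buf = grown
	}
	c.buf = append(c.buf, v...)
}
```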
bors r-
I think this needs a rebase.
Canceled.
bors r+
Encountered an error creating backports. Some common things that can go wrong:
- The backport branch might have already existed.
- There was a merge conflict.
- The backport branch contained merge commits.
You might need to create your backport manually using the backport tool.
error creating merge commit from aec12906ad7dd8f88bd609cb7a20fda2e0f97e17 to blathers/backport-release-22.1-85440: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []
you may need to manually resolve merge conflicts with the backport tool.
Backport to branch 22.1.x failed. See errors above.
:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.