cockroach icon indicating copy to clipboard operation
cockroach copied to clipboard

sql: reset conn buffers after each query

Open DrewKimball opened this issue 2 years ago • 3 comments

There are two buffers in pgwire.conn that are used when sending results to the client. They have to buffer each row in its entirety before flushing, so wide rows can cause the capacity of these buffers to exceed the limit.

Previously, the buffers were never reallocated even when they exceeded the size limit. This could cause a long-running session to hold on to large amounts of memory until the session was ended. This increased the risk of OOM, in addition to being inefficient.

This commit adds a method maybeReallocate to pgwire.conn, which reallocates each buffer if it has reached the size limit. maybeReallocate is called upon closing Sync and Flush command results. These are natural places to reallocate the buffers, since they already flush the buffers to the client and are called roughly at the statement granularity, rather than for every row. This allows long-running sessions to bound their memory usage without causing reallocation on every wide row that is sent to the client.

Informs #80497

Release note (performance improvement): Long-running sql sessions are now less likely to maintain large allocations for long periods of time, which decreases the risk of OOM and improves memory utilization.

DrewKimball avatar Aug 11 '22 09:08 DrewKimball

This change is Reviewable

cockroach-teamcity avatar Aug 11 '22 09:08 cockroach-teamcity

To verify that this change works as expected, I ran the following queries in the same session on a single demo node:

--Return a wide row to the client to grow the buffers.
SELECT string_agg(x, '')::BYTES FROM (SELECT 'vbX5ajNEOh1VlLJ4' FROM generate_series(1, 400000)) s(x);

--Sleep to simulate a long-running query that keeps the session active.
SELECT pg_sleep(600);

I then waited 3 minutes to ensure the first query was finished and that garbage collection had time to take place. Here are the resulting memory profiles for the node: Master Screen Shot 2022-08-11 at 1 40 40 AM Branch Screen Shot 2022-08-10 at 7 31 26 PM

DrewKimball avatar Aug 11 '22 09:08 DrewKimball

kk

knz avatar Aug 11 '22 09:08 knz

TFTRs!

DrewKimball avatar Aug 18 '22 08:08 DrewKimball

Test failures look like unrelated flakes.

bors r+

DrewKimball avatar Aug 18 '22 08:08 DrewKimball

Build succeeded:

craig[bot] avatar Aug 18 '22 09:08 craig[bot]

I think we should backport this. Do you all agree?

mgartner avatar Aug 24 '22 12:08 mgartner

I'm in favor.

yuzefovich avatar Aug 24 '22 14:08 yuzefovich

👍 @DrewKimball Can you backport this to 22.1 and 21.2?

mgartner avatar Aug 24 '22 18:08 mgartner

Will do

DrewKimball avatar Aug 24 '22 18:08 DrewKimball

blathers backport 22.1 21.2

DrewKimball avatar Aug 24 '22 18:08 DrewKimball

Oh lol... I forgot blathers could do that - I would have done it myself. Sorry!

mgartner avatar Aug 24 '22 18:08 mgartner

To be honest, I wasn't expecting it to be so smooth. I'm pleasantly surprised

DrewKimball avatar Aug 24 '22 18:08 DrewKimball