pgwire::protocol::StateMachine::send_rows keeps a large amount of memory allocated
What version of Materialize are you using?
d8783cd45e (Pull Request #25313)
What is the issue?
We had high memory usage during explain plans before (https://github.com/MaterializeInc/materialize/issues/23451), but now I noticed send_rows continuously allocating 800 MB until environmentd is killed: https://buildkite.com/materialize/release-qualification/builds/436#018db189-6b84-4a39-a32a-49bee0a7a4da
mz_2-2024-02-16_12 21 13.pb.gz
Is it expected that this memory is never reclaimed again? In this case we didn't see an OoM, but I was trying to reproduce one and this stood out to me as possibly unexpected. The test only ran explain plans.
Reproduced the OoM with a heap profile. The last profile still looked similar but had grown to 1800 MB:
mz_2-2024-02-16_14 16 23.pb.gz
Can you explain the workload that triggered this? Does sqlsmith continuously run a bunch of queries? Does the 800 MB stay around even after all connections are closed and no new ones are opened? This should be reproducible locally by running the sqlsmith mzcompose, right?
Yes, sqlsmith runs a bunch of explain plans continuously for 6000 s, nothing else. Let me check whether bin/mzcompose --find sqlsmith run default --explain-only --runtime 6000 reproduces it, and whether the memory usage stays that high after a while of nothing running against Mz.
Actually, when SQLsmith disconnects, the environmentd memory usage goes down again, so it's not a memory leak, but a lot of memory allocated to support the connection.
That function holds on to the entire result set as one big Vec<Row>, so that absolutely tracks. We don't currently stream the results: they are sent as a giant blob from compute into the coordinator, then into the adapter/connection, which sends them over the network.
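To illustrate why the buffered approach pins memory for as long as the rows are being sent, here is a minimal sketch contrasting it with a batched, streaming send. This is not the pgwire code: the Row alias, send_buffered, send_streamed, and the batch_size parameter are all hypothetical names made up for this example, and each function returns the largest number of rows it held at once as a rough stand-in for peak memory.

```rust
// Hypothetical stand-in for a result row; NOT Materialize's actual Row type.
type Row = Vec<u8>;

/// Buffered (roughly the behavior described above): the whole result set
/// exists as one Vec<Row> before anything is sent, so the number of rows
/// held at once equals the full result size.
fn send_buffered(rows: Vec<Row>, mut send: impl FnMut(&[Row])) -> usize {
    let peak = rows.len(); // every row is resident at the same time
    send(&rows);
    peak
}

/// Streamed (hypothetical alternative): rows are pulled from an iterator
/// and flushed in fixed-size batches, so at most `batch_size` rows are
/// held at once, regardless of the total result size.
fn send_streamed(
    rows: impl Iterator<Item = Row>,
    batch_size: usize,
    mut send: impl FnMut(&[Row]),
) -> usize {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut batch: Vec<Row> = Vec::with_capacity(batch_size);
    let mut peak = 0;
    for row in rows {
        batch.push(row);
        peak = peak.max(batch.len());
        if batch.len() == batch_size {
            send(&batch);
            batch.clear(); // drops the sent rows, keeps the allocation
        }
    }
    if !batch.is_empty() {
        send(&batch); // flush the final partial batch
    }
    peak
}
```

With 10 rows and a batch_size of 4, send_buffered holds all 10 rows at once, while send_streamed never holds more than 4 and calls send three times. The trade-off is that streaming only helps if the upstream source can also produce rows incrementally, which, per the comment above, is not the case today.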
Closing because this is working as intended.