matrixone
[Bug]: OOM when concurrent query
Is there an existing issue for the same bug?
- [X] I have checked the existing issues.
Environment
- Version or commit-id (e.g. v0.1.0 or 8b23a93):
- Hardware parameters:
- OS type:
- Others:
Actual Behavior
On a VM with 32 cores and 64 GB of memory, two queries arrived at mo-server concurrently, and the VM crashed due to OOM.
data:image/s3,"s3://crabby-images/41a50/41a50870a601da027ceedf3c68e530422b6d75dd" alt="image"
Expected Behavior
For now, MatrixOne does not support runtime memory limits, and the default host/guest memory quota is set to 1 << 40 (https://github.com/matrixorigin/matrixone/blob/main/cmd/db-server/main.go#L294), far more than the machine's physical memory, which is unreasonable. The server may need to queue or reject queries when there are not enough system resources.
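A minimal sketch of the "refuse" half of this idea, assuming each query can supply an estimate of its memory footprint up front (an assumption; MatrixOne does not necessarily produce such an estimate). `Admitter`, `Admit`, and `ErrInsufficientMemory` are hypothetical names for illustration, not MatrixOne APIs.

```go
package main

import (
	"errors"
	"sync"
)

// ErrInsufficientMemory is returned when a query cannot be admitted
// without exceeding the configured memory budget (hypothetical error).
var ErrInsufficientMemory = errors.New("insufficient memory: query rejected")

// Admitter is a hypothetical admission controller: it tracks how much of a
// fixed memory budget is reserved by running queries and refuses new ones
// whose estimated footprint does not fit in what remains.
type Admitter struct {
	mu       sync.Mutex
	budget   int64 // total bytes available to queries
	reserved int64 // bytes currently reserved by admitted queries
}

func NewAdmitter(budget int64) *Admitter {
	return &Admitter{budget: budget}
}

// Admit reserves estimate bytes or returns ErrInsufficientMemory.
// The returned release function must be called when the query finishes;
// calling it more than once is harmless.
func (a *Admitter) Admit(estimate int64) (release func(), err error) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if estimate < 0 || a.reserved+estimate > a.budget {
		return nil, ErrInsufficientMemory
	}
	a.reserved += estimate
	var once sync.Once
	return func() {
		once.Do(func() {
			a.mu.Lock()
			a.reserved -= estimate
			a.mu.Unlock()
		})
	}, nil
}
```

Rejecting early keeps a burst of large queries from pushing the whole process past physical memory; the trade-off is that estimates that are too low still allow OOM, so this would complement, not replace, runtime accounting.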
Steps to Reproduce
No response
Additional information
No response
This requires the frontend to set memory limits: on a given machine, the host memory limit is shared, while the guest memory limit is independent for each query. If these are set correctly, the OOM problem can be avoided. @daviszhen
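The two-level scheme described above (one shared host limit, one guest limit per query) can be sketched as follows. This is an illustration under stated assumptions, not MatrixOne's actual host/guest memory manager: `Host`, `Guest`, `Alloc`, and `Free` are hypothetical names, and each `Guest` is assumed to be used by a single goroutine, so only the shared `Host` is locked.

```go
package main

import (
	"fmt"
	"sync"
)

// Host is a hypothetical machine-wide memory account shared by all queries.
type Host struct {
	mu    sync.Mutex
	limit int64
	used  int64
}

// Guest is a hypothetical per-query account with its own limit. Every
// allocation is charged to both the guest and its shared host.
type Guest struct {
	host  *Host
	limit int64
	used  int64 // assumed to be touched only by the owning query's goroutine
}

func NewHost(limit int64) *Host { return &Host{limit: limit} }

func (h *Host) NewGuest(limit int64) *Guest { return &Guest{host: h, limit: limit} }

// Alloc charges n bytes, failing if either the per-query or the machine-wide
// limit would be exceeded. Nothing is charged on failure.
func (g *Guest) Alloc(n int64) error {
	if g.used+n > g.limit {
		return fmt.Errorf("guest limit exceeded: %d + %d > %d", g.used, n, g.limit)
	}
	h := g.host
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.used+n > h.limit {
		return fmt.Errorf("host limit exceeded: %d + %d > %d", h.used, n, h.limit)
	}
	h.used += n
	g.used += n
	return nil
}

// Free returns n bytes to both the guest and the host.
func (g *Guest) Free(n int64) {
	h := g.host
	h.mu.Lock()
	h.used -= n
	h.mu.Unlock()
	g.used -= n
}

// Used reports the host's current total usage across all guests.
func (h *Host) Used() int64 {
	h.mu.Lock()
	defer h.mu.Unlock()
	return h.used
}
```

The key property is that the host limit is set once per machine (below physical memory), not once per session, so the sum of all guests can never exceed it even when each guest's own limit is generous.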
The frontend may also need to do some simple queuing or scheduling of queries; otherwise, excessive concurrency may cause all queries to fail. In addition, the current usage is wrong: each session gets its own separate host memory limit and guest memory limit, and both values are set much higher than the memory actually available.
Understood. This will be deferred for now and addressed in late 0.3 or early 0.4.
This issue needs to be tracked and fixed in 0.6.0.
The key insight is memory accounting.
I need to run this for several days.
Tonight I will run point_select for 12 hours.