sequenceTubeMap

Contain/restrict memory usage for vg commands, per-job and in total

Open adamnovak opened this issue 10 months ago • 1 comments

Today the demo server went off the network, in a way that looks like https://github.com/systemd/systemd/issues/32045#issuecomment-2575269340

It happened after the OOM killer was invoked and killed a vg process that was using about 1.5G of memory. The server only actually has about 2G of memory, and if we let the OOM killer run it can break the system.

So we need a way to keep the total memory used by in-flight vg processes on the server under some limit that still leaves room for Node, so that the OOM killer never has to run. One option is to set a ulimit on the child vg processes and only let a certain number of them run at a time.
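
A rough sketch of that idea, in Python for brevity (the server itself is Node, so this would need porting, and it is not code from the repository). The names `MAX_CONCURRENT_JOBS`, `PER_JOB_KIB`, and `run_limited` and the specific numbers are made up for illustration. It assumes a Linux host whose `sh` supports `ulimit -v`, which takes a size in KiB. Each child gets a per-job cap via the shell wrapper, and a semaphore bounds how many children run at once, so the total is at most `MAX_CONCURRENT_JOBS * PER_JOB_KIB`.

```python
import subprocess
import threading

# Hypothetical budget for a ~2G host: 2 jobs x 512 MiB leaves roughly 1G
# for Node and the rest of the system.
MAX_CONCURRENT_JOBS = 2
PER_JOB_KIB = 512 * 1024  # `ulimit -v` takes KiB

# Bounds how many limited children may be in flight at the same time.
_job_slots = threading.BoundedSemaphore(MAX_CONCURRENT_JOBS)

# Shell wrapper: apply the limit, then replace the shell with the real command.
# $1 is the limit in KiB; everything after it is the command and its arguments.
_WRAPPER = 'ulimit -v "$1" || exit 125; shift; exec "$@"'


def run_limited(argv, per_job_kib=PER_JOB_KIB, timeout=None):
    """Run argv under a per-process memory cap, waiting for a free job slot."""
    wrapped = ["sh", "-c", _WRAPPER, "sh", str(per_job_kib), *argv]
    with _job_slots:  # blocks while MAX_CONCURRENT_JOBS children are running
        return subprocess.run(wrapped, capture_output=True, timeout=timeout)
```

Usage would look something like `run_limited(["vg", "version"])`. One caveat: `ulimit -v` caps virtual address space rather than resident memory, so a process that memory-maps large files can hit the limit before it actually uses that much RAM, and the per-job value would probably need tuning against real vg workloads. A child that exceeds the cap sees its allocations fail instead of pushing the whole host into the OOM killer.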

We also might need to lower the file upload size limit, if the current limit (I think 5M?) is large enough that it's easy to cause a lot of memory use.

adamnovak, Feb 11 '25 22:02

I've added a CloudWatch alarm to reboot the instance if it drops off the network for about an hour.

This is a bad solution and we should instead ensure that no user request or combination of user requests running at once can use more than some set fraction of the host's resources.

adamnovak, Apr 04 '25 21:04