
dstack server consumes lots of CPU correlated with opening SSH connections

Open r4victor opened this issue 5 months ago • 3 comments

We recently debugged a case where running multiple server replicas led to high DB load, many active DB sessions, and extremely slow DB queries. The root cause turned out to be that the dstack server did not have enough vCPU: DB transactions took longer, which had a cascading effect that put load on the DB.

When running a few dozen runs, server CPU utilization spikes about 10x every few seconds (I see 1-2% -> 10-15% on my local machine). The spikes coincide with dstack opening SSH connections in process_instances and process_running_jobs (/usr/bin/ssh -F ... processes appear).
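To check how much of a spike is spent in spawned processes versus in the server process itself, the two CPU times can be measured separately around a block of code. This is a minimal sketch, not dstack code: `cpu_split` is a hypothetical helper, it relies on the Unix-only `resource` module, and it only counts child processes that have already exited and been waited for.

```python
import resource
import subprocess
import sys
import time
from contextlib import contextmanager


def _children_cpu_seconds() -> float:
    # User + system CPU time of terminated, waited-for child processes.
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return usage.ru_utime + usage.ru_stime


@contextmanager
def cpu_split(result: dict):
    """Hypothetical helper: records CPU seconds used by this process and by
    its finished children while the with-block runs, plus wall-clock time."""
    self_start = time.process_time()
    children_start = _children_cpu_seconds()
    wall_start = time.monotonic()
    try:
        yield result
    finally:
        result["self_cpu"] = time.process_time() - self_start
        result["children_cpu"] = _children_cpu_seconds() - children_start
        result["wall"] = time.monotonic() - wall_start


if __name__ == "__main__":
    stats: dict = {}
    with cpu_split(stats):
        # Stand-in for the code path that spawns an ssh process.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    print(stats)
```

Wrapping the part of a processing loop that launches ssh in such a block would show whether the CPU time goes to the child processes or to the Python process that spawns and waits for them.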

Ideally, CPU utilization should be reduced so that dstack can process the claimed number of resources (e.g. 150 runs) on 1 vCPU (a popular choice). At the moment, 2 vCPU is the minimum for processing dozens of runs.

One solution could be an SSH connection pool, if establishing SSH connections proves to be the primary cause of the CPU load.
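As a rough illustration of the pooling idea, the sketch below keeps one connection per key (for example, per instance) and reuses it until it has been idle for too long. Everything here is hypothetical: `ConnectionPool`, `connect`, `close`, and `idle_ttl` are illustrative names, not dstack APIs, and the sketch is synchronous and not safe for concurrent use.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

C = TypeVar("C")


class ConnectionPool(Generic[C]):
    """Hypothetical pool: one connection per key, reused until idle too long."""

    def __init__(
        self,
        connect: Callable[[Hashable], C],   # assumed factory that opens a connection
        close: Callable[[C], None],         # assumed function that closes one
        idle_ttl: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._connect = connect
        self._close = close
        self._idle_ttl = idle_ttl
        self._clock = clock
        # key -> (connection, time of last use)
        self._entries: Dict[Hashable, Tuple[C, float]] = {}

    def get(self, key: Hashable) -> C:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None:
            conn, last_used = entry
            if now - last_used <= self._idle_ttl:
                # Still fresh: reuse it and refresh the last-use time.
                self._entries[key] = (conn, now)
                return conn
            # Idle for too long: drop it and open a new one below.
            self._close(conn)
            del self._entries[key]
        conn = self._connect(key)
        self._entries[key] = (conn, now)
        return conn

    def close_all(self) -> None:
        for conn, _ in self._entries.values():
            self._close(conn)
        self._entries.clear()
```

With a pool like this, repeated processing passes over the same instance would pay the connection setup cost once per idle window instead of once per pass. When connections are made by the OpenSSH client, its built-in multiplexing options (`ControlMaster`, `ControlPath`, `ControlPersist`) are another way to get similar reuse.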

r4victor • Jul 29 '25 09:07