
VRAM not released after client disconnect in sherpa-onnx-online-websocket-server

Open Alekksander66 opened this issue 2 months ago • 5 comments

Hello!

When using sherpa-onnx-online-websocket-server with the CUDA provider, GPU VRAM usage keeps increasing as WebSocket connections are handled.

Even after a client finishes streaming (Done message sent, final result returned, connection closed), VRAM allocated by the model is not released. Over time, this leads to out-of-memory (OOM) errors or forces the server process to crash/restart.

This makes it impossible to run the server under sustained heavy load (hundreds of concurrent streams), since VRAM usage grows linearly with the number of completed connections.

./bin/sherpa-onnx-online-websocket-server \
  --port=8080 \
  --num-work-threads=16 \
  --num-io-threads=8 \
  --tokens=./models/tokens.txt \
  --encoder=./models/encoder.onnx \
  --decoder=./models/decoder.onnx \
  --joiner=./models/joiner.onnx \
  --provider=cuda \
  --max-batch-size=128 \
  --loop-interval-ms=10

Actual behavior

VRAM usage keeps increasing after each client disconnect. Even though connections are removed from connections_ in OnlineWebsocketDecoder::ProcessConnections, the GPU memory is not freed. Eventually the server hits OOM and restarts.

Environment

sherpa-onnx version: v1.12.13 (latest release at the time of writing)

Build type: prebuilt release binaries, downloaded with wget from https://github.com/k2-fsa/sherpa-onnx/releases/download/v1.12.13/sherpa-onnx-v1.12.13-cuda-12.x-cudnn-9.x-linux-x64-gpu.tar.bz2

CUDA version: 12.8

GPU: NVIDIA H100 80GB

OS: Docker image nvidia/cuda:12.8.1-cudnn-runtime-ubuntu22.04

The issue seems to be that OnlineRecognizer / OnlineStream does not free its GPU state after InputFinished is called and the connection is removed from connections_. I tried adding manual cleanup (resetting the stream, clearing connections), but VRAM still accumulates.

With the CPU provider, memory is released correctly. With the CUDA provider, VRAM grows continuously with each completed stream.
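To put a number on the growth, here is a minimal sketch of how VRAM usage could be sampled between batches of client connections. It is not the measurement used for this report. It assumes `nvidia-smi` is on the PATH inside the container, and the helper names (`parse_memory_used`, `query_memory_used`, `growth_per_connection`) are hypothetical, chosen only for illustration.

```python
import subprocess
from typing import List, Tuple


def parse_memory_used(output: str) -> List[int]:
    """Parse the output of
    `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`
    into a list of used MiB, one entry per GPU."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def query_memory_used() -> List[int]:
    """Ask nvidia-smi for the memory currently in use on each GPU (MiB).
    Assumes nvidia-smi is installed and at least one GPU is visible."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_memory_used(result.stdout)


def growth_per_connection(samples: List[Tuple[int, int]]) -> float:
    """Least-squares slope of used MiB against completed connections.

    Each sample is (completed_connections, used_mib). A slope that stays
    clearly above zero across many samples suggests memory is not being
    returned after disconnects."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("all samples have the same connection count")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return sxy / sxx
```

A run could record `(completed_connections, query_memory_used()[0])` after every batch of disconnected clients and pass the list to `growth_per_connection`. Comparing the slope with `--provider=cuda` against the same run with the CPU provider (using process RSS instead of VRAM) would make the difference described above measurable.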

Alekksander66, Sep 25 '25 10:09