
High memory usage while loading onnx model with optimum engine

Open molntamas opened this issue 8 months ago • 1 comment

System Info

When running the infinity CPU Docker image with the optimum engine and an ONNX model, memory usage temporarily goes very high. For example, with the model in the compose file below, memory usage climbs above 10 GB while the container is starting and later settles below 3 GB. Since the spike is only temporary, hosting this model wastes resources for me.

I'm wondering if this is an issue in the infinity server, or something else.

services:
  classifier-api:
    image: michaelf34/infinity:0.0.76-cpu
    command:
      - v2
      - --engine
      - optimum
      - --model-id
      - Qdrant/multilingual-e5-large-onnx
      - --port
      - "5000"
      - --device
      - cpu
    ports:
      - "5000:5000"
    environment:
      INFINITY_MODEL_WARMUP: 0

My logs of memory and CPU usage before the container crashed:

CONTAINER ID   NAME     CPU %     MEM USAGE / LIMIT      MEM %    NET I/O           BLOCK I/O   PIDS
21f7eeb0f6c4   my-api   0.89%     385.8MiB / 4.808GiB    7.84%    16.6kB / 3.82kB   0B / 0B     6
21f7eeb0f6c4   my-api   28.04%    710.5MiB / 4.808GiB    14.43%   163MB / 4.15MB    0B / 0B     169
...
21f7eeb0f6c4   my-api   32.76%    1.753GiB / 4.808GiB    36.47%   2.21GB / 47.2MB   0B / 0B     12
21f7eeb0f6c4   my-api   1.07%     1.751GiB / 4.808GiB    36.43%   2.33GB / 49.5MB   0B / 0B     12
21f7eeb0f6c4   my-api   0.68%     1.751GiB / 4.808GiB    36.43%   2.33GB / 49.5MB   0B / 0B     12
...
21f7eeb0f6c4   my-api   70.63%    3.061GiB / 4.808GiB    63.66%   2.37GB / 50.2MB   0B / 0B     9
21f7eeb0f6c4   my-api   100.21%   3.363GiB / 4.808GiB    69.95%   2.37GB / 50.2MB   0B / 0B     9
21f7eeb0f6c4   my-api   98.26%    2.37GiB / 4.808GiB     49.29%   2.37GB / 50.2MB   0B / 0B     9
21f7eeb0f6c4   my-api   73.99%    2.364GiB / 4.808GiB    49.16%   2.37GB / 50.2MB   0B / 0B     9
21f7eeb0f6c4   my-api   74.32%    4.064GiB / 4.808GiB    84.53%   2.37GB / 50.2MB   0B / 0B     9

Information

  • [x] Docker + cli
  • [ ] pip + cli
  • [ ] pip + usage of Python interface

Tasks

  • [ ] An officially supported CLI command
  • [ ] My own modifications

Reproduction

Steps to reproduce:

  • create a docker compose file from the example in the description above
  • execute docker compose
  • output the memory usage of the container, for instance on Windows with the PowerShell script below (a Linux/macOS equivalent is sketched after it):
while ($true) {
    docker stats --no-stream | Out-File -Append -FilePath docker_stats.log
    Start-Sleep -Seconds 5
}
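
On a Linux or macOS host, an equivalent plain shell loop can be used instead (a minimal sketch, not part of the original report; the log file name is arbitrary):

# Append a docker stats snapshot to the log every 5 seconds
while true; do
    docker stats --no-stream >> docker_stats.log
    sleep 5
done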

Expected:

  • The memory footprint of the container stays below 3 GB

Actual:

  • The memory usage goes above 4 GB
  • On my system, where there was not enough memory, the application crashed with exit code 137
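
For reference, exit code 137 means the container was killed with SIGKILL, typically by the kernel OOM killer once a memory limit is exceeded. A capped variant of the compose service makes the failure easier to observe; this is a sketch rather than part of the original report, and the 4g cap is an assumed value chosen to sit between the reported steady-state usage (under 3 GB) and the loading spike (over 4 GB):

services:
  classifier-api:
    image: michaelf34/infinity:0.0.76-cpu
    mem_limit: 4g   # assumed cap: the optimum loading spike should exceed this and trigger an OOM kill (exit 137)
    command:
      - v2
      - --engine
      - optimum
      - --model-id
      - Qdrant/multilingual-e5-large-onnx
      - --port
      - "5000"
      - --device
      - cpu
    ports:
      - "5000:5000"
    environment:
      INFINITY_MODEL_WARMUP: 0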

For other engines, such as PyTorch and CTranslate2, the memory usage is as expected.

molntamas avatar May 05 '25 08:05 molntamas

I'm also wondering what the next best option is for hosting classification models on CPU. The PyTorch engine seems to load fine.
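
For comparison, here is a sketch of the same service switched to the PyTorch backend. Assumptions not taken from the report: the engine value is torch, and intfloat/multilingual-e5-large is substituted as the model id, since the Qdrant repo ships ONNX weights and the torch engine needs a PyTorch-format checkpoint:

services:
  classifier-api:
    image: michaelf34/infinity:0.0.76-cpu
    command:
      - v2
      - --engine
      - torch                            # assumed engine name for the PyTorch backend
      - --model-id
      - intfloat/multilingual-e5-large   # assumed PyTorch checkpoint in place of the ONNX repo
      - --port
      - "5000"
      - --device
      - cpu
    ports:
      - "5000:5000"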

molntamas avatar May 05 '25 08:05 molntamas