Failed to allocate memory for requested buffer of size X
I was trying to deploy a custom model on tritonserver (23.08) with the onnxruntime_backend (onnxruntime 1.15.1), but while doing so we are facing this issue:
```
onnx runtime error 6: Non-zero status code returned while running Mul node. Name:'Mul_8702'
Status Message: /workspace/onnxruntime/onnxruntime/core/framework/bfc_arena.cc:368
void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn)
Failed to allocate memory for requested buffer of size 2830172160
```
There are 7 other models hosted on the same server, and those work fine (even under stress), but things break once this new model is added. Any idea why this might be happening? The server is hosted on a T4 GPU, and these are our current stats:
```
| model_control_mode               | MODE_NONE |
| strict_model_config              | 0         |
| rate_limit                       | OFF       |
| pinned_memory_pool_byte_size     | 268435456 |
| cuda_memory_pool_byte_size{0}    | 67108864  |
| min_supported_compute_capability | 6.0       |
| strict_readiness                 | 1         |
| exit_timeout                     | 30        |
| cache_enabled                    | 0         |
```
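As a rough sanity check (assuming the nominal 16 GB of a T4), the single failed buffer from the error above is already a sizeable fraction of the card:

```python
# Rough check: the single failed buffer vs. total T4 memory.
failed_alloc = 2830172160            # bytes, from the error message above
t4_memory = 16 * 1024**3             # assumption: nominal 16 GB on a T4

print(f"failed buffer: {failed_alloc / 1024**3:.2f} GiB")  # -> 2.64 GiB
print(f"share of T4:   {failed_alloc / t4_memory:.1%}")    # -> ~16.5%
```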
Any help understanding what might be causing this and how to fix it would be appreciated. Thanks!
Hello @aaditya-srivathsan, thanks for reaching out. Can you provide some more information?
Can you try unloading some of the other models and then loading this model? It could very well be that your system is running out of GPU memory, since the log indicates that a large allocation failed.
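If you want to script that, here is a minimal sketch using the Triton Python client (`tritonclient`); the model names are placeholders. Note that your stats show `model_control_mode = MODE_NONE`, and the load/unload API only works when the server is started with `--model-control-mode=explicit`:

```python
# Sketch: free GPU memory by unloading an idle model, then load the new one.
# Requires the server to run with --model-control-mode=explicit; under
# MODE_NONE (as in the stats above) these calls will be rejected.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")  # default HTTP port

client.unload_model("some_existing_model")   # placeholder: any model not in use
client.load_model("new_custom_model")        # placeholder: the failing model

print(client.is_model_ready("new_custom_model"))
```

If the new model loads fine once others are unloaded, the GPU is simply over-committed; lowering `max_batch_size` or the `instance_group` count in the model's `config.pbtxt` can also reduce its peak allocation.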
Also, can you share the output of `nvidia-smi`?
Any updates on this? Currently facing the same issue with one of my deployments.