When the inference process encounters an out-of-memory (OOM) error, can the service automatically recover?
Feature request
infinity version: 0.0.75
I noticed that when a GPU OOM occurs, the service hangs and new requests are no longer served. Could you provide a mechanism for the service to recover automatically from a GPU OOM or other exceptions, for example by restarting itself?
The log output below is from a simulated out-of-memory (OOM) situation that caused the service to hang.
Motivation
This feature is important for long-running online services.
Your contribution
Currently, there isn't any.
Facing the same issue here. IMHO, even crashing the infinity process in such a situation would be better, because the pod could then be restarted automatically. Alternatively, the hang should be reflected in the /health endpoint.
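The fail-fast idea above can be sketched as a small wrapper around the inference call: on OOM, log and hard-exit the process so an external supervisor (e.g. a Kubernetes restart policy) brings the pod back up. This is only a sketch under stated assumptions — `OutOfMemoryError` stands in for `torch.cuda.OutOfMemoryError`, and `fail_fast_on_oom` is a hypothetical helper, not part of infinity's API.

```python
import os
import sys

class OutOfMemoryError(RuntimeError):
    """Stand-in for torch.cuda.OutOfMemoryError in this sketch."""

def fail_fast_on_oom(fn, _exit=os._exit):
    """Wrap an inference call; on GPU OOM, log and hard-exit the process.

    os._exit skips cleanup handlers on purpose: after an OOM the process
    state is suspect, and a supervisor restart is the safe recovery path.
    The _exit parameter is injectable so the behavior can be tested.
    """
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except OutOfMemoryError as exc:
            print(f"GPU OOM, exiting so the supervisor restarts the pod: {exc}",
                  file=sys.stderr)
            _exit(70)  # EX_SOFTWARE-style exit code
    return wrapper
```

With a Kubernetes `restartPolicy: Always` (the default for pods), the container restarts on this non-zero exit, which is the auto-recovery behavior requested here.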
Currently, I decide whether to restart Infinity by monitoring its logs for abnormal keywords, but this approach doesn't feel very elegant.
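A minimal sketch of that log-keyword workaround, for illustration only: scan log lines for OOM-related patterns and report whether a restart is needed. The keyword list is an assumption (typical PyTorch OOM messages), not something infinity documents.

```python
import re

# Patterns that typically appear in PyTorch CUDA OOM tracebacks.
# The exact strings are assumptions; adjust to the logs you actually see.
OOM_PATTERNS = re.compile(
    r"CUDA out of memory|OutOfMemoryError",
    re.IGNORECASE,
)

def needs_restart(log_lines):
    """Return True if any log line matches a known OOM pattern."""
    return any(OOM_PATTERNS.search(line) for line in log_lines)
```

A cron job or sidecar could feed recent log lines into `needs_restart` and trigger a restart, but as noted above, surfacing the failure in `/health` or crashing outright would be cleaner than scraping logs.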