When the inference process encounters an out-of-memory (OOM) error, can the service automatically recover?
Feature request
infinity version: 0.0.75
I noticed that when a GPU OOM occurs, the service hangs and new requests are no longer served. Could you provide a mechanism for the service to recover automatically from a GPU OOM or other exceptions, for example by restarting itself?
The log output below is from a simulated out-of-memory (OOM) situation that caused the service to hang.
Motivation
This feature is important for long-running online services.
Your contribution
Currently, there isn't any.
Facing the same issue here. IMHO, even crashing the infinity process in such a situation would be better, because the pod could then be restarted automatically. Alternatively, the hang should be reflected in the /health endpoint.
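The fail-fast idea above can be sketched as a small wrapper around the inference call: on OOM, log and hard-exit the process so an external supervisor (e.g. a Kubernetes restart policy) brings the pod back up. This is only a sketch under stated assumptions — `OutOfMemoryError` stands in for `torch.cuda.OutOfMemoryError`, and `fail_fast_on_oom` is a hypothetical helper, not part of infinity's API.

```python
import os
import sys

class OutOfMemoryError(RuntimeError):
    """Stand-in for torch.cuda.OutOfMemoryError in this sketch."""

def fail_fast_on_oom(fn, _exit=os._exit):
    """Wrap an inference call; on GPU OOM, log and hard-exit the process.

    os._exit skips cleanup handlers on purpose: after an OOM the process
    state is suspect, and a supervisor restart is the safe recovery path.
    The _exit parameter is injectable so the behavior can be tested.
    """
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except OutOfMemoryError as exc:
            print(f"GPU OOM, exiting so the supervisor restarts the pod: {exc}",
                  file=sys.stderr)
            _exit(70)  # EX_SOFTWARE-style exit code
    return wrapper
```

With a Kubernetes `restartPolicy: Always` (the default for pods), the container restarts on this non-zero exit, which is the auto-recovery behavior requested here.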
Currently, I decide whether to restart Infinity by monitoring its logs for abnormal keywords, but this approach doesn't feel very elegant.
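A minimal sketch of that log-keyword workaround, for illustration only: scan log lines for OOM-related patterns and report whether a restart is needed. The keyword list is an assumption (typical PyTorch OOM messages), not something infinity documents.

```python
import re

# Patterns that typically appear in PyTorch CUDA OOM tracebacks.
# The exact strings are assumptions; adjust to the logs you actually see.
OOM_PATTERNS = re.compile(
    r"CUDA out of memory|OutOfMemoryError",
    re.IGNORECASE,
)

def needs_restart(log_lines):
    """Return True if any log line matches a known OOM pattern."""
    return any(OOM_PATTERNS.search(line) for line in log_lines)
```

A cron job or sidecar could feed recent log lines into `needs_restart` and trigger a restart, but as noted above, surfacing the failure in `/health` or crashing outright would be cleaner than scraping logs.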