
health check should not say it's ready when cuda device-side assertion error is triggered

Open · ghost opened this issue 3 years ago · 1 comment

Is your feature request related to a problem? Please describe. If one of the models triggers a CUDA device-side assertion error, all model instances in that trtis process are blocked from making any further CUDA API calls. In this case, the trtis health check API should report 'not ready', but right now it reports that both the server and the model are ready.

# example response reporting a cuda assertion error
{"error":"pinned input buffer H2D: failed to perform CUDA copy: device-side assert triggered"}
$ curl -v localhost:8000/v2/health/ready
*   Trying 127.0.0.1:8000...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET /v2/health/ready HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain
< 
* Connection #0 to host localhost left intact

$ curl -v localhost:8000/v2/health/live
*   Trying 127.0.0.1:8000...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET /v2/health/live HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain
< 
* Connection #0 to host localhost left intact

$ curl -v localhost:8000/v2/models/cuda-error/ready
*   Trying 127.0.0.1:8000...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET /v2/models/cuda-error/ready HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain
< 
* Connection #0 to host localhost left intact

Describe the solution you'd like If there are GPU models, the trtis health check API should detect whether a CUDA device-side assertion error has been triggered and report 'not ready' if so.
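Until something like that exists server-side, a probe can approximate it from the outside by pairing the server's ready endpoint with a tiny canary inference, so a poisoned CUDA context surfaces as a failed probe. A minimal sketch, assuming the KServe v2 HTTP API; the model name, input name, shape, and port are assumptions:

```python
# Hypothetical "deep" readiness probe: require both /v2/health/ready and
# a successful canary inference. A stuck CUDA context makes the canary
# request fail, so the probe returns False even if the server says ready.
import json
from urllib import request, error


def deep_ready(model="cuda-canary", host="localhost", port=8000, timeout=2.0):
    """Return True only if the server reports ready AND a canary inference succeeds."""
    base = f"http://{host}:{port}"
    payload = json.dumps({
        "inputs": [{"name": "INPUT0", "shape": [1],
                    "datatype": "INT32", "data": [0]}]
    }).encode()
    try:
        with request.urlopen(f"{base}/v2/health/ready", timeout=timeout) as resp:
            if resp.status != 200:
                return False
        req = request.Request(f"{base}/v2/models/{model}/infer", data=payload,
                              headers={"Content-Type": "application/json"})
        with request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except (error.URLError, OSError):
        # Connection errors, HTTP errors, and timeouts all count as not ready.
        return False
```

Wired into a Kubernetes exec or HTTP readiness probe, this would take the instance out of rotation once CUDA calls start failing.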

Additional context This is a simple model that triggers a CUDA device-side assertion error (an out-of-bounds gather index):

import torch
import torch.nn as nn

class CudaError(nn.Module):
    def forward(self, x):
        # Gathering index 1 from a 1-element tensor is out of bounds,
        # which triggers a device-side assert on the GPU.
        return torch.gather(torch.tensor([0]).cuda(), dim=0,
                            index=torch.tensor([1]).cuda()).type(torch.int32)

torch.jit.save(torch.jit.script(CudaError()), 'model.pt')

ghost avatar Nov 27 '21 07:11 ghost

As a workaround, you can unload the model and then use the model's health check path (which returns unhealthy after the model is unloaded).

Assuming the model is called mymodel, this is how you can unload it:

from urllib import request as urllib_request

try:
    ...  # inference logic

except RuntimeError as e:
    if 'CUDA error: device-side assert triggered' in str(e):
        # Unload the model so its readiness endpoint starts failing.
        unload_request = urllib_request.Request(
            'http://localhost:5000/v2/repository/models/mymodel/unload',
            method='POST')
        with urllib_request.urlopen(unload_request):
            pass
    raise

The model's health check path is at /v2/models/mymodel/ready. Calls to the model's health check will fail after the model is unloaded.
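To poll that per-model path programmatically, something like the following would work; host, port, and model name are assumptions for this sketch:

```python
# Minimal helper around /v2/models/<name>/ready. Once the model has been
# unloaded, the endpoint stops returning 200, so this returns False.
from urllib import request, error


def model_ready(model="mymodel", host="localhost", port=8000, timeout=2.0):
    url = f"http://{host}:{port}/v2/models/{model}/ready"
    try:
        with request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (error.URLError, OSError):
        # Non-200 responses and connection failures both mean not ready.
        return False
```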

yukunlin avatar Aug 08 '22 21:08 yukunlin

So do we still need this workaround? I found that when signal 11 is received, the health check still reports OK 😂

Jack47 avatar Aug 06 '23 12:08 Jack47

For example: the Python backend process hangs with this signal 11 error, but Triton readiness still reports OK, so requests keep coming in:

DownCropResizer is doing nothing!
{"pod_name": "sd15-triton-5cc495c8cc-zjhmx", "namespace": "production", "log_type": "access_log", "request_id": "3be8a710-3428-11ee-ba3c-00163e253f9a", "encoder_msg": "jasee", "event": "kestrel encode", "logger": "triton_logger", "level": "info", "timestamp": "2023-08-06T07:09:56.292837Z"}
Signal (11) received.

Do we have model readiness probes?
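For a hung backend the probe itself may never get an answer, so any external check needs a bounded wait. A minimal sketch, where the URL, timeout, and retry counts are assumptions:

```python
# Liveness probe with a hard timeout: a process that is hung (e.g. after
# signal 11) is treated the same as one that refuses connections.
import time
from urllib import request, error


def probe_live(url="http://localhost:8000/v2/health/live",
               timeout=1.0, retries=3, backoff=0.5):
    """Return True if any of `retries` attempts gets HTTP 200 within `timeout`."""
    for attempt in range(retries):
        try:
            with request.urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    return True
        except (error.URLError, OSError):
            # Refused connection, timeout, or error response: try again.
            pass
        if attempt < retries - 1:
            time.sleep(backoff)
    return False
```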

Jack47 avatar Aug 06 '23 12:08 Jack47

I also observed this issue. I can reproduce it by overloading Triton with more images/sec than it can handle. That provokes the segfault ("Signal (11) received." message), yet Triton still reports it is ready. In my opinion this is a bug in Triton, especially if the strict readiness check is enabled!

tfriedel avatar Aug 18 '23 20:08 tfriedel