Errors when loading new models multiple times in quick succession
I've noticed errors when loading and unloading models in very quick succession. I'm using the gRPC Python client to communicate with Triton. Looking at the logs, it appears that on some occasions unload_model returns before the model is actually unloaded from memory, so when I try to load the next model the server crashes. Code for model switching is below:
from typing import Tuple

from tritonclient.grpc import InferenceServerClient

# ModelException is a custom exception defined elsewhere in the application.

def _manage_model(client: InferenceServerClient, model_name: str) -> Tuple[str, str]:
    # Map every model in the repository to whether it is currently READY.
    model_data = client.get_model_repository_index(as_json=True)["models"]
    model_status = {m["name"]: "state" in m and m["state"] == "READY" for m in model_data}
    if model_name not in model_status:
        raise ModelException(f"Requested model not available. Only have {model_status.keys()}")
    if not model_status[model_name]:
        # Unload whatever is currently loaded, then load the requested model.
        for maybe_loaded_model, is_loaded in model_status.items():
            if is_loaded:
                client.unload_model(maybe_loaded_model)
        client.load_model(model_name)
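For context, this gets called roughly like so (the endpoint and model name here are examples only, assuming the default gRPC port):

client = InferenceServerClient(url="localhost:8001")  # default gRPC port; example only
_manage_model(client, "model_b")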
I'm currently running Triton Server 21.12 on a T4 instance. Is there a way to check whether a model has successfully unloaded?
Edit:
Here's the logging I'm seeing. Note that "loading: model_b:1" appears before "successfully unloaded 'model_a'":
2022-09-04T04:27:38.658-07:00 I0904 11:27:38.553679 3352 model_repository_manager.cc:1026] unloading: model_a:1
2022-09-04T04:27:38.658-07:00 I0904 11:27:38.553986 3352 tensorflow.cc:2357] TRITONBACKEND_ModelInstanceFinalize: delete instance state
2022-09-04T04:27:38.658-07:00 I0904 11:27:38.554067 3352 tensorflow.cc:2296] TRITONBACKEND_ModelFinalize: delete model state
2022-09-04T04:27:38.909-07:00 I0904 11:27:38.555486 3352 model_repository_manager.cc:994] loading: model_b:1
2022-09-04T04:27:38.909-07:00 I0904 11:27:38.684448 3352 tensorflow.cc:2270] TRITONBACKEND_ModelInitialize: model_b (version 1)
2022-09-04T04:27:38.909-07:00 I0904 11:27:38.685343 3352 tensorflow.cc:2319] TRITONBACKEND_ModelInstanceInitialize: model_b (GPU device 0)
2022-09-04T04:27:38.909-07:00 2022-09-04 11:27:38.685648: I tensorflow/cc/saved_model/reader.cc:38] Reading SavedModel from: /opt/ml/model/model_repo/model_b/1/model.savedmodel
2022-09-04T04:27:38.909-07:00 2022-09-04 11:27:38.856472: I tensorflow/cc/saved_model/reader.cc:90] Reading meta graph with tags { serve }
2022-09-04T04:27:38.909-07:00 2022-09-04 11:27:38.856525: I tensorflow/cc/saved_model/reader.cc:132] Reading SavedModel debug info (if present) from: /opt/ml/model/model_repo/model_b/1/model.savedmodel
2022-09-04T04:27:38.909-07:00 2022-09-04 11:27:38.857009: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1050] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-04T04:27:38.909-07:00 2022-09-04 11:27:38.857319: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1050] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-04T04:27:38.909-07:00 2022-09-04 11:27:38.857588: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1050] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-04T04:27:38.909-07:00 2022-09-04 11:27:38.857888: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1050] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-04T04:27:38.909-07:00 2022-09-04 11:27:38.858116: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1050] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-04T04:27:39.410-07:00 2022-09-04 11:27:38.858306: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 13749 MB memory: -> device: 0, name: Tesla T4, pci bus id: 0000:00:1e.0, compute capability: 7.5
2022-09-04T04:27:39.661-07:00 I0904 11:27:39.295864 3352 model_repository_manager.cc:1132] successfully unloaded 'model_a' version 1
2022-09-04T04:27:40.915-07:00 2022-09-04 11:27:39.417347: I tensorflow/cc/saved_model/loader.cc:213] Restoring SavedModel bundle.
2022-09-04T04:27:41.166-07:00 2022-09-04 11:27:40.707772: I tensorflow/cc/saved_model/loader.cc:197] Running initialization op on SavedModel bundle at path: /opt/ml/model/model_repo/model_b/1/model.savedmodel
2022-09-04T04:27:41.166-07:00 2022-09-04 11:27:41.151035: I tensorflow/cc/saved_model/loader.cc:303] SavedModel load for tags { serve }; Status: success: OK. Took 2465393 microseconds.
2022-09-04T04:27:42.920-07:00 I0904 11:27:41.151306 3352 model_repository_manager.cc:1149] successfully loaded 'model_b' version 1
2022-09-04T04:27:42.920-07:00 tritonclient.utils.InferenceServerException: [StatusCode.UNAVAILABLE] Connection reset by peer
2022-09-04T04:27:43.171-07:00 WARNING:__main__:unexpected tensorflow serving exit (status: 9). restarting.
Hi @lminer, when calling unload_model, the function does not wait for the requested model to be fully unloaded. Unfortunately, I don't think we have a way to check for that; if an error occurs, an exception will be raised. Are you able to reproduce the same issue when using the HTTP client and with different models? I suspect this may have something to do with gRPC overhead. Could you try our 22.08 release, where we upgraded the gRPC version?
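As a possible workaround (not a guarantee that the model's GPU memory has actually been freed), you could poll the repository index until the model no longer reports a READY state before issuing the next load. A minimal sketch, reusing the same get_model_repository_index call as the snippet above; the wait_for_unload helper and its timeout parameters are just illustrative:

import time

from tritonclient.grpc import InferenceServerClient

def wait_for_unload(client: InferenceServerClient, model_name: str,
                    timeout_s: float = 30.0, poll_interval_s: float = 0.5) -> None:
    # Poll the repository index until `model_name` no longer reports READY.
    # Note: this only checks the state Triton reports; it is not a hard
    # guarantee that all GPU memory for the model has been released.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        models = client.get_model_repository_index(as_json=True)["models"]
        states = {m["name"]: m.get("state", "") for m in models}
        if states.get(model_name) != "READY":
            return
        time.sleep(poll_interval_s)
    raise TimeoutError(f"'{model_name}' still reports READY after {timeout_s}s")

With that in place, the switching code would become unload_model(old_name), then wait_for_unload(client, old_name), then load_model(model_name).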
Closing this issue due to lack of activity. Please re-open it if you would like to follow up.