
Shared memory failing in gunicorn following example

lminer opened this issue 2 years ago • 8 comments

I'm trying to use shared memory to run inference from a pytorch model. However, it's failing at set_shared_memory_region. Any idea why this might be happening? I'm following the official grpc example.


    # Imports are omitted here; I'm using roughly what the official grpc example uses:
    # numpy, tritonclient.grpc (client, InferInput, InferRequestedOutput),
    # tritonclient.utils as utils, and tritonclient.utils.shared_memory as shm.
    client.unregister_system_shared_memory()
    client.unregister_cuda_shared_memory()

    input_data = np.zeros((16000, 2), dtype=np.float32)  # placeholder: the real input is a (16000, 2) float32 numpy array

    input_byte_size = input_data.size * input_data.itemsize
    output_byte_size = input_byte_size

    # Create inputs and outputs in Shared Memory and store shared memory handles
    out_handle = shm.create_shared_memory_region(
        "output_data", "/output_simple", output_byte_size * 2
    )
    client.register_system_shared_memory("output_data", "/output_simple", output_byte_size * 2)
    input_handle = shm.create_shared_memory_region("input_data", "/input_simple", input_byte_size)
    # Put input data values into shared memory
    shm.set_shared_memory_region(input_handle, [input_data])  # <--- DIES HERE
    client.register_system_shared_memory("input_data", "/input_simple", input_byte_size)

    inputs = [InferInput("MIX", shape=[16000, 2], datatype="FP32")]
    inputs[-1].set_shared_memory("input_data", input_byte_size)

    outputs = [InferRequestedOutput("TARGET")]
    outputs[-1].set_shared_memory("output_data", output_byte_size)
    outputs.append(InferRequestedOutput("RESIDUAL"))
    outputs[-1].set_shared_memory("output_data", output_byte_size)

    results = client.infer(
        model_name=model_name,
        inputs=inputs,
        outputs=outputs,
        timeout=TIMEOUT_S * 1000000,
    )
    target = results.get_output("TARGET")
    residual = results.get_output("RESIDUAL")
    if target is None or residual is None:
        raise ModelException("Triton server failed to return target or residual")

    target_data = shm.get_contents_as_numpy(
        out_handle, utils.triton_to_np_dtype(target.datatype), target.shape
    )
    residual_data = shm.get_contents_as_numpy(
        out_handle, utils.triton_to_np_dtype(residual.datatype), residual.shape
    )

I'm using triton 22.07, ubuntu 20.04, an nvidia A6000 GPU with the container: nvcr.io/nvidia/tritonserver:22.07-py3

The error log is:

2022/08/22 21:41:46 [error] 5273#5273: *1 upstream prematurely closed connection while reading response header from upstream, client: 127.0.0.1, server: , request: "POST /invocations HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/invocations", host: "localhost:8080"
127.0.0.1 - - [22/Aug/2022:21:41:46 +0000] "POST /invocations HTTP/1.1" 502 157 "-" "curl/7.68.0"
[2022-08-22 21:41:46 +0000] [5268] [WARNING] Worker with pid 5271 was terminated due to signal 7

lminer • Aug 22 '22 21:08

Hi @lminer ,

Please share the full error output/log you're getting for this issue. Also, please share the version of Triton you're using, GPU type, and other issue template information.

This similar issue may provide some info as well: https://github.com/triton-inference-server/server/issues/3429#issuecomment-1188408564

rmccorm4 • Aug 22 '22 22:08

@rmccorm4 just updated. I had a look at that issue, but it appears to be more about CUDA shared memory; would it apply to system shared memory as well?

lminer • Aug 22 '22 22:08

Ah, I misread as CUDA shared memory.

Can you try to isolate the error to the specific lines it is failing at and capture the traceback/exception being raised, if any? Maybe run the shmem code directly in a simple script instead of through gunicorn, or wrap it in some try/excepts.

Currently your error log just shows the gunicorn worker failing rather than the actual Triton errors.
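
For example, a minimal standalone script along these lines (just a sketch using the same region names and shapes as your snippet; adjust the server URL to your setup) should surface the actual exception:

    import traceback

    import numpy as np
    import tritonclient.grpc as grpcclient
    import tritonclient.utils.shared_memory as shm

    client = grpcclient.InferenceServerClient("localhost:8001")
    input_data = np.zeros((16000, 2), dtype=np.float32)
    input_byte_size = input_data.size * input_data.itemsize

    handle = None
    try:
        # Same three calls as in your snippet, isolated from gunicorn.
        handle = shm.create_shared_memory_region("input_data", "/input_simple", input_byte_size)
        shm.set_shared_memory_region(handle, [input_data])  # the call that dies for you
        client.register_system_shared_memory("input_data", "/input_simple", input_byte_size)
        print("create/set/register all succeeded")
    except Exception:
        traceback.print_exc()
    finally:
        client.unregister_system_shared_memory("input_data")
        if handle is not None:
            shm.destroy_shared_memory_region(handle)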

CC @Tabrizian

rmccorm4 • Aug 22 '22 22:08

I haven't been able to catch the exception; it looks like it's erroring out during the call itself. However, when I use CUDA shared memory instead, it works fine.
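
For reference, the CUDA path I'm using looks roughly like this (a sketch following the official CUDA shared-memory example, not my exact code):

    import numpy as np
    import tritonclient.grpc as grpcclient
    import tritonclient.utils.cuda_shared_memory as cudashm

    client = grpcclient.InferenceServerClient("localhost:8001")
    client.unregister_cuda_shared_memory()

    input_data = np.zeros((16000, 2), dtype=np.float32)
    input_byte_size = input_data.size * input_data.itemsize

    # Create the region on GPU 0, copy the input into it, then register it with Triton.
    cuda_handle = cudashm.create_shared_memory_region("input_data", input_byte_size, 0)
    cudashm.set_shared_memory_region(cuda_handle, [input_data])
    client.register_cuda_shared_memory(
        "input_data", cudashm.get_raw_handle(cuda_handle), 0, input_byte_size
    )

    inputs = [grpcclient.InferInput("MIX", [16000, 2], "FP32")]
    inputs[0].set_shared_memory("input_data", input_byte_size)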

lminer • Aug 23 '22 21:08

Are the client and server running on the same machine? And are you using the same container for both the client and the server, or different containers?

Tabrizian • Aug 24 '22 14:08

The client and server are running on the same machine and in the same container.

lminer • Aug 24 '22 16:08

I tried something that may solve the problem.

When gunicorn is launched with multiple workers, e.g.:

    gunicorn -b :9999 api:app -w 10

every worker creates and registers shared memory regions under the same names, such as "output_data", "/output_simple". Change every "output_data", "/output_simple" pair to something unique per worker, for example:

    import random
    code = random.randint(1, 1000)
    f"output_data{code}", f"/output_simple{code}"  # like this

qinfangzhe • Sep 08 '22 00:09

@qinfangzhe what will that do?

lminer • Sep 08 '22 17:09

Closing due to inactivity. Please re-open if you would like to follow up on this issue.

jbkyang-nvi • Nov 22 '22 03:11