Shared memory failing in gunicorn following example
I'm trying to use shared memory to run inference from a PyTorch model, but it's failing at set_shared_memory_region. Any idea why this might be happening? I'm following the official gRPC example.
# Assumed imports for this snippet (gRPC client plus system shared memory utils)
import numpy as np
import tritonclient.grpc as grpcclient
import tritonclient.utils as utils
import tritonclient.utils.shared_memory as shm
from tritonclient.grpc import InferInput, InferRequestedOutput

# client = grpcclient.InferenceServerClient(url="localhost:8001")  # assumed setup;
# model_name, TIMEOUT_S, and ModelException are defined elsewhere in the application.

# Clean up any regions left over from a previous run
client.unregister_system_shared_memory()
client.unregister_cuda_shared_memory()

input_data = ...  # numpy float32 array of shape (16000, 2)
input_byte_size = input_data.size * input_data.itemsize
output_byte_size = input_byte_size

# Create inputs and outputs in Shared Memory and store shared memory handles
out_handle = shm.create_shared_memory_region(
    "output_data", "/output_simple", output_byte_size * 2
)
client.register_system_shared_memory("output_data", "/output_simple", output_byte_size * 2)
input_handle = shm.create_shared_memory_region("input_data", "/input_simple", input_byte_size)

# Put input data values into shared memory
shm.set_shared_memory_region(input_handle, [input_data])  # <--- DIES HERE
client.register_system_shared_memory("input_data", "/input_simple", input_byte_size)

inputs = [InferInput("MIX", shape=[16000, 2], datatype="FP32")]
inputs[-1].set_shared_memory("input_data", input_byte_size)

# Note: both outputs are mapped to the same region at the default offset 0 here;
# the official example gives each output its own region (or offset).
outputs = [InferRequestedOutput("TARGET")]
outputs[-1].set_shared_memory("output_data", output_byte_size)
outputs.append(InferRequestedOutput("RESIDUAL"))
outputs[-1].set_shared_memory("output_data", output_byte_size)

results = client.infer(
    model_name=model_name,
    inputs=inputs,
    outputs=outputs,
    timeout=TIMEOUT_S * 1000000,
)

target = results.get_output("TARGET")
residual = results.get_output("RESIDUAL")
if target is None or residual is None:
    raise ModelException("Triton server failed to return target or residual")

target_data = shm.get_contents_as_numpy(
    out_handle, utils.triton_to_np_dtype(target.datatype), target.shape
)
residual_data = shm.get_contents_as_numpy(
    out_handle, utils.triton_to_np_dtype(residual.datatype), residual.shape
)
I'm using Triton 22.07 on Ubuntu 20.04 with an NVIDIA A6000 GPU, in the container nvcr.io/nvidia/tritonserver:22.07-py3.
The error log is:
2022/08/22 21:41:46 [error] 5273#5273: *1 upstream prematurely closed connection while reading response header from upstream, client: 127.0.0.1, server: , request: "POST /invocations HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/invocations", host: "localhost:8080"
127.0.0.1 - - [22/Aug/2022:21:41:46 +0000] "POST /invocations HTTP/1.1" 502 157 "-" "curl/7.68.0"
[2022-08-22 21:41:46 +0000] [5268] [WARNING] Worker with pid 5271 was terminated due to signal 7
Hi @lminer,
Please share the full error output/log you're getting for this issue. Also, please share the version of Triton you're using, GPU type, and other issue template information.
This similar issue may provide some info as well: https://github.com/triton-inference-server/server/issues/3429#issuecomment-1188408564
@rmccorm4 Just updated. I had a look at that issue, but it appears to be more about CUDA shared memory; would it apply to system shared memory as well?
Ah, I misread it as CUDA shared memory.
Can you try to isolate the error to the specific lines it's failing at and capture the traceback/exception being raised, if any? Maybe run the shared memory code directly in a simple script instead of through gunicorn, or wrap it in some try/excepts.
Currently your error log just shows the gunicorn worker failing rather than the actual Triton errors.
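For example, a minimal sketch along these lines (reusing the names from your snippet; the shape and dtype are assumed) might narrow it down. Note that if the worker is being killed by a signal rather than raising a Python exception, a try/except won't catch it:
import numpy as np
import tritonclient.utils.shared_memory as shm

input_data = np.zeros((16000, 2), dtype=np.float32)  # stand-in for the real input
input_byte_size = input_data.size * input_data.itemsize

handle = shm.create_shared_memory_region("input_data", "/input_simple", input_byte_size)
try:
    shm.set_shared_memory_region(handle, [input_data])  # the call that reportedly dies
    print("set_shared_memory_region succeeded")
except Exception as e:
    print(f"set_shared_memory_region raised: {e}")
finally:
    shm.destroy_shared_memory_region(handle)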
CC @Tabrizian
I haven't been able to catch the exception; it looks like it's dying during the call itself. However, when I try CUDA shared memory instead, it works fine.
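The CUDA shared memory path I mean is roughly along these lines (a sketch based on the simple_grpc_cudashm_client example rather than my exact code; device 0 and the sizes above are assumed):
import tritonclient.utils.cuda_shared_memory as cudashm

# create a CUDA shared memory region on GPU 0 and copy the input into it
cuda_handle = cudashm.create_shared_memory_region("input_data", input_byte_size, 0)
cudashm.set_shared_memory_region(cuda_handle, [input_data])
client.register_cuda_shared_memory(
    "input_data", cudashm.get_raw_handle(cuda_handle), 0, input_byte_size
)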
Are the client and server running on the same machine or not? And are you using the same container for both client and server, or different containers?
The client and server are running on the same machine and in the same container.
I tried something that may solve the problem.
Launch gunicorn with multiple workers:
gunicorn -b :9999 api:app -w 10
Demo: change every "output_data", "/output_simple" pair to a unique one, for example:
import random
code = random.randint(1, 1000)
f"output_data{code}", f"/output_simple{code}"  # like this
@qinfangzhe what will that do?
Closing due to lack of activity. Please re-open the issue if you would like to follow up.