
TritonServer Container shm-size

Open jihys opened this issue 2 years ago • 3 comments

I am using the AWS-provided Triton Inference Server container for my ensemble model serving. However, one of my Python backend modules requires more than 64 MB of /dev/shm. I have tested SageMaker serving in local mode, and it works well after setting `"default-shm-size": "5G"` in /etc/docker/daemon.json. However, I cannot run this model on g4dn.xlarge or g4dn.2xlarge SageMaker endpoints. Is there any way to increase /dev/shm for a SageMaker endpoint?
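For reference, here is the local-mode change described above as a minimal /etc/docker/daemon.json sketch (if your file has other settings, merge in just this key, then restart the Docker daemon, e.g. `sudo systemctl restart docker`):

```json
{
  "default-shm-size": "5G"
}
```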

Below are the CloudWatch logs for the failure.

| pre_multi_skin | 1 | UNAVAILABLE: Internal: Unable to initialize shared memory key '/pre_multi_skin_GPU_0' to requested size (67108864 bytes). If you are running Triton inside docker, use '--shm-size' flag to control the shared memory region size. Each Python backend model instance requires at least 64MBs of shared memory. Flag '--shm-size=5G' should be sufficient for common usecases. Error: No such file or directory |

jihys avatar Dec 16 '21 05:12 jihys

I'm having the same issue when running an ensemble model with multiple Python backend steps. Each of them requires at least 64 MB of shared memory (ref), and the current Triton server container doesn't have enough (64 MB, I assume, since that's the default for `docker run`).

It would be great if we were able to pass parameters to the `docker run` command so we could pass `--shm-size 5g`, as in the sketch below.
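For local experiments outside SageMaker, a rough equivalent with the Docker Python SDK (docker-py) would look like this sketch; the image tag and model-repository path are placeholders, and `shm_size` maps to the `--shm-size` flag:

```python
import docker  # pip install docker

client = docker.from_env()

# Placeholder image tag and model-repository path. shm_size maps to
# `docker run --shm-size`, which SageMaker endpoints don't let us set.
container = client.containers.run(
    "nvcr.io/nvidia/tritonserver:21.08-py3",
    command=["tritonserver", "--model-repository=/models"],
    volumes={"/path/to/models": {"bind": "/models", "mode": "ro"}},
    shm_size="5g",
    detach=True,
)
```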

sonduong305 avatar Dec 17 '21 02:12 sonduong305

Is there any update on this issue? To deploy a model ensemble on SageMaker Triton, we are using `create_model`; however, there isn't any option to set the shm-size.
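For context, a minimal `create_model` sketch via boto3 (all names, ARNs, and URIs below are placeholders); nothing in the `PrimaryContainer` spec exposes a shared-memory setting:

```python
import boto3

sm = boto3.client("sagemaker")

# All names/ARNs/URIs are placeholders. Note there is no field in
# PrimaryContainer (or anywhere in this API) for shared-memory size.
sm.create_model(
    ModelName="triton-ensemble",
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerRole",
    PrimaryContainer={
        "Image": "<account>.dkr.ecr.<region>.amazonaws.com/sagemaker-tritonserver:<tag>",
        "ModelDataUrl": "s3://my-bucket/model.tar.gz",
        "Environment": {"SAGEMAKER_TRITON_DEFAULT_MODEL_NAME": "ensemble"},
    },
)
```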

farzanehnakhaee70 avatar Jul 20 '22 13:07 farzanehnakhaee70

The shared memory size has been increased to half of the instance memory size. I have tested my Python backend and this works, though multi-model serving is still an issue in BLS mode. https://docs.aws.amazon.com/sagemaker/latest/dg/triton.html

jihys avatar Aug 09 '22 11:08 jihys

The Triton Python backend uses shared memory (SHMEM) to connect your code to Triton. SageMaker Inference provides up to half of the instance memory as SHMEM, so you can use an instance with more memory for a larger SHMEM size, as mentioned here. Kindly switch to an instance with more memory to resolve the issue.
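Since SHMEM scales with instance memory, picking a larger instance is the lever. A minimal deploy sketch with the SageMaker Python SDK, assuming placeholder image/model/role values (g4dn.xlarge has 16 GiB of RAM, so roughly 8 GiB of SHMEM; g4dn.4xlarge has 64 GiB, so roughly 32 GiB):

```python
from sagemaker.model import Model

# Placeholder image URI, model artifact, and role ARN.
model = Model(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/sagemaker-tritonserver:<tag>",
    model_data="s3://my-bucket/model.tar.gz",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
)

# A larger-memory instance raises the SHMEM ceiling (up to half of RAM).
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.4xlarge",
)
```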

knikure avatar Jan 03 '24 13:01 knikure