sagemaker-python-sdk
TritonServer Container shm-size
I am using the AWS-provided Triton server container to serve an ensemble model. However, one of my Python backend modules requires more than 64 MB of /dev/shm. I have tested SageMaker serving in local mode, and it works well after changing "default-shm-size": "5G" inside /etc/docker/daemon.json. However, I cannot run this model on g4dn.xlarge or g4dn.2xlarge SageMaker endpoints. Is there any way to increase /dev/shm for a SageMaker endpoint?
Below are the CloudWatch logs for the failure:
| pre_multi_skin | 1 | UNAVAILABLE: Internal: Unable to initialize shared memory key '/pre_multi_skin_GPU_0' to requested size (67108864 bytes). If you are running Triton inside docker, use '--shm-size' flag to control the shared memory region size. Each Python backend model instance requires at least 64MBs of shared memory. Flag '--shm-size=5G' should be sufficient for common usecases. Error: No such file or directory |
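As the error message notes, each Python backend model instance needs at least 64 MB (67108864 bytes) of shared memory. A quick way to estimate the total /dev/shm an ensemble needs (a rough sketch; actual usage also grows with the tensor sizes passed between steps, so the headroom factor here is an assumption):

```python
# Rough estimate of /dev/shm needed by a Triton ensemble with
# multiple Python backend model instances. Each instance requires
# at least 64 MB; real usage depends on tensor sizes as well.
MIN_SHM_PER_INSTANCE = 64 * 1024 * 1024  # 67108864 bytes, as in the log above

def min_shm_bytes(num_python_instances, headroom=2.0):
    """Minimum /dev/shm size, padded by a safety headroom factor."""
    return int(num_python_instances * MIN_SHM_PER_INSTANCE * headroom)

# e.g. an ensemble with 3 Python backend steps
print(min_shm_bytes(3) / 2**20, "MiB")  # 384.0 MiB
```

With the default 64 MB of /dev/shm, even a single Python backend instance is already at the limit, which is why the ensemble above fails on the second step.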
I'm having the same issue when running an ensemble model with multiple Python backend steps. Each of them requires at least 64 MB of shared memory (ref), and the current Triton server container doesn't have enough (64 MB, I guess, since that is the default for docker run).
It would be great if we could pass parameters to the docker run command, so we could pass --shm-size 5g.
Is there any other update on this issue? To deploy a model ensemble on SageMaker Triton, we are using create_model; however, there isn't any option to set the shm-size.
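For reference, this is roughly the shape of the request we send to sagemaker:CreateModel (all names, URIs, and environment variables below are placeholders). The ContainerDefinition accepts fields like Image, ModelDataUrl, and Environment, but nothing that maps to docker's --shm-size, so the shared-memory size cannot be set through this API:

```python
# Sketch of a CreateModel request for a Triton ensemble container.
# All identifiers here are hypothetical placeholders; note there is
# no field corresponding to docker run's --shm-size.
def build_create_model_request(model_name, image_uri, model_data_url, role_arn):
    return {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": image_uri,          # Triton server container image URI
            "ModelDataUrl": model_data_url,  # s3:// location of the model repo
            "Environment": {
                # Triton-on-SageMaker env vars go here, e.g. the default model
                "SAGEMAKER_TRITON_DEFAULT_MODEL_NAME": "ensemble",
            },
        },
        "ExecutionRoleArn": role_arn,
    }

# Passed to the API as: boto3.client("sagemaker").create_model(**request)
```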
Update: the shared memory size is now increased to half of the instance memory size. I have tested my Python backend and this works, although multi-model serving is still an issue in BLS mode. https://docs.aws.amazon.com/sagemaker/latest/dg/triton.html
The Triton Python backend uses shared memory (SHMEM) to connect your code to Triton. SageMaker Inference provides up to half of the instance memory as SHMEM so you can use an instance with more memory for larger SHMEM size.
as mentioned here.
Kindly switch to an instance with more memory to resolve the issue.
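Since SageMaker provides up to half of the instance memory as SHMEM, picking an instance comes down to doubling the shared memory your ensemble needs. A minimal sketch (the memory figures are taken from the public g4dn specs and the "up to half" rule from the SageMaker docs quoted above; verify both before relying on them):

```python
# Approximate SHMEM available per g4dn instance type on SageMaker,
# assuming "up to half of the instance memory" as per the SageMaker docs.
# Instance memory figures (GiB) are from the public g4dn specs.
INSTANCE_MEMORY_GIB = {
    "ml.g4dn.xlarge": 16,
    "ml.g4dn.2xlarge": 32,
    "ml.g4dn.4xlarge": 64,
}

def max_shmem_gib(instance_type):
    """Upper bound on SHMEM for a given instance type."""
    return INSTANCE_MEMORY_GIB[instance_type] / 2

for itype in INSTANCE_MEMORY_GIB:
    print(f"{itype}: up to {max_shmem_gib(itype)} GiB SHMEM")
```

So an ensemble that needed "--shm-size=5G" in local mode should fit comfortably even on ml.g4dn.xlarge under this rule; if it still fails, moving up a size gives proportionally more SHMEM.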