deep-learning-containers
[bug] Multi-model endpoint creation fails on PyTorch GPU inference images
Checklist
- [x] I've prepended issue tag with type of change: [bug]
- [ ] (If applicable) I've attached the script to reproduce the bug
- [x] (If applicable) I've documented below the DLC image/dockerfile this relates to
- [ ] (If applicable) I've documented below the tests I've run on the DLC image
- [ ] I've built my own container based off DLC (and I've attached the code used to build my own image)
Concise Description:
Multi-model endpoint creation fails on PyTorch GPU inference images because the Docker label com.amazonaws.sagemaker.capabilities.multi-models=true is missing.
Following the steps here: https://sagemaker-examples.readthedocs.io/en/latest/advanced_functionality/multi_model_bring_your_own/multi_model_endpoint_bring_your_own.html#Import-models-into-hosting
The failure occurs during the sm_client.create_model call.
DLC image/dockerfile:
763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:1.12.0-gpu-py38-cu113-ubuntu20.04-sagemaker
This also fails with older versions of the image (e.g., 1.10, 1.9).
Current behavior:
The following call returns an error:
sm_client.create_model(ModelName=model_name, ExecutionRoleArn=role, Containers=[container])
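For readers reproducing this, here is a minimal sketch of how the arguments to that call might be assembled for a multi-model endpoint. The helper name `build_multi_model_request` and its parameters are hypothetical and not part of the original report; the `Image`, `ModelDataUrl`, and `Mode` keys follow the container definition accepted by the SageMaker `CreateModel` API, and the S3 prefix is a placeholder.

```python
def build_multi_model_request(model_name, role_arn, image_uri, model_data_prefix):
    """Build keyword arguments for sm_client.create_model (hypothetical helper).

    model_data_prefix is assumed to be an S3 prefix holding the model
    archives, e.g. "s3://my-bucket/models/" (placeholder value).
    """
    container = {
        "Image": image_uri,
        "ModelDataUrl": model_data_prefix,
        # "MultiModel" asks SageMaker to host several models behind one
        # endpoint; this is the mode that triggers the Docker label check.
        "Mode": "MultiModel",
    }
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "Containers": [container],
    }
```

With a boto3 client created as `sm_client = boto3.client("sagemaker")`, the request would then be sent with `sm_client.create_model(**request)`, which is the call that fails for the image listed above.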
Error log:
ClientError: An error occurred (ValidationException) when calling the CreateModel operation: Your Ecr Image 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:1.12.0-gpu-py38-cu113-ubuntu20.04-sagemaker does not contain required com.amazonaws.sagemaker.capabilities.multi-models=true Docker label(s).
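As a rough aid for triaging this error in scripts, the sketch below pulls the image URI and the missing label(s) out of the message text. The function `parse_missing_labels` is hypothetical, the pattern is based only on the single message quoted above, and splitting on commas for multiple labels is an assumption about a format not shown in this report.

```python
import re

# Pattern derived from the one error message quoted in this issue.
_LABEL_ERROR = re.compile(
    r"Your Ecr Image (?P<image>\S+) does not contain required "
    r"(?P<labels>.+?) Docker label\(s\)"
)


def parse_missing_labels(message):
    """Return (image_uri, [labels]) or None if the message does not match."""
    match = _LABEL_ERROR.search(message)
    if match is None:
        return None
    # Assumption: multiple labels, if ever reported, are comma-separated.
    labels = [part.strip() for part in match.group("labels").split(",")]
    return match.group("image"), labels
```

In practice the message could be taken from `str(err)` after catching `botocore.exceptions.ClientError` around the `create_model` call.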
Expected behavior:
Being able to successfully create a multi-model endpoint with the PyTorch GPU inference image.
Additional context:
@lxning for awareness
I am also having this issue and would love for it to be fixed soon.
This looks to be solved with the latest image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.0.0-gpu-py310-cu118-ubuntu20.04-sagemaker
We no longer support PyTorch 1.12. We recommend upgrading to a later version of the PyTorch DLCs; see available_images.md for more information.
Feel free to reopen if the issue is still observed in later versions.