[bug] Multi-model endpoint creation fails on PyTorch GPU inference images

Open rohithkrn opened this issue 2 years ago • 1 comments

Checklist

  • [x] I've prepended issue tag with type of change: [bug]
  • [ ] (If applicable) I've attached the script to reproduce the bug
  • [x] (If applicable) I've documented below the DLC image/dockerfile this relates to
  • [ ] (If applicable) I've documented below the tests I've run on the DLC image
  • [ ] I've built my own container based off DLC (and I've attached the code used to build my own image)

Concise Description: Multi-model endpoint creation fails on PyTorch GPU inference images because the Docker label com.amazonaws.sagemaker.capabilities.multi-models=true is missing. When following the steps at https://sagemaker-examples.readthedocs.io/en/latest/advanced_functionality/multi_model_bring_your_own/multi_model_endpoint_bring_your_own.html#Import-models-into-hosting, the sm_client.create_model call fails.
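To confirm whether a given image carries the label before calling SageMaker, one option is to inspect its labels locally. The sketch below is not part of the original report. It assumes the image has been pulled and that `labels_json` holds the output of `docker inspect --format '{{json .Config.Labels}}' <image>`. The helper name `supports_multi_model` is hypothetical, and the exact-match comparison against "true" is an assumption based on the label value quoted in the error message.

```python
import json

# Label name taken from the error message in this issue.
REQUIRED_LABEL = "com.amazonaws.sagemaker.capabilities.multi-models"


def supports_multi_model(labels_json: str) -> bool:
    """Return True if the image labels declare multi-model capability.

    `labels_json` is assumed to be the JSON printed by
    `docker inspect --format '{{json .Config.Labels}}' <image>`.
    """
    # An image without labels may be reported as JSON null, so fall back to {}.
    labels = json.loads(labels_json) or {}
    return labels.get(REQUIRED_LABEL) == "true"
```

If this returns False for an image, the CreateModel validation described in this issue would be expected to reject it for multi-model use.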

DLC image/dockerfile: 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:1.12.0-gpu-py38-cu113-ubuntu20.04-sagemaker

It also fails with older versions of the image (e.g., 1.10, 1.9).

Current behavior: The following call returns an error:

sm_client.create_model(ModelName=model_name, ExecutionRoleArn=role, Containers=[container])

Error log:

ClientError: An error occurred (ValidationException) when calling the CreateModel operation: Your Ecr Image 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:1.12.0-gpu-py38-cu113-ubuntu20.04-sagemaker does not contain required com.amazonaws.sagemaker.capabilities.multi-models=true Docker label(s).
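For context, the report does not show how `container` is defined. The sketch below is an assumption based on the linked multi-model example, where the container definition sets `Mode` to `MultiModel` and `ModelDataUrl` to an S3 prefix. The helper name `build_multi_model_request` and all argument values in the usage comment are hypothetical.

```python
def build_multi_model_request(model_name, role_arn, image_uri, model_data_prefix):
    """Build keyword arguments for a multi-model CreateModel call.

    Assumption: the container definition mirrors the linked example;
    the issue itself does not show it.
    """
    container = {
        "Image": image_uri,
        # S3 prefix under which the individual model archives are stored.
        "ModelDataUrl": model_data_prefix,
        # Requests multi-model hosting; presumably what triggers the label check.
        "Mode": "MultiModel",
    }
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "Containers": [container],
    }


# Hypothetical usage with a boto3 SageMaker client (not executed here):
#   sm_client = boto3.client("sagemaker")
#   sm_client.create_model(**build_multi_model_request(
#       model_name, role, image_uri, "s3://<bucket>/<prefix>/"))
```

With an image that lacks the multi-models label, a call built this way would be expected to raise the ValidationException shown above.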

Expected behavior: A multi-model endpoint can be created successfully with the PyTorch GPU inference image.

Additional context:

rohithkrn Jul 28 '22 16:07

@lxning for awareness

rohithkrn Jul 28 '22 17:07

I am also having this issue and would love for it to be fixed soon.

thearod5 Apr 09 '23 19:04

This looks to be solved with the latest image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.0.0-gpu-py310-cu118-ubuntu20.04-sagemaker

devkosal May 20 '23 02:05

We no longer support PyTorch 1.12. We recommend upgrading to a later version of the PyTorch DLCs; see available_images.md for more information.

Feel free to reopen if the issue is still observed in later versions.

sirutBuasai Mar 27 '24 00:03