amazon-sagemaker-examples
[Bug Report]
Link to the notebook
All the PyTorch Neo compilation jobs in this directory
Describe the bug
Running the PyTorch example notebooks unchanged, on an ml.c5.xlarge instance with the conda_pytorch_p38 kernel, yields the following error:
ClientError: An error occurred (ValidationException) when calling the CreateCompilationJob operation: Unsupported framework version field for target. Framework version is supported for Target Platform configuration and only part of target devices.
Framework version is only supported for ml_c4, ml_c5, ml_m4, ml_m5, ml_p2, ml_p3, ml_g4dn cloud targetsand lambda, jetson_tx1, jetson_nano, jetson_tx2, jetson_xavier, deeplens, rasp3b, rasp4b, imx8qm, rk3288, rk3399, aisage, sbe_c, qcs605, qcs603, x86_win32, x86_win64 edge devices.
The same run was working on Friday morning.
To reproduce
- Launch a SageMaker notebook instance on an ml.c5.xlarge machine, tweaking memory to 15GB and cloning the https://github.com/aws/amazon-sagemaker-examples/ repo in the startup configuration.
- Upon startup, launch one of the PyTorch Neo compilation notebooks (listed above) with the conda_pytorch_p38 kernel and execute the notebook cells in order.
- The ClientError occurs at the compilation step (looks like neo_model = pytorch_model.compile(...)).
What I've tried
- I've tried changing the notebook pytorch version to every version between 1.5.1 and 1.11.0
- I've tried changing the framework_version argument of the PyTorchModel object and of the compile method to the corresponding versions.
I would appreciate any help in sorting out what is going wrong.
Dan Ringwald
Hello @dan-ringwald, I encountered the same error with SageMaker Python SDK v2.80.0.
Could you check your SDK version with this command in the notebook cell?
pip show sagemaker boto3 botocore
The workaround is to specify an older version 2.79.0. This worked for me.
!pip install -U sagemaker==2.79.0
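To confirm from inside the notebook whether the installed SDK is one of the releases reported in this thread, a small check like the following may help. It is only a sketch: the helper names (major_minor, reported_broken, installed_sagemaker_version) are made up for illustration, and the set of affected releases contains only the versions mentioned here (2.80 and 2.81). Whether other releases are affected is not known from this thread.

```python
from importlib import metadata

# Minor releases reported in this thread as triggering the ValidationException.
# Assumption: this set is not exhaustive; other releases are simply untested here.
REPORTED_BROKEN = {(2, 80), (2, 81)}


def major_minor(version):
    """Return (major, minor) from a version string such as '2.80.0'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def reported_broken(version):
    """True if this thread reports the given SDK version as failing."""
    return major_minor(version) in REPORTED_BROKEN


def installed_sagemaker_version():
    """Return the installed sagemaker version string, or None if not installed."""
    try:
        return metadata.version("sagemaker")
    except metadata.PackageNotFoundError:
        return None


version = installed_sagemaker_version()
if version is None:
    print("sagemaker is not installed in this kernel")
elif reported_broken(version):
    print(f"sagemaker {version} is reported as affected; consider pinning 2.79.0")
else:
    print(f"sagemaker {version} is not among the versions reported in this thread")
```

If the check reports an affected version, reinstalling with the pip command above and restarting the kernel before re-running the compilation cell should pick up the pinned SDK.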
Hello @hariby, sorry for the late reply. Next time I start up my SageMaker compilation notebook I will double-check the SDK version, but I am pretty sure it was v2.80, since I checked that I had the latest version. I will let you know if the version downgrade does the trick.
Edit: It did the trick. The new version v2.81 also triggers the error, but 2.79 works fine. Thank you very much for your help.