sagemaker-python-sdk
PyTorch 2.0.0 leaks memory when using torch.compile
Describe the bug
The Triton version shipped in the image is old and is affected by https://github.com/pytorch/pytorch/issues/96937
To reproduce
See attached issue.
Expected behavior
No leaks.
System information
- SageMaker Python SDK version: 2.165.0
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): PyTorch
- Framework version: 2.0.0
- Python version: 3.10
- CPU or GPU: GPU
- Custom Docker image (Y/N): N
Additional context
Are you seriously using development versions of packages? The pip output from the training job shows:
Found existing installation: triton 2.0.0.dev20221202
Adding triton==2.0.0.post1 to the requirements fixes the issue.
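For anyone who wants to check their own container before applying the pin, here is a minimal sketch that reports whether the installed Triton is a development build. The helper names `is_dev_build` and `installed_version` are hypothetical, written for this comment and not part of the SageMaker SDK or Triton. It uses only the standard library and assumes the version string follows the PEP 440 `.devN` convention, as `2.0.0.dev20221202` does.

```python
import re
from importlib import metadata
from typing import Optional

# PEP 440 development releases carry a ".devN" segment, e.g. "2.0.0.dev20221202".
_DEV_SEGMENT = re.compile(r"\.dev\d+")


def is_dev_build(version: str) -> bool:
    """Return True if the version string contains a PEP 440 ".devN" segment."""
    return _DEV_SEGMENT.search(version) is not None


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("triton")
    if version is None:
        print("triton is not installed")
    elif is_dev_build(version):
        # The pin below is the workaround reported in this issue.
        print(f"triton {version} is a development build; try pinning triton==2.0.0.post1")
    else:
        print(f"triton {version} is a release build")
```

Running this at the start of the training script (or in a notebook on the same image) should show whether the pin took effect after the requirements were installed.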
Honestly, when we are paying much more for SageMaker training than for EC2, I would expect some level of support and comfort.