sagemaker-python-sdk

PyTorch 2.0.0 leaks memory when using torch.compile

Open usamec opened this issue 2 years ago • 0 comments

Describe the bug

The installed Triton version is old and affected by this issue: https://github.com/pytorch/pytorch/issues/96937

To reproduce

See the linked issue.

Expected behavior

No leaks.

System information

A description of your system. Please provide:

  • SageMaker Python SDK version: 2.165.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): PyTorch
  • Framework version: 2.0.0
  • Python version: 3.10
  • CPU or GPU: GPU
  • Custom Docker image (Y/N): N

Additional context

You are seriously using a development version of packages???

Found existing installation: triton 2.0.0.dev20221202

Adding triton==2.0.0.post1 to the requirements fixes the issue.
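To catch this earlier, a training script could check the installed Triton version at startup. The following is only a minimal sketch: the helper names (is_dev_build, installed_version, warn_if_dev_triton) are hypothetical and not part of the SageMaker SDK or PyTorch, and it assumes the package is installed under the distribution name "triton", as in the pip output above.

```python
import re
from importlib import metadata


def is_dev_build(version: str) -> bool:
    """Return True if the version string has a PEP 440 ".devN" segment."""
    return re.search(r"\.dev\d+", version) is not None


def installed_version(package: str):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def warn_if_dev_triton():
    """Print a warning when the installed triton is a development build."""
    version = installed_version("triton")  # assumed distribution name
    if version is not None and is_dev_build(version):
        print(
            f"WARNING: triton {version} is a development build; "
            "consider pinning triton==2.0.0.post1"
        )
    return version


if __name__ == "__main__":
    warn_if_dev_triton()
```

Calling warn_if_dev_triton() before torch.compile would surface the problem in the training logs instead of as a slow memory leak.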

Honestly, when we are paying much more for SageMaker training than for EC2, I would expect some level of support and comfort.

usamec • Jun 27 '23 08:06