sagemaker-python-sdk

Repack always uses sklearn image

Open: jponf opened this issue 2 years ago (3 comments)

When registering a PyTorch model with Python 3.8, registration fails because requirements.txt includes dependencies that cannot be installed during the repack step (numpy in this case, but others may fail too).

We found that the repack image cannot be changed, and that it currently uses Python 3.7, which our numpy version does not support: https://github.com/aws/sagemaker-python-sdk/blob/56452f100696e3f9db3621fb8de44ec7034263d4/src/sagemaker/workflow/_utils.py#L40

For numpy, we can probably just remove it from requirements.txt, but there may be other libraries that we cannot drop this way.
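Dropping a package from requirements.txt, as suggested for numpy, can be scripted. Below is a minimal plain-Python sketch, assuming a simple requirements file with one requirement per line and no line continuations; `drop_requirements` is a hypothetical helper, not part of the SageMaker SDK. Dropping a package is only safe when the inference container already provides it.

```python
import re


def drop_requirements(lines, packages_to_drop):
    """Return requirement lines without entries for the named packages.

    Comments, blank lines, and options such as "-r base.txt" are kept as-is.
    Names are compared case-insensitively with '-', '_' and '.' treated as
    equivalent (a simplified form of PEP 503 name normalization).
    """
    def normalize(name):
        return re.sub(r"[-_.]+", "-", name).lower()

    drop = {normalize(p) for p in packages_to_drop}
    kept = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            kept.append(line)
            continue
        # The project name is the leading run of letters, digits, '.', '_' or '-',
        # before any extras, version specifier, or environment marker.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", stripped)
        if match and normalize(match.group(0)) in drop:
            continue
        kept.append(line)
    return kept
```

For example, `drop_requirements(["numpy==1.22.4", "pandas>=1.3"], ["NumPy"])` keeps only `"pandas>=1.3"`. The filtered lines can then be written back to `src/requirements.txt` before the pipeline is built.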

System information:

  • SageMaker Python SDK version: 2.88.3
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): PyTorch
  • Framework version: 1.10
  • Python version: py38
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N

jponf, May 27 '22

Hi @jponf, thanks for using SageMaker! You're right, the repack image is currently fixed to the SKLearn image.

Could you share a code snippet that reproduces the issue? I'm wondering why the dependencies listed in requirements.txt are installed in the repack step at all. A code sample would help us understand the issue and find a way to work around it or improve the behavior.

qidewenwhen, Jul 15 '22

Hi @qidewenwhen,

Please find attached a small example with a dummy training step and a register step. The expected outcome is that, after training, the repack step fails because the ./src directory contains a requirements.txt with a package that only works on Python 3.8 or later.

I'll be away for a week or so; in the meantime, if you need anything else, you can ask my colleague @Guillem96.

sagemaker-issue-3143.zip

jponf, Jul 15 '22

Thanks for the code samples! It helps a lot to quickly reproduce the issue.

The repack step invokes a TrainingJob under the hood that simply repacks custom dependencies and code into an existing model TAR archive. I assume the requirements.txt is intended for the register model step. Although requirements.txt is not meant to be used in the repack step, the training job there seems to install every dependency listed in it regardless.
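Conceptually, repacking is only file manipulation, so nothing in requirements.txt needs to be installed for it. The sketch below illustrates that with the standard-library `tarfile` module. It is a simplified local illustration, not the SDK's implementation (which runs as a training job); the helper name `repack_model` is hypothetical, and placing the inference code under `code/` is an assumption based on the layout the SageMaker PyTorch inference container expects.

```python
import posixpath
import tarfile


def repack_model(model_tar, source_dir, output_tar, code_subdir="code"):
    """Write output_tar with model_tar's artifacts plus source_dir under code_subdir.

    Entries already under code_subdir in model_tar are dropped so that the new
    code replaces them. Nothing is installed: repacking only copies files.
    """
    with tarfile.open(model_tar, "r:gz") as src, tarfile.open(output_tar, "w:gz") as dst:
        for member in src.getmembers():
            # Normalize names such as "./code/x.py" before comparing.
            name = posixpath.normpath(member.name)
            if name == code_subdir or name.startswith(code_subdir + "/"):
                continue
            # Regular files need their contents; directories and links do not.
            fileobj = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, fileobj)
        # Add the inference code (e.g. inference.py, requirements.txt) recursively.
        dst.add(source_dir, arcname=code_subdir)
    return output_tar
```

Copying members directly between archives avoids extracting to disk, so the original weights are carried over byte for byte.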

I've added this to our backlog and will raise it with my team for discussion. I'll get back to you once we've figured out the next steps.

qidewenwhen, Jul 16 '22

Hi @jponf, sorry for the late update. The fix has been released in v2.114.0 and I just verified the fix with your zip files and the latest SageMaker Python SDK version v2.116.0, which worked well on my side.
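To check whether an environment already includes the fix, the installed SDK version can be compared against v2.114.0. A minimal standard-library sketch follows; `includes_repack_fix` is a hypothetical helper that compares only the leading numeric release components and ignores pre-release tags.

```python
import re


def _release_tuple(version):
    """Turn "2.116.0" into (2, 116, 0), padding short versions with zeros."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def includes_repack_fix(sdk_version, fixed_in="2.114.0"):
    """Return True if sdk_version is at or above the release containing the fix."""
    return _release_tuple(sdk_version) >= _release_tuple(fixed_in)
```

For example, `includes_repack_fix(importlib.metadata.version("sagemaker"))` would return False for the 2.88.3 release from the original report and True for 2.116.0.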

Closing this issue for now; feel free to reopen it if you have any questions.

qidewenwhen, Oct 31 '22