sagemaker-python-sdk
Repack always uses sklearn image
Registering a PyTorch model that uses Python 3.8 fails because the requirements.txt includes dependencies (numpy is the one failing in this case, but there may be others) that cannot be installed during the repack step.
We've found that the repack image cannot be changed and that right now it is an image using Python 3.7, a Python version not supported by our numpy version. https://github.com/aws/sagemaker-python-sdk/blob/56452f100696e3f9db3621fb8de44ec7034263d4/src/sagemaker/workflow/_utils.py#L40
With numpy we can probably remove it from the requirements.txt, but there might be other libraries that we cannot drop this way.
System information
- SageMaker Python SDK version: 2.88.3
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): PyTorch
- Framework version: 1.10
- Python version: py38
- CPU or GPU: CPU
- Custom Docker image (Y/N): N
Hi @jponf, thanks for using SageMaker! You're right, the repack image is currently fixed to sklearn.
Can you share a code snippet that reproduces the issue? I'm wondering why the dependencies listed in requirements.txt are installed in the repack step. A code sample would help us understand the issue and figure out a way to bypass or fix it.
Hi @qidewenwhen,
Please find attached a small example with a dummy training step and a register step. The expected outcome is that, after training, the repack step will fail because inside the ./src directory we placed a requirements.txt with a package that only works on Python 3.8 or greater.
I'll be away for a week or so. In the meantime, if you need anything else you can ask my colleague @Guillem96.
Thanks for the code samples! They help a lot in quickly reproducing the issue.
The repack step invokes a TrainingJob under the hood simply to repack custom dependencies and code into an existing model TAR archive.
I guess the requirements.txt is intended for the register model step. Although requirements.txt is not meant to be used in the repack step, it seems the training job there installs every dependency listed in it regardless.
I've added this to our backlog and will raise it with my team for discussion. We'll get back to you once we figure out the next steps.
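For readers unfamiliar with the repack step, the repacking described above can be sketched with the standard library alone. This is a conceptual illustration, not the SDK's actual repack script: `repack_model` and its parameters are hypothetical names, and placing the source directory under `code/` inside the archive is an assumption about the expected layout. It shows that the repacking itself is only an archive operation and does not need any of the packages listed in requirements.txt to be installed.

```python
import tarfile
from pathlib import Path


def repack_model(model_tar: Path, source_dir: Path, output_tar: Path) -> None:
    """Write a new archive holding the model artifacts plus source_dir as `code/`.

    Hypothetical helper for illustration only; the `code/` layout is an assumption.
    `output_tar` must be a different path from `model_tar`.
    """
    with tarfile.open(model_tar, "r:gz") as src, tarfile.open(output_tar, "w:gz") as dst:
        # Copy every existing member (weights, configs, ...) unchanged.
        for member in src.getmembers():
            fileobj = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, fileobj)
        # Add the inference code, including its requirements.txt, without
        # installing anything: the file is only copied into the archive.
        dst.add(str(source_dir), arcname="code")
```

Calling `repack_model(Path("model.tar.gz"), Path("src"), Path("repacked.tar.gz"))` would produce an archive that contains the original artifacts plus `code/requirements.txt`, whatever Python version runs the function.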