sagemaker-python-sdk
workflow._RepackModelStep can fail if source_dir contains requirements.txt
Describe the bug
_RepackModelStep is a part of sagemaker.workflow.model_step.ModelStep. Its purpose is to attach a source_dir to the plain model.tar.gz.
As I understand it, this is done using a training step that does no real training and only repacks the model; see
https://github.com/aws/sagemaker-python-sdk/blob/284ddbebcf6240f0a4d3c734244f8e8ad066a9b3/src/sagemaker/workflow/_utils.py#L155-L158
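To make the intent of the step concrete, here is a minimal sketch of what "attaching a source_dir to model.tar.gz" amounts to. This is not the SDK's implementation; the function name repack_model and its parameters are hypothetical, and the code/ layout is an assumption based on the /opt/ml/model/code paths in the requirements.txt shown in this issue. It uses only the standard library, which suggests that installing the requirements should not be needed for the repacking itself.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def repack_model(model_tar: Path, source_dir: Path, output_tar: Path) -> None:
    """Copy source_dir into code/ inside a model archive and re-compress it.

    Hypothetical helper for illustration only; not part of the SageMaker SDK.
    """
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp) / "model"
        workdir.mkdir()

        # Unpack the original model artifacts.
        with tarfile.open(model_tar, "r:gz") as tar:
            tar.extractall(workdir)

        # Place the inference code under code/, replacing any earlier copy.
        code_dir = workdir / "code"
        if code_dir.exists():
            shutil.rmtree(code_dir)
        shutil.copytree(source_dir, code_dir)

        # Write the combined archive with paths relative to the model root.
        with tarfile.open(output_tar, "w:gz") as tar:
            for item in sorted(workdir.iterdir()):
                tar.add(item, arcname=item.name)
```

Note that requirements.txt is copied like any other file here; nothing in the sketch needs to read or install it.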
If the source_dir contains a requirements.txt, this step can fail. This is because the training step above installs the requirements, even though they might not be compatible with the environment that step runs in. See attached logs.
To reproduce
One source_dir that triggers this is the JumpStart LightGBM inference source_dir, available at s3://jumpstart-cache-prod-us-east-2/source-directory-tarballs/lightgbm/inference/regression/v1.1.0/sourcedir.tar.gz
Its requirements.txt looks like this:
/opt/ml/model/code/lib/lightgbm/tenacity-8.0.1-py3-none-any.whl
/opt/ml/model/code/lib/lightgbm/plotly-5.1.0-py2.py3-none-any.whl
/opt/ml/model/code/lib/lightgbm/graphviz-0.17-py3-none-any.whl
...
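The entries above are absolute paths to wheel files rather than package names. One plausible reading (an assumption, since the logs are not reproduced here) is that these paths do not exist in the container that runs the repack step. A small sketch for checking a requirements.txt for such entries follows; the function name missing_local_requirements is hypothetical.

```python
from pathlib import Path
from typing import List


def missing_local_requirements(requirements_text: str) -> List[str]:
    """Return absolute-path entries in a requirements file that do not exist locally."""
    missing = []
    for line in requirements_text.splitlines():
        entry = line.strip()
        # Skip blank lines, comments, and pip options such as "-r other.txt".
        if not entry or entry.startswith(("#", "-")):
            continue
        # Entries starting with "/" are local files, not package names.
        if entry.startswith("/") and not Path(entry).exists():
            missing.append(entry)
    return missing
```

Running this on the contents of the requirements.txt inside the environment where the install happens would list the wheel paths that cannot be resolved there.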
Adding this source_dir to a model using the following code results in a pipeline failure.
from sagemaker.model import Model
from sagemaker.workflow.model_step import ModelStep

# deploy_image_uri, deploy_source_uri_cache, step_train, pipeline_session, and role
# are assumed to be defined earlier in the pipeline script.
model = Model(
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri_cache,
    entry_point="inference.py",
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    sagemaker_session=pipeline_session,
    role=role,
)
step_model = ModelStep(
    "RegisterModel",
    step_args=model.register(
        content_types=["text/csv"],
        response_types=["application/json"],
        inference_instances=["ml.t2.medium"],
        transform_instances=["ml.m5.xlarge"],
        model_package_group_name="lgbm-test",
    ),
)
(Note that in the above code, deploy_source_uri_cache points to a copy of s3://jumpstart-cache-prod-us-east-2/source-directory-tarballs/lightgbm/inference/regression/v1.1.0/sourcedir.tar.gz, because _RepackModelStep modifies the archive it is given.)
Expected behavior
_RepackModelStep should work regardless of the content of source_dir.
Screenshots or logs
System information
- SageMaker Python SDK version: 2.103.0
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): JumpStart LightGBM
- Framework version: N/A
- Python version: 3.8
- CPU or GPU: CPU
- Custom Docker image (Y/N): N