sagemaker-python-sdk
FrameworkProcessor is broken with SageMaker Pipelines
Describe the bug
Trying to use any Processor derived from FrameworkProcessor is broken with SageMaker Pipelines. There is a problem with the command and entrypoint parameters, where command does not pass python3, causing the following error:
line 2: import: command not found
To reproduce
- Create a FrameworkProcessor (e.g. PyTorchProcessor, TensorFlowProcessor)
- Create a ProcessingStep and a Pipeline
- Execute it
- See it fail
Expected behavior The pipeline should run successfully.
Screenshots or logs
Screenshot from Pipelines:
Logs from CloudWatch:
/opt/ml/processing/input/entrypoint/inference_with_processing.py: line 2: import: command not found
/opt/ml/processing/input/entrypoint/inference_with_processing.py: line 3: import: command not found
/opt/ml/processing/input/entrypoint/inference_with_processing.py: line 4: import: command not found
/opt/ml/processing/input/entrypoint/inference_with_processing.py: line 5: import: command not found
/opt/ml/processing/input/entrypoint/inference_with_processing.py: line 6: from: command not found
System information:
- SageMaker Python SDK version: 2.57.0
- Framework name (e.g. PyTorch) or algorithm (e.g. KMeans): every framework
- Framework version: Every version supported by SM
- Python version: 3.8
- CPU or GPU: CPU and GPU
- Custom Docker image (Y/N): N
Additional context N/A
Thanks for raising this @dgallitelli
As discussed offline and detailed further on the linked PR, the integration between FrameworkProcessor and ProcessingStep is currently broken: the error you see is caused by the framework processor incorrectly trying to treat your processing script (Python) as the framework bootstrap script (shell).
We're actively working on a solution, but here are some things you could try in the interim if you need a workaround:
- If your job is a single script and you don't need requirements.txt dependencies, you could try using the ScriptProcessor instead and explicitly pass in the PyTorch/TensorFlow container URI.
- Adding a shebang line such as #!/usr/bin/python3 (not sure whether this is the actual installed location of Python in these containers) at the very top of your script file might persuade bash to run the script through the Python interpreter instead.
UPDATE 1: Adding a shebang does not currently force ProcessingStep into using Python3 in the command.
UPDATE 2: ScriptProcessor does work, however there is no support for the source_dir parameter (as commented above by @athewsey). If you need custom dependencies or a multi-file script, create your own custom container by extending the SageMaker images for TF/PyTorch/HuggingFace/MXNet.
For those who need some directions on how to change from FrameworkProcessor to ScriptProcessor, here is an example for TF 2.3:
##### COMMENT THE TENSORFLOWPROCESSOR
# from sagemaker.tensorflow import TensorFlowProcessor
# tp = TensorFlowProcessor(
# framework_version='2.3',
# role = get_execution_role(),
# instance_count=1,
# instance_type='ml.m5.large',
# base_job_name='DSM-TF-Demo-Process',
# py_version='py37'
# )
##### AND REPLACE WITH
from sagemaker.image_uris import retrieve
from sagemaker.processing import ScriptProcessor
from sagemaker import get_execution_role
image_uri = retrieve(
framework='tensorflow',
region='eu-west-1',
version='2.3',
py_version='py37',
image_scope='training',
instance_type='ml.m5.xlarge'
)
sp = ScriptProcessor(
role=get_execution_role(),
image_uri=image_uri,
command=['python3'],
instance_count=1,
instance_type='ml.m5.xlarge'
)
# Now, either run sp.run() or create a sagemaker.workflow.steps.ProcessingStep(), as needed
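For reference, here is a minimal sketch (my addition, not from the original comment) of wiring this ScriptProcessor into a ProcessingStep; the step name, script file, and S3 URIs are placeholders:
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.workflow.steps import ProcessingStep

processing_step = ProcessingStep(
    name="MyScriptProcessingStep",   # placeholder name
    processor=sp,
    code="processing.py",            # a single-file script, since ScriptProcessor has no source_dir
    inputs=[ProcessingInput(source="s3://my-bucket/input/",
                            destination="/opt/ml/processing/input/")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output/",
                              destination="s3://my-bucket/output/")],
)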
A very short example of a Dockerfile to extend the default TF container and install dependencies (not tested yet):
FROM 763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-training:2.3-cpu-py37
COPY requirements.txt /opt/ml/processing/input/code/requirements.txt
RUN pip install -r /opt/ml/processing/input/code/requirements.txt
Any updates on this?
Is there an ETA on this fix?
Is this issue fixed?
This is still not fixed as of today (May 10th 2022).
Any update on this issue? Facing the same problem.
Still the case for now.
However, there is now a possibility to use the new sagemaker.workflow.pipeline_context.PipelineSession to have .run() generate the arguments without actually running the Processing job. Tried in a Jupyter Notebook with a custom FrameworkProcessor, but it should work with any FrameworkProcessor. Your code would look like:
from sagemaker.sklearn import SKLearn
from sagemaker.processing import FrameworkProcessor  # or any other FrameworkProcessor, e.g. HuggingFaceProcessor
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker import get_execution_role

session = PipelineSession()
skpv2 = FrameworkProcessor(
estimator_cls=SKLearn,
framework_version='0.23-1',
role = get_execution_role(),
instance_count=1,
instance_type='ml.m5.large',
sagemaker_session = session
)
step_args = skpv2.run(
code='processing.py',
source_dir="code", # add processing.py and requirements.txt here
inputs=[...], outputs=[...]
)
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep
processing_step = ProcessingStep(
name="MyProcessingStep",
step_args=step_args
)
# [ define the other steps if any ]
pipeline = Pipeline(steps=[...])
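To complete the picture, a minimal sketch (my addition) of registering and starting the pipeline; the pipeline name is a placeholder and the role comes from get_execution_role():
pipeline = Pipeline(
    name="MyProcessingPipeline",    # placeholder name
    steps=[processing_step],
    sagemaker_session=session,
)
pipeline.upsert(role_arn=get_execution_role())  # create or update the pipeline definition
execution = pipeline.start()
execution.wait()                                # optionally block until the execution finishes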
Just make sure to update the SageMaker Python SDK to the latest version :)
Thanks @dgallitelli
We would encourage users to adopt this new way to construct TrainingStep, ProcessingStep, TransformStep, TuningStep, and ModelStep.
We have a readthedocs page about to be released that introduces all the improvements we made to the SageMaker Python SDK Pipelines module.
The FrameworkProcessor has a method called get_run_args (doc here) that is designed to help integrate this processor with the ProcessingStep, which can be put within a SageMaker pipeline. If you want to add pip dependencies, you can add a requirements.txt file under BASE_DIR.
Here is a simplified code example that connects the dots between FrameworkProcessor, get_run_args, ProcessingStep and Pipeline.
import os

from sagemaker.processing import (
    ProcessingInput,
    ProcessingOutput,
    FrameworkProcessor,
)
from sagemaker.workflow.functions import Join
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep
from sagemaker.tensorflow import TensorFlow
BASE_DIR = os.path.dirname(os.path.realpath(__file__))
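# Note: role, data_s3_bucket and data_s3_key are assumed to be defined elsewhere
# (e.g. your execution role ARN and the S3 location where the processed data should land).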
preprocessing_processor = FrameworkProcessor(
estimator_cls=TensorFlow,
framework_version='2.4.3',
role=role,
instance_count=1,
instance_type="ml.m5.xlarge",
py_version='py37',
command=["python3"],
base_job_name="some-preprocessing-step"
)
train_data_in_s3 = ProcessingOutput(
source="/opt/ml/processing/output/train/",
destination=Join(
on="/",
values=[
"s3:/",
data_s3_bucket,
os.environ["SAGEMAKER_PROJECT_NAME"],
data_s3_key,
'train/'
],
),
output_name='train',
s3_upload_mode='Continuous',
)
test_data_in_s3 = ProcessingOutput(
source="/opt/ml/processing/output/test/",
destination=Join(
on="/",
values=[
"s3:/",
data_s3_bucket,
os.environ["SAGEMAKER_PROJECT_NAME"],
data_s3_key,
'test/'
],
),
output_name='test',
s3_upload_mode='Continuous',
)
data_s3_key_in_project = Join(
on="/",
values=[
os.environ["SAGEMAKER_PROJECT_NAME"],
data_s3_key
],
)
preprocessing_run_args = preprocessing_processor.get_run_args(
code="preprocess.py",
source_dir=BASE_DIR,
inputs=[],
outputs=[train_data_in_s3, test_data_in_s3],
arguments=[
'--data-s3-bucket', "your bucket name",
'--data-s3-key', "your key"
]
)
preprocessing_step = ProcessingStep(
name="your-preprocessing-step-name",
processor=preprocessing_processor,
inputs=preprocessing_run_args.inputs,
outputs=preprocessing_run_args.outputs,
job_arguments=preprocessing_run_args.arguments,
code=preprocessing_run_args.code
)
pipeline_name = "your-pipeline-name"
distributed_ml_training_pipeline = Pipeline(
name=pipeline_name,
parameters=[
# your pipeline parameters here
],
steps=[preprocessing_step, ...]
)
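As an aside, a minimal sketch (my addition) of how the data_s3_bucket and data_s3_key values used above could be exposed as pipeline parameters instead of plain Python variables; the parameter names and defaults are placeholders:
from sagemaker.workflow.parameters import ParameterString

data_s3_bucket = ParameterString(name="DataS3Bucket", default_value="my-data-bucket")
data_s3_key = ParameterString(name="DataS3Key", default_value="datasets/my-dataset")
# Listed in the Pipeline's parameters=[...], these are resolved at execution time,
# including inside the Join(...) expressions used for the ProcessingOutput destinations.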
If you are using this inside a SageMaker Studio MLOps Project, make sure to declare your requirements.txt inside a MANIFEST.in file to be shipped with the library: https://packaging.python.org/en/latest/guides/using-manifest-in/.
Is there any running example for a ProcessingStep with PyTorch that allows source_dir?
Any update on when this updated readthedocs page will be released?
@morfaer here we go https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html
I tried the ModelStep example from the docs here: https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#model-step
Assuming that the Model object in this example is sagemaker.model.Model, the register method returns a ModelPackage | None type, but the step_args argument in the ModelStep call expects an object of type _ModelStepArguments. So type-wise this example looks fishy to me.
Also, this example produces the following error for me:
File "/home/.../run_pipeline.py", line 261, in model_registration_step
register_model_step_args = model.register(
File "/home/.../.venv/lib/python3.8/site-packages/sagemaker/workflow/pipeline_context.py", line 209, in wrapper
return run_func(*args, **kwargs)
File "/home/.../.venv/lib/python3.8/site-packages/sagemaker/model.py", line 373, in register
model_package = self.sagemaker_session.create_model_package_from_containers(
AttributeError: 'NoneType' object has no attribute 'create_model_package_from_containers'
So it seems like this example does not work.
I'm using version 2.94.0 of the SageMaker SDK from a local PC (not a SageMaker notebook) to start the Pipeline. Any ideas how this is supposed to work?
Can you please confirm that sagemaker.workflow.pipeline_context.PipelineSession() is used? More specifically:
from sagemaker.workflow.pipeline_context import PipelineSession

pipeline_session = PipelineSession()
....
model = Model(
image_uri=pytorch_estimator.training_image_uri(),
model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
sagemaker_session=pipeline_session,  # this is the important part
role=role,
)
register_args = model.register(...)
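And a minimal sketch (my addition, based on the linked docs) of passing those captured arguments on to a ModelStep; the step name is a placeholder:
from sagemaker.workflow.model_step import ModelStep

step_model_register = ModelStep(
    name="MyRegisterModelStep",   # placeholder name
    step_args=register_args,      # arguments captured by model.register() under the PipelineSession
)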
If you are using this inside a SageMaker Studio MLOps Project, make sure to declare your requirements.txt inside a MANIFEST.in file to be shipped with the library: https://packaging.python.org/en/latest/guides/using-manifest-in/.
Should MANIFEST.in be located in source_dir or somewhere else?
It should be at the same level as the setup.py.
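For illustration, a minimal MANIFEST.in sketch (my addition, assuming your processing code and requirements.txt live under a code/ directory next to setup.py):
include code/requirements.txt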
@jerrypeng7773 The additional pipeline_session parameter worked for me. Thanks. What's weird though is that the ModelStep appends a RegisterModel suffix to my pipeline step in SageMaker Studio.
I guess the reason is the _append_register_model_step method in model_step.py in the workflow module.
The other step types don't seem to add such a suffix, so this looks inconsistent to me. Is there a specific reason to do this for ModelStep?
Hi @morfaer, both ModelStep and RegisterModel are StepCollections. A StepCollection is not an actual step/graph node that shows up in the pipeline graph; instead it is a collection of steps/graph nodes.
Both ModelStep and RegisterModel can contain the following list of steps:
- _RepackModelStep (optional): this step is a TrainingStep under the hood, which performs model repacking at execution time if certain conditions are met. Model repacking repacks the model artifacts together with custom inference entry points and generates new repacked model artifacts for the registry.
- _RegisterModelStep: this is the step that performs the model registration during execution.
What's weird though is that the ModelStep appends a RegisterModel suffix to my pipeline step
Compared with RegisterModel, ModelStep explicitly appends the -RegisterModel suffix, so that its sub-step names look like:
- "…-RepackModel-…" for _RepackModelStep
- "…-RegisterModel" for _RegisterModelStep
This gives users a clear hint of what each sub-step is doing.
In addition, we recently pushed this PR: https://github.com/aws/sagemaker-python-sdk/pull/3240 to apply this naming convention of ModelStep to RegisterModel as well, because RegisterModel's previous naming manner could cause some issues.
The PipelineSession method posted by @dgallitelli above works: .run() produces step arguments that the ProcessingStep accepts via its step_args parameter.
The get_run_args method proposed by mohamed-ali did not work for me: it creates a list of arguments for the job_arguments parameter.
It may be worth noting that each method targets a different parameter. When using the step_args parameter, you cannot also use the processor argument, because the captured step arguments already supply the processor for the ProcessingStep.
When using the job_arguments parameter, there were no conflicts with the processor parameter, but the job still failed citing the missing 'code' without any reference to the missing 'source_dir'; this suggests the source directory issue may have been solved, but there was still an issue retrieving the .py file from it.
From this experience, it appears that using the step_args method via the PipelineSession is a good idea.
Is there any update on this issue, has it been fixed? Currently running into the same problem
Yes, using a PipelineSession() will work; follow the example from @dgallitelli posted 5/16.