sagemaker-python-sdk
Pickling a SageMaker step corrupts part of the object.
Describe the bug
This issue is related to https://github.com/boto/boto3/issues/3365. I have been trying to pickle and unpickle a SageMaker step. The part relevant to this repo is that, after unpickling, `step.step_type` changes from its original value to `CONDITION`:
Original:
ProcessingStep(name='calls', display_name=None, description='Processing missing calldata.', step_type=<StepTypeEnum.PROCESSING: 'Processing'>, depends_on=None)
Becomes:
ProcessingStep(name='calls', display_name=None, description='Processing missing calldata.', step_type=<StepTypeEnum.CONDITION: 'Condition'>, depends_on=None)
To reproduce
I cannot share the complete code for the ProcessingStep, but I think any initialization would produce the bug.
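```python
import copyreg
import json
import ssl
from pathlib import Path

import dill
from dill import loads, dumps
# Assumption: _reduce_socket/_rebuild_socket are not defined in the original
# snippet; the private helpers in multiprocessing.reduction are a likely source.
from multiprocessing.reduction import _reduce_socket, _rebuild_socket
from sagemaker.workflow.steps import ProcessingStep


def save_sslcontext(obj):
    # Rebuild an SSLContext from its protocol when unpickling.
    return obj.__class__, (obj.protocol,)


copyreg.pickle(ssl.SSLSocket, _reduce_socket, _rebuild_socket)
copyreg.pickle(ssl.SSLContext, save_sslcontext)

# Creating a ProcessingStep. I cannot share the script, but I imagine any
# processing step will behave the same way. script_processor, cache_config
# and catalog_variables are defined elsewhere (not shown).
step = ProcessingStep(
    name='calls',
    description='Processing missing calldata.',
    processor=script_processor,
    cache_config=cache_config,
    code=(Path(__file__).parent / 'scripts' / 'calls.py').as_posix(),
    job_arguments=[
        '--catalog_variables',
        json.dumps(catalog_variables),
    ],
)
print(step)
print(loads(dumps(step)))  # step_type comes back as StepTypeEnum.CONDITION
```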
Expected behavior
I expect the unpickled object to have no corrupted fields; in particular, `step.step_type` should not change.
System information
A description of your system. Please provide:
- SageMaker Python SDK version: 2.98.0
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): ProcessingStep, TuningStep, and probably all the other step types as well.
- Framework version: na
- Python version: 3.10
- CPU or GPU: CPU
- Custom Docker image (Y/N): Either
Additional context
To pickle a SageMaker step, you sometimes also have to add the following code. This is due to botocore and is related to https://github.com/boto/boto3/issues/3365.
```python
# Changes I made to get `dill` to work.
import copyreg
import ssl

import botocore.client
import botocore.errorfactory
# Assumption: same private helpers as in the reproduction snippet.
from multiprocessing.reduction import _reduce_socket, _rebuild_socket


def __getattr__client(self, item):
    # Guard: bail out before touching self._service_model, which may not
    # exist yet on a client that is still being unpickled.
    if item not in vars(self):
        raise AttributeError
    event_name = "getattr.%s.%s" % (self._service_model.service_id.hyphenize(), item)
    handler, event_response = self.meta.events.emit_until_response(
        event_name, client=self
    )
    if event_response is not None:
        return event_response
    raise AttributeError(
        "'%s' object has no attribute '%s'" % (self.__class__.__name__, item)
    )


botocore.client.BaseClient.__getattr__ = __getattr__client


def __getattr__errorfactory(self, name):
    # Same guard for the container of modeled exceptions.
    if name not in vars(self):
        raise AttributeError
    exception_cls_names = [
        exception_cls.__name__ for exception_cls in self._code_to_exception.values()
    ]
    raise AttributeError(
        "%r object has no attribute %r. Valid exceptions are: %s"
        % (self, name, ", ".join(exception_cls_names))
    )


botocore.errorfactory.BaseClientExceptions.__getattr__ = __getattr__errorfactory


def save_sslcontext(obj):
    # Rebuild an SSLContext from its protocol when unpickling.
    return obj.__class__, (obj.protocol,)


copyreg.pickle(ssl.SSLSocket, _reduce_socket, _rebuild_socket)
copyreg.pickle(ssl.SSLContext, save_sslcontext)
```
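The botocore workaround guards `__getattr__` against a recursion that a pickle round trip can trigger. Below is a minimal sketch of that failure mode, assuming the problem is of this kind: while unpickling, pickle probes the half-built instance for `__setstate__`, and a `__getattr__` that reads other instance attributes re-enters itself because `__dict__` is still empty. `NaiveClient` and `GuardedClient` are hypothetical classes, not botocore's.

```python
import pickle


class NaiveClient:
    """Hypothetical stand-in for a client whose __getattr__ reads instance state."""

    def __init__(self):
        self._handlers = {}

    def __getattr__(self, item):
        # While unpickling, __dict__ is still empty when pickle looks up
        # __setstate__, so reading self._handlers calls __getattr__ again, forever.
        handlers = self._handlers
        if item in handlers:
            return handlers[item]
        raise AttributeError(item)


class GuardedClient(NaiveClient):
    def __getattr__(self, item):
        # Same guard as the workaround: refuse names missing from the
        # instance dict instead of touching other attributes first.
        if item not in vars(self):
            raise AttributeError(item)
        return super().__getattr__(item)


data = pickle.dumps(NaiveClient())
try:
    pickle.loads(data)
except RecursionError:
    print("NaiveClient: RecursionError while unpickling")

restored = pickle.loads(pickle.dumps(GuardedClient()))
print(restored._handlers)
```

Because `__getattr__` only runs for names that ordinary lookup did not find, a guard of the form `item not in vars(self)` rejects practically every call, so the event-based fallback in the patched botocore methods is effectively switched off. That may be acceptable for a pickling workaround, but it is worth keeping in mind.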
There is a bug in this metaclass for the StepTypeEnum: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/workflow/entities.py#L41
Basically, `StepTypeEnum("any string value")` always returns `StepTypeEnum.CONDITION`, which is not the expected behavior.
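The metaclass problem can be reproduced without SageMaker. This is a stand-alone sketch, assuming the linked metaclass follows a "default to the first member" pattern; `BuggyDefaultEnumMeta`, `FixedDefaultEnumMeta`, `BuggyStepType` and `FixedStepType` are hypothetical names, not the SDK's code. Because the sentinel `value` is keyword-only in the buggy version, a positional lookup such as `BuggyStepType("Processing")` never binds it, so the first member is returned. On Python 3.10 (the version in the report), `Enum.__reduce_ex__` pickles a member as a call to its class with the member's value, which would explain why `step_type` comes back as `CONDITION` after unpickling.

```python
import pickle
from enum import Enum, EnumMeta


class BuggyDefaultEnumMeta(EnumMeta):
    """Hypothetical metaclass that defaults to the first member (buggy)."""

    default = object()

    def __call__(cls, *args, value=default, **kwargs):
        # A positional argument lands in *args, so `value` keeps the sentinel
        # and the first member is returned no matter what was passed.
        if value is BuggyDefaultEnumMeta.default:
            return next(iter(cls))
        return super().__call__(value, *args, **kwargs)


class FixedDefaultEnumMeta(EnumMeta):
    """Same idea, but the first positional argument binds to `value`."""

    default = object()

    def __call__(cls, value=default, *args, **kwargs):
        # Only a truly missing value falls back to the first member.
        if value is FixedDefaultEnumMeta.default:
            return next(iter(cls))
        return super().__call__(value, *args, **kwargs)


class BuggyStepType(Enum, metaclass=BuggyDefaultEnumMeta):
    CONDITION = "Condition"
    PROCESSING = "Processing"


class FixedStepType(Enum, metaclass=FixedDefaultEnumMeta):
    CONDITION = "Condition"
    PROCESSING = "Processing"


print(BuggyStepType("Processing"))  # BuggyStepType.CONDITION
print(FixedStepType("Processing"))  # FixedStepType.PROCESSING
print(FixedStepType())              # FixedStepType.CONDITION (default)
print(pickle.loads(pickle.dumps(FixedStepType.PROCESSING)))  # FixedStepType.PROCESSING
```

Binding the first positional argument to `value` keeps the no-argument default while letting ordinary lookups, and therefore pickle round trips, resolve to the right member.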
Could you elaborate on your use case? Why would you like to pickle and unpickle the pipeline objects? If you would like to share the pipeline definition with someone else, you can either share the source code or call `pipeline.definition()` to compile the pipeline into JSON and share that.
Even once we fix the StepTypeEnum bug, I still see a couple of problems with pickling the pipeline objects. For example, dill could not handle Enum objects: https://github.com/uqfoundation/dill/issues/250
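Independently of a metaclass fix, an enum can make its pickled form bypass the class constructor entirely by reducing each member to a by-name attribute lookup. This is a minimal sketch with the standard-library `pickle` module, not dill, using a hypothetical `StepType` and a hypothetical metaclass that reproduces the positional-lookup bug. It is an illustrative technique, not something the SDK is known to do, and whether it also sidesteps the dill problem linked above is not verified here.

```python
import pickle
from enum import Enum, EnumMeta


class DefaultEnumMeta(EnumMeta):
    """Hypothetical metaclass with the positional-lookup bug reproduced."""

    default = object()

    def __call__(cls, *args, value=default, **kwargs):
        # Positional lookups never bind `value`, so they all return the first member.
        if value is DefaultEnumMeta.default:
            return next(iter(cls))
        return super().__call__(value, *args, **kwargs)


class StepType(Enum, metaclass=DefaultEnumMeta):
    CONDITION = "Condition"
    PROCESSING = "Processing"

    def __reduce_ex__(self, protocol):
        # Pickle as getattr(StepType, "PROCESSING"): unpickling looks the
        # member up by name and never calls StepType(...).
        return getattr, (self.__class__, self.name)


print(pickle.loads(pickle.dumps(StepType.PROCESSING)))  # StepType.PROCESSING
print(StepType("Processing"))  # StepType.CONDITION: the lookup bug itself remains
```

Pickling by name makes the round trip independent of how the class resolves values, at the cost of tying stored pickles to member names rather than values.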
Closing this issue as the underlying StepTypeEnum bug has been fixed, and other questions have been abandoned. Feel free to re-open.