Exception with ParameterString in PySparkProcessor.run() Method
Describe the bug
If I use a ParameterString (or any other PipelineVariable object) in the list passed to the arguments parameter of the PySparkProcessor.run() method, I get a TypeError: Object of type ParameterString is not JSON serializable.
According to the documentation, arguments can be a list of PipelineVariables, so I expected this to work. Is this not supported?
To reproduce
spark_processor = PySparkProcessor(
    base_job_name="sagemaker-spark",
    framework_version="3.1",
    role=role,
    instance_count=2,
    instance_type="ml.m5.xlarge",
    sagemaker_session=sagemaker_session,
    max_runtime_in_seconds=1200,
)
spark_processor.run(
    submit_app="spark_processing/preprocess.py",
    arguments=[
        "--s3_input_bucket",
        ParameterString(name="s3-input-bucket", default_value=bucket),
        "--s3_input_key_prefix",
        input_prefix_abalone,
        "--s3_output_bucket",
        bucket,
        "--s3_output_key_prefix",
        input_preprocessed_prefix_abalone,
    ],
)
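For comparison, my understanding of how the documentation intends pipeline variables to be used is inside a pipeline definition, where the request is only serialized when the pipeline is built. A minimal sketch of that usage, reusing role, bucket and the prefix variables from the snippet above (the step name "SparkPreprocess" is just illustrative):

from sagemaker.spark.processing import PySparkProcessor
from sagemaker.workflow.parameters import ParameterString
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import ProcessingStep

# role, bucket, input_prefix_abalone and input_preprocessed_prefix_abalone
# are assumed to be defined as in the reproduction snippet above.
pipeline_session = PipelineSession()

spark_processor = PySparkProcessor(
    base_job_name="sagemaker-spark",
    framework_version="3.1",
    role=role,
    instance_count=2,
    instance_type="ml.m5.xlarge",
    sagemaker_session=pipeline_session,
    max_runtime_in_seconds=1200,
)

s3_input_bucket = ParameterString(name="s3-input-bucket", default_value=bucket)

# With a PipelineSession, run() does not start a job immediately; it returns
# step arguments that are only serialized when the pipeline definition is
# built, which is where pipeline variables are meant to be substituted.
step_args = spark_processor.run(
    submit_app="spark_processing/preprocess.py",
    arguments=[
        "--s3_input_bucket",
        s3_input_bucket,
        "--s3_input_key_prefix",
        input_prefix_abalone,
        "--s3_output_bucket",
        bucket,
        "--s3_output_key_prefix",
        input_preprocessed_prefix_abalone,
    ],
)

step_process = ProcessingStep(name="SparkPreprocess", step_args=step_args)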
Expected behavior
I expect a SageMaker ProcessingJob to be created.
Screenshots or logs
Traceback (most recent call last):
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/run_pyspark_processor.py", line 63, in <module>
    run_sagemaker_spark_job(
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/run_pyspark_processor.py", line 37, in run_sagemaker_spark_job
    spark_processor.run(
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/spark/processing.py", line 902, in run
    return super().run(
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/spark/processing.py", line 265, in run
    return super().run(
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/workflow/pipeline_context.py", line 248, in wrapper
    return run_func(*args, **kwargs)
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/processing.py", line 572, in run
    self.latest_job = ProcessingJob.start_new(
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/processing.py", line 796, in start_new
    processor.sagemaker_session.process(**process_args)
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/session.py", line 956, in process
    self._intercept_create_request(process_request, submit, self.process.__name__)
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/session.py", line 4317, in _intercept_create_request
    return create(request)
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/session.py", line 953, in submit
    LOGGER.debug("process request: %s", json.dumps(request, indent=4))
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/__init__.py", line 234, in dumps
    return cls(
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type ParameterString is not JSON serializable
System information
- SageMaker Python SDK version: 2.112.2
- Framework name (e.g. PyTorch) or algorithm (e.g. KMeans): PySpark
- Framework version: 3.1
- Python version: default
- CPU or GPU: CPU
- Custom Docker image (Y/N): N
Any update on this issue? I'm getting the same problem when using any ScriptProcessor.
The only workaround is to go back to a loaded ProcessingStep(), which has now been marked as deprecated.
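For reference, a rough sketch of that deprecated pattern with a generic ScriptProcessor (the image_uri, role, bucket, script path and step name below are placeholders, not values from this issue): the processor and job_arguments are handed to ProcessingStep directly, and the request is only serialized when the pipeline definition is built, which is why pipeline variables survive here.

from sagemaker.processing import ScriptProcessor
from sagemaker.workflow.parameters import ParameterString
from sagemaker.workflow.steps import ProcessingStep

# image_uri, role and bucket are placeholders for values from your own account.
script_processor = ScriptProcessor(
    image_uri=image_uri,
    command=["python3"],
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

s3_input_bucket = ParameterString(name="s3-input-bucket", default_value=bucket)

# Deprecated pattern: pass the processor, script and arguments to
# ProcessingStep instead of calling processor.run(). The ParameterString in
# job_arguments is only resolved when the pipeline definition is generated,
# so it does not hit the JSON serialization error above.
step_process = ProcessingStep(
    name="Preprocess",
    processor=script_processor,
    code="preprocess.py",
    job_arguments=["--s3_input_bucket", s3_input_bucket],
)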
Hi @martinRenou, this is causing some pretty big issues for us at the moment. Do you have any updates on this, please?
I'm not working with the SageMaker team at the moment; you may have better luck pinging people who work on this codebase these days.
Thanks for getting back to me. I tagged you because it says you are assigned to this issue. Could you assign it to someone on the team? There are 425 contributors, so any help knowing who to link to this would be greatly appreciated. The best I can think of is to ping people who have made recent commits 🤷
Friendly ping @knikure
Any response at all? We would really like to continue using SageMaker, but working around this issue is taking its toll. @knikure
@martinRenou Is there anyone else to friendly-ping on this? @knikure unassigned 👎