sagemaker-python-sdk

Exception with ParameterString in PySparkProcessor.run() Method

Open dipanjank opened this issue 3 years ago • 9 comments

Describe the bug

If I pass a ParameterString (or any other PipelineVariable object) in the list given to the arguments parameter of PySparkProcessor.run(), I get a TypeError: Object of type ParameterString is not JSON serializable.

According to the documentation, arguments can be a list of PipelineVariables, so I expected this to work. Is this not supported?

To reproduce


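    from sagemaker.spark.processing import PySparkProcessor
    from sagemaker.workflow.parameters import ParameterString

    # role, sagemaker_session, bucket, input_prefix_abalone and
    # input_preprocessed_prefix_abalone are defined elsewhere in the script.
    spark_processor = PySparkProcessor(
        base_job_name="sagemaker-spark",
        framework_version="3.1",
        role=role,
        instance_count=2,
        instance_type="ml.m5.xlarge",
        sagemaker_session=sagemaker_session,
        max_runtime_in_seconds=1200,
    )

    spark_processor.run(
        submit_app="spark_processing/preprocess.py",
        arguments=[
            "--s3_input_bucket",
            # Passing a pipeline parameter here triggers the TypeError below
            ParameterString(name="s3-input-bucket", default_value=bucket),
            "--s3_input_key_prefix",
            input_prefix_abalone,
            "--s3_output_bucket",
            bucket,
            "--s3_output_key_prefix",
            input_preprocessed_prefix_abalone,
        ],
    )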

Expected behavior

I expected a SageMaker ProcessingJob to be created.

Screenshots or logs

Traceback (most recent call last):
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/run_pyspark_processor.py", line 63, in <module>
    run_sagemaker_spark_job(
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/run_pyspark_processor.py", line 37, in run_sagemaker_spark_job
    spark_processor.run(
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/spark/processing.py", line 902, in run
    return super().run(
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/spark/processing.py", line 265, in run
    return super().run(
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/workflow/pipeline_context.py", line 248, in wrapper
    return run_func(*args, **kwargs)
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/processing.py", line 572, in run
    self.latest_job = ProcessingJob.start_new(
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/processing.py", line 796, in start_new
    processor.sagemaker_session.process(**process_args)
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/session.py", line 956, in process
    self._intercept_create_request(process_request, submit, self.process.__name__)
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/session.py", line 4317, in _intercept_create_request
    return create(request)
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/session.py", line 953, in submit
    LOGGER.debug("process request: %s", json.dumps(request, indent=4))
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/__init__.py", line 234, in dumps
    return cls(
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type ParameterString is not JSON serializable
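The last frames show where this fails: session.py builds the log message with json.dumps(request, indent=4) inside LOGGER.debug(...), just before the request is submitted. Because json.dumps(...) is evaluated as an argument, it runs even when debug logging is disabled, and the standard json encoder raises TypeError for any object it does not know how to encode. The standard-library sketch below reproduces that failure mode outside SageMaker. FakeParameterString is a hypothetical stand-in (not the SDK's ParameterString), and the nested AppSpecification/ContainerArguments layout is an assumed simplification chosen to match the dict, dict, list frames in the traceback. It also shows that a default= hook lets json.dumps render such an object as text.

```python
import json


class FakeParameterString:
    """Hypothetical stand-in for a pipeline parameter (NOT the SageMaker class)."""

    def __init__(self, name, default_value):
        self.name = name
        self.default_value = default_value


def build_request(arguments):
    # Assumed, simplified request shape: a list inside a dict inside a dict,
    # matching the dict -> dict -> list encoder frames in the traceback.
    return {"AppSpecification": {"ContainerArguments": arguments}}


def strict_error(request):
    """Return the TypeError message from a plain json.dumps, or None on success."""
    try:
        json.dumps(request, indent=4)
    except TypeError as exc:
        return str(exc)
    return None


def dump_tolerant(request):
    """Same call, but with a default hook that renders unknown objects as text."""
    return json.dumps(
        request,
        indent=4,
        default=lambda o: f"<{type(o).__name__} {getattr(o, 'name', '?')}>",
    )


request = build_request(
    ["--s3_input_bucket", FakeParameterString("s3-input-bucket", "my-bucket")]
)

print(strict_error(request))
# Object of type FakeParameterString is not JSON serializable

print(json.loads(dump_tolerant(request))["AppSpecification"]["ContainerArguments"])
# ['--s3_input_bucket', '<FakeParameterString s3-input-bucket>']
```

Rendering the object as text only lets the logging call succeed; in the real CreateProcessingJob request, ContainerArguments still has to contain concrete strings, so this sketch illustrates why the error appears rather than fixing the issue.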

System information

  • SageMaker Python SDK version: 2.112.2
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): PySpark
  • Framework version: 3.1
  • Python version: default (3.9, per the paths in the traceback)
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N


dipanjank, Oct 19 '22

Any update on this issue? I'm getting the same problem with any ScriptProcessor.

The only workaround is to go back to a ProcessingStep() built directly from the processor and its arguments, which is now marked as deprecated.

OwenAshton, Feb 01 '24

Hi @martinRenou, this is causing some pretty big issues for us at the moment. Do you have any updates on this, please?

DavidRooney, Feb 02 '24

I'm not working with the SageMaker team at the moment; you may have better luck pinging people who work on this codebase these days.

martinRenou, Feb 02 '24

Thanks for getting back to me. I tagged you because the issue shows you as the assignee. Could you assign it to someone on the team? There are 425 contributors, so any help figuring out who to loop in would be greatly appreciated. The best I can think of is to ping people who have made recent commits 🤷

DavidRooney, Feb 05 '24

Friendly ping @knikure

martinRenou, Feb 05 '24

Any response at all? We would really like to keep using SageMaker, but working around this issue is taking its toll. @knikure

DavidRooney, Feb 22 '24

@martinRenou Is there anyone else we could ping about this? knikure unassigned 👎

DavidRooney, May 28 '24