aws-step-functions-data-science-sdk-python
Is it possible to use values from the ExecutionInput schema in the container_arguments of a ProcessingStep?
Hi, let's say I have an execution input schema as follows:
from stepfunctions import steps
from stepfunctions.inputs import ExecutionInput
from stepfunctions.workflow import Workflow

execution_input = ExecutionInput(
schema={
"PATH_INPUT": str,
"DESTINATION_OUTPUT": str,
"study_name": str,
"ProcessingJobName": str,
"input_code": str,
"job_pk": str,
"job_sk": str,
}
)
How can I use the execution_input values in the container_arguments part below?
processing_step = steps.ProcessingStep(
"SageMakerProcessingJob1",
processor=get_processing_container_config(),
job_name=execution_input["ProcessingJobName"],
inputs=input_meta,
outputs=output_meta,
container_arguments=[
"--input_filename", "file.docx",
"--study_name", execution_input["study_name"]
],
container_entrypoint=["python3", "/opt/ml/processing/code/main.py"]
)
Here the study name should come from the execution input schema, but when trying to create the workflow graph it throws the following error. Note that the job_name parameter does accept the value from ExecutionInput.
workflow_graph = steps.Chain([<over complicated steps>])
workflow = Workflow(
name="ProcessingJob3_v1",
definition=workflow_graph,
role=workflow_execution_role,
execution_input=execution_input
)
workflow.render_graph()
workflow_arn = workflow.create()
Error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-26-17fe64d66aa4> in <module>()
----> 1 workflow.render_graph()
2 workflow_arn = workflow.create()
/home/ec2-user/SageMaker/.persisted_conda/dosjobs/lib/python3.6/site-packages/stepfunctions/workflow/stepfunctions.py in render_graph(self, portrait)
374 portrait (bool, optional): Boolean flag set to `True` if the workflow graph should be rendered in portrait orientation. Set to `False`, if the graph should be rendered in landscape orientation. (default: False)
375 """
--> 376 widget = WorkflowGraphWidget(self.definition.to_json())
377 return widget.show(portrait=portrait)
378
/home/ec2-user/SageMaker/.persisted_conda/dosjobs/lib/python3.6/site-packages/stepfunctions/steps/states.py in to_json(self, pretty)
91 return json.dumps(self.to_dict(), indent=4)
92
---> 93 return json.dumps(self.to_dict())
94
95 def __repr__(self):
/home/ec2-user/SageMaker/.persisted_conda/dosjobs/lib/python3.6/json/__init__.py in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
229 cls is None and indent is None and separators is None and
230 default is None and not sort_keys and not kw):
--> 231 return _default_encoder.encode(obj)
232 if cls is None:
233 cls = JSONEncoder
/home/ec2-user/SageMaker/.persisted_conda/dosjobs/lib/python3.6/json/encoder.py in encode(self, o)
197 # exceptions aren't as detailed. The list call should be roughly
198 # equivalent to the PySequence_Fast that ''.join() would do.
--> 199 chunks = self.iterencode(o, _one_shot=True)
200 if not isinstance(chunks, (list, tuple)):
201 chunks = list(chunks)
/home/ec2-user/SageMaker/.persisted_conda/dosjobs/lib/python3.6/json/encoder.py in iterencode(self, o, _one_shot)
255 self.key_separator, self.item_separator, self.sort_keys,
256 self.skipkeys, _one_shot)
--> 257 return _iterencode(o, 0)
258
259 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,
/home/ec2-user/SageMaker/.persisted_conda/dosjobs/lib/python3.6/json/encoder.py in default(self, o)
178 """
179 raise TypeError("Object of type '%s' is not JSON serializable" %
--> 180 o.__class__.__name__)
181
182 def encode(self, o):
TypeError: Object of type 'ExecutionInput' is not JSON serializable
Hi @DataPsycho!
Currently, the only way to use Placeholders with container_arguments is to define the container arguments entirely as a Placeholder. (A Placeholder embedded inside a plain Python list is handed to json.dumps unchanged when the workflow definition is serialized, which is what raises the TypeError above.) Something like this:
execution_input = ExecutionInput(
schema={
"PATH_INPUT": str,
"DESTINATION_OUTPUT": str,
"container_arguments": list,
"ProcessingJobName": str,
"input_code": str,
"job_pk": str,
"job_sk": str,
}
)
processing_step = steps.ProcessingStep(
"SageMakerProcessingJob1",
processor=get_processing_container_config(),
job_name=execution_input["ProcessingJobName"],
inputs=input_meta,
outputs=output_meta,
container_arguments=execution_input["container_arguments"],
container_entrypoint=["python3", "/opt/ml/processing/code/main.py"]
)
Being able to use Placeholder values for the individual arguments within container_arguments would be a great enhancement, and I can imagine many use cases for it. Thank you for bringing this to our attention! Tagging this as an enhancement and putting it on our radar.
Hope this helps!
Hi! I'm very glad that @DataPsycho raised this issue. I was wondering the same, since I found it weird that individual arguments could not be specified, neither by referencing the execution input directly nor with the Step Functions JSON referencing.
I've tried both (the first similar to the original issue, the second with the JSON referencing):
1) container_arguments=["--metrics-type", execution_input['MetricsType'], "--metrics-name", execution_input['MetricsName'], "--label-name", execution_input['LabelName']]
2) container_arguments=["--metrics-type", "$$.Execution.Input['MetricsType']", "--metrics-name", "$$.Execution.Input['MetricsName']", "--label-name", "$$.Execution.Input['LabelName']"]
It would be really nice to have this enhancement! It would make step creation more flexible.
Thank you for showing interest in this feature @Liks96! We are keeping this on our radar.
Hi, thanks for considering this feature. Looking forward to using it when it is available. Thanks!
I got here after hitting the same issue. I can confirm this flexibility would be really useful!
I have the same issue; it would be wonderful to have execution inputs as container arguments.