aws-step-functions-data-science-sdk-python
feat: Support placeholders for input_path and output_path for all States (except Fail) and items_path for MapState
Description
With this change, it will be possible to use:
- Placeholders for input_path and output_path for all States (except Fail)
- Placeholders for items_path for Map State
- Context Object Data for Map states
Fixes #101
Why is the change necessary?
This makes it possible to define input_path and output_path values dynamically for all States (except the Fail State). It also adds support for using a placeholder for items_path and for using Context Object data in MapState.
Solution
Support Placeholders for input_path, output_path and items_path
During workflow definition serialization, replace the placeholder with its JSON path when the parsed argument is one of the three (input_path, output_path, items_path).
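The substitution described above can be sketched roughly as follows. This is a minimal standalone illustration, not the SDK's actual code: the `Placeholder` class, `PATH_FIELDS`, and `serialize_state` are hypothetical names, and the `$` root is only an assumed starting point for the path.

```python
# Illustrative sketch only: these names are hypothetical, not the SDK's internals.
PATH_FIELDS = {"input_path", "output_path", "items_path"}


class Placeholder:
    """Minimal stand-in for a placeholder that knows its own JSON path."""

    def __init__(self, root="$", keys=()):
        self.root = root
        self.keys = tuple(keys)

    def __getitem__(self, key):
        # Indexing returns a new placeholder one level deeper.
        return Placeholder(self.root, self.keys + (key,))

    def to_jsonpath(self):
        return self.root + "".join("['{}']".format(k) for k in self.keys)


def serialize_state(fields):
    """Convert snake_case state fields into a dict with CamelCase keys."""
    result = {}
    for name, value in fields.items():
        key = "".join(part.capitalize() for part in name.split("_"))
        if name in PATH_FIELDS and isinstance(value, Placeholder):
            # Path fields hold a plain JSON path string, so the placeholder
            # is swapped for its path instead of being serialized as an object.
            value = value.to_jsonpath()
        result[key] = value
    return result


step_input = Placeholder("$")  # assumed root for this sketch
state = serialize_state({
    "type": "Map",
    "input_path": step_input["data"],
    "items_path": step_input["data"]["items"],
    "output_path": "$.result",  # plain strings pass through unchanged
})
print(state)
```

Plain string paths are left untouched, so existing definitions that already pass literal JSON paths would serialize the same way under this scheme.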
Support Context Object Data for Map State
Add new Placeholder objects MapItemValue and MapItemIndex with a json string template to use during workflow definition serialization.
Placeholder | Gets replaced by | json string template |
---|---|---|
MapItemValue | Value of the array item that is being processed in the current iteration | $$.Map.Item.Value{} |
MapItemIndex | Index number of the array item that is being processed in the current iteration | $$.Map.Item.Index |
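To illustrate how the templates in the table could turn into the paths seen in a serialized definition, here is a small standalone sketch. The class names (`ContextPlaceholder`, `MapItemValueSketch`, `MapItemIndexSketch`) and the `to_jsonpath` method are assumptions made for this example; only the two template strings come from the table above.

```python
# Hypothetical stand-ins; class and method names are assumptions for illustration.
class ContextPlaceholder:
    json_str_template = "{}"

    def __init__(self, keys=()):
        self.keys = tuple(keys)

    def __getitem__(self, key):
        # Return a new placeholder of the same type, one key deeper.
        return type(self)(self.keys + (key,))

    def to_jsonpath(self):
        # Integer keys become array indexes, everything else a quoted key.
        suffix = "".join(
            "[{}]".format(k) if isinstance(k, int) else "['{}']".format(k)
            for k in self.keys
        )
        return self.json_str_template.format(suffix)


class MapItemValueSketch(ContextPlaceholder):
    # The {} slot receives the accessed keys, e.g. ['name'].
    json_str_template = "$$.Map.Item.Value{}"


class MapItemIndexSketch(ContextPlaceholder):
    # No slot: the index is a scalar, so nothing is appended.
    json_str_template = "$$.Map.Item.Index"


value = MapItemValueSketch()
print(value["name"].to_jsonpath())
print(MapItemIndexSketch().to_jsonpath())
```

The `{}` slot explains why only MapItemValue accepts a schema and key access: the index template has nowhere to put a key suffix.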
Example
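from stepfunctions.inputs import MapItemIndex, MapItemValue
from stepfunctions.steps import Chain, Map, Pass
from stepfunctions.workflow import Workflow

# Placeholder for the array item processed in the current iteration
map_item_value = MapItemValue(schema={
    'name': str,
    'age': str
})
# Placeholder for the index of the array item processed in the current iteration
map_item_index = MapItemIndex()

map_state = Map(
    'MapState01',
    parameters={
        "MapIndex": map_item_index,
        "Name": map_item_value['name'],
        "Age": map_item_value['age']
    }
)
iterator_state = Pass(
    'TrainIterator'
)
map_state.attach_iterator(iterator_state)

workflow_definition = Chain([map_state])
# workflow_execution_role is the IAM role ARN for the workflow, defined elsewhere
workflow = Workflow(
    name="MapItemExample",
    definition=workflow_definition,
    role=workflow_execution_role
)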
Workflow definition will be:
{
    "StartAt": "MapState01",
    "States": {
        "MapState01": {
            "Parameters": {
                "MapIndex.$": "$$.Map.Item.Index",
                "Name.$": "$$.Map.Item.Value['name']",
                "Age.$": "$$.Map.Item.Value['age']"
            },
            "Type": "Map",
            "End": true,
            "Iterator": {
                "StartAt": "TrainIterator",
                "States": {
                    "TrainIterator": {
                        "Type": "Pass",
                        "End": true
                    }
                }
            }
        }
    }
}
Testing
- Added integ and unit tests
Pull Request Checklist
Please check all boxes (including N/A items)
Testing
- [X] Unit tests added
- [X] integration test added
Documentation
- [X] docs: All relevant docs updated
- [X] docstrings: All public APIs documented
Title and description
- [X] Change type: Title is prefixed with change type: and follows conventional commits
- [X] References: Indicate issues fixed via:
Fixes #xxx
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license.
Clarified what "support placeholders for Map state" means in the initial issue comments: https://github.com/aws/aws-step-functions-data-science-sdk-python/issues/101#issuecomment-917040833. So far, this PR only addresses the 3rd use case, which isn't actually what the requester was trying to do.
I think we should also update the docs related to placeholders. Might make sense to re-purpose the example in your commit body to illustrate its usage.
Agreed - will update the placeholder docs
Does it make sense to add an integration test that exercises placeholders? As we expand the use cases and scenarios we support, I'm thinking it would be useful to have some basic integ tests, since unit tests are more brittle.
Yes it does - will include one for Map state
@ca-nguyen This change doesn't just affect Map state. There's the 3 things I mentioned here: https://github.com/aws/aws-step-functions-data-science-sdk-python/issues/101#issuecomment-917040833
InputPath and OutputPath are allowed in all state types except Fail. Please update the PR title, description, and tests accordingly.
Updated the PR title and description
AWS CodeBuild CI Report
- CodeBuild project: AutoBuildProject6AEA49D1-sEHrOdk7acJc
- Commit ID: b4ddbbdc20ee901d7e11d6afdd2c1ed0470cbc2b
- Result: SUCCEEDED
- Build Logs (available for 30 days)