aws-step-functions-data-science-sdk-python

feat: Support placeholders for input_path and output_path for all States (except Fail) and items_path for MapState

Open ca-nguyen opened this issue 3 years ago • 5 comments

Description

With this change, it will be possible to use:

  1. Placeholders for input_path and output_path for all States (except Fail)
  2. Placeholders for items_path for Map State
  3. Context Object Data for Map states

Fixes #101

Why is the change necessary?

This makes it possible to define input_path and output_path values dynamically for all States (except the Fail State). It also adds support for placeholders in items_path and for context object data in Map states.

Solution

Support Placeholders for input_path, output_path and items_path

During workflow definition serialization, replace the placeholder with its JSON path when the parsed argument is one of the three path fields (input_path, output_path, items_path).
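
The substitution step described above can be pictured with a small self-contained sketch. It does not use the SDK: `Placeholder`, `to_jsonpath`, and `serialize_state` are hypothetical names chosen for illustration, and the `$$.Execution.Input` root is only an assumed example of a JSON path prefix.

```python
# Illustrative sketch only: not the SDK's implementation.
PATH_FIELDS = ("input_path", "output_path", "items_path")


class Placeholder:
    """Hypothetical stand-in for a placeholder object."""

    def __init__(self, root, keys=()):
        self.root = root          # e.g. "$$.Execution.Input" (assumed root)
        self.keys = tuple(keys)   # keys accessed so far

    def __getitem__(self, key):
        # Indexing returns a new placeholder one level deeper.
        return Placeholder(self.root, self.keys + (key,))

    def to_jsonpath(self):
        # Render the accessed keys in bracket notation.
        return self.root + "".join("['{}']".format(k) for k in self.keys)


def serialize_state(fields):
    """Return a copy of `fields` with placeholders in path fields rendered."""
    serialized = {}
    for name, value in fields.items():
        if name in PATH_FIELDS and isinstance(value, Placeholder):
            serialized[name] = value.to_jsonpath()
        else:
            serialized[name] = value
    return serialized


execution_input = Placeholder("$$.Execution.Input")
state = serialize_state({
    "input_path": execution_input["input"],
    "output_path": "$.result",   # plain strings pass through unchanged
})
print(state)
```

Only the three path fields are rewritten in this sketch; any other field, and any plain string value, is copied as-is.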

Support Context Object Data for Map State

Add new Placeholder objects MapItemValue and MapItemIndex with a json string template to use during workflow definition serialization.

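| Placeholder  | Gets replaced by                                                                 | JSON string template |
|--------------|----------------------------------------------------------------------------------|----------------------|
| MapItemValue | Value of the array item that is being processed in the current iteration        | $$.Map.Item.Value{}  |
| MapItemIndex | Index number of the array item that is being processed in the current iteration | $$.Map.Item.Index    |
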
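
One way to read the JSON string templates listed above is as format strings whose `{}` slot receives the keys accessed on the placeholder. The sketch below assumes that reading; `ContextPlaceholder`, `ItemValue`, and `ItemIndex` are hypothetical classes written for illustration and are not the SDK's own `MapItemValue` and `MapItemIndex`.

```python
# Illustrative sketch only: class names are hypothetical, not the SDK's.
class ContextPlaceholder:
    """Placeholder rendered from a JSON string template."""

    json_str_template = "{}"

    def __init__(self, keys=()):
        self.keys = tuple(keys)

    def __getitem__(self, key):
        # Return a new placeholder of the same kind, one key deeper.
        return type(self)(self.keys + (key,))

    def to_jsonpath(self):
        # Fill the template's slot (if any) with the accessed keys.
        suffix = "".join("['{}']".format(k) for k in self.keys)
        return self.json_str_template.format(suffix)


class ItemValue(ContextPlaceholder):
    json_str_template = "$$.Map.Item.Value{}"


class ItemIndex(ContextPlaceholder):
    # No slot: the index is a scalar, so there is nothing to append.
    json_str_template = "$$.Map.Item.Index"


value_path = ItemValue()["name"].to_jsonpath()
index_path = ItemIndex().to_jsonpath()
print(value_path)
print(index_path)
```
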
Example
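from stepfunctions.inputs import MapItemIndex, MapItemValue
from stepfunctions.steps import Chain, Map, Pass
from stepfunctions.workflow import Workflow

# Context object placeholders for the item currently processed by the Map state
map_item_value = MapItemValue(schema={
    'name': str,
    'age': str
})

map_item_index = MapItemIndex()

# Placeholders used in parameters are rendered as "<key>.$" entries on serialization
map_state = Map(
    'MapState01',
    parameters={
        "MapIndex": map_item_index,
        "Name": map_item_value['name'],
        "Age": map_item_value['age']
    }
)
iterator_state = Pass(
    'TrainIterator'
)

map_state.attach_iterator(iterator_state)
workflow_definition = Chain([map_state])

# workflow_execution_role is assumed to hold the ARN of an existing IAM role
workflow = Workflow(
    name="MapItemExample",
    definition=workflow_definition,
    role=workflow_execution_role
)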

Workflow definition will be:

{
    "StartAt": "MapState01",
    "States": {
        "MapState01": {
            "Parameters": {
                "MapIndex.$": "$$.Map.Item.Index",
                "Name.$": "$$.Map.Item.Value['name']",
                "Age.$": "$$.Map.Item.Value['age']"
            },
            "Type": "Map",
            "End": true,
            "Iterator": {
                "StartAt": "TrainIterator",
                "States": {
                    "TrainIterator": {
                        "Type": "Pass",
                        "End": true
                    }
                }
            }
        }
    }
}
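
In the definition above, every parameter whose value came from a placeholder has `.$` appended to its key, which is how Amazon States Language marks a value as a path rather than a literal. A minimal sketch of that key rewriting follows; `PathRef` and `render_parameters` are hypothetical helpers written for illustration, not SDK functions.

```python
import json


class PathRef:
    """Hypothetical placeholder that already knows its JSON path."""

    def __init__(self, path):
        self.path = path


def render_parameters(parameters):
    """Append '.$' to keys whose values are path references."""
    rendered = {}
    for key, value in parameters.items():
        if isinstance(value, PathRef):
            rendered[key + ".$"] = value.path
        elif isinstance(value, dict):
            # Nested parameter blocks are rendered recursively.
            rendered[key] = render_parameters(value)
        else:
            rendered[key] = value
    return rendered


params = render_parameters({
    "MapIndex": PathRef("$$.Map.Item.Index"),
    "Name": PathRef("$$.Map.Item.Value['name']"),
    "Static": "unchanged",
})
print(json.dumps(params, indent=4))
```

Literal values such as `"Static"` keep their original key, so a reader of the serialized definition can tell paths and constants apart.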

Testing

  • Added integration and unit tests

Pull Request Checklist

Please check all boxes (including N/A items)

Testing

  • [X] Unit tests added
  • [X] Integration test added

Documentation

  • [X] docs: All relevant docs updated
  • [X] docstrings: All public APIs documented

Title and description

  • [X] Change type: Title is prefixed with change type: and follows conventional commits
  • [X] References: Indicate issues fixed via: Fixes #xxx

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license.

ca-nguyen avatar Sep 03 '21 19:09 ca-nguyen

Clarified what "support placeholders for Map state" means in the initial issue comments: https://github.com/aws/aws-step-functions-data-science-sdk-python/issues/101#issuecomment-917040833. So far, this PR only addresses the 3rd use case, which isn't actually what the requester was trying to do.

wong-a avatar Sep 10 '21 16:09 wong-a

> I think we should also update the docs related to placeholders. Might make sense to re-purpose the example in your commit body to illustrate its usage.

Agreed - will update the placeholder docs

> Does it make sense to add an integration test that exercises placeholders? As we expand the use cases and scenarios we support, I'm thinking it would be useful to have some basic integration tests, as unit tests are more brittle.

Yes it does - will include one for Map state

ca-nguyen avatar Sep 10 '21 17:09 ca-nguyen

@ca-nguyen This change doesn't just affect Map state. There are 3 things I mentioned here: https://github.com/aws/aws-step-functions-data-science-sdk-python/issues/101#issuecomment-917040833

InputPath and OutputPath are allowed in all state types except Fail. Please update the PR title, description, and tests accordingly.

wong-a avatar Sep 10 '21 21:09 wong-a

> InputPath and OutputPath are allowed in all state types except Fail. Please update the PR title, description, and tests accordingly.

Updated the PR title and description

ca-nguyen avatar Sep 11 '21 08:09 ca-nguyen

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject6AEA49D1-sEHrOdk7acJc
  • Commit ID: b4ddbbdc20ee901d7e11d6afdd2c1ed0470cbc2b
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

StepFunctions-Bot avatar Nov 01 '21 20:11 StepFunctions-Bot