aws-step-functions-data-science-sdk-python

v2 Release plans and migration instructions

Open. yoodan93 opened this issue 3 years ago • 1 comment

Release of StepFunctions Python SDK Version 2 and support timeline for Version 1

With Python 2 having reached End of Life on January 1, 2020 and the release of SageMaker Python SDK V2, we will be releasing V2 of the AWS Step Functions Data Science SDK. This issue describes the changes, release timeline for V2, and end-of-life support for V1.

This major version bump will include the following breaking changes:

  • Drop Python 2 support for the StepFunctions Python SDK
  • Upgrade sagemaker dependency from 1.x to 2.x

To try V2 of the SDK today, install the 2.0.0rc1 pre-release candidate from PyPI. Instructions for migrating from v1 to v2 of the Step Functions Data Science SDK appear below.

Timeline

We are targeting the official release of V2 for February 2021. The 2.0.0rc1 pre-release was made available on September 23, 2020.

Timeframe | Milestone
Done      | Upgrade to sagemaker v2 (https://github.com/aws/aws-step-functions-data-science-sdk-python/pull/76)
Done      | Remove Python 2 support from v2 (https://github.com/aws/aws-step-functions-data-science-sdk-python/pull/91)
Done      | Release pre-release candidate to PyPI
Done      | Create new branch for v1. The master branch will be used for v2
Done      | Set up build and release automation for v2 and v1
Done      | Update documentation for v2
Jan 2021  | Add deprecation warnings in v1
Done      | Update Jupyter notebook examples to v2
Feb 2021  | Last v1 release
Done      | v2.0.0 release

Support timeline for V1

The last v1 release will be made in February 2021. After the February release, updates to v1 will be limited to critical bug fixes until August 2021.

Migration Instructions

Prerequisites:

  • Install Python 3
  • Install aws-step-functions-data-science-sdk Version 2.x
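
The prerequisites above can be checked with a short script. This is a minimal sketch, not part of the SDK: it assumes the SDK is installed under the PyPI distribution name `stepfunctions` and the SageMaker SDK under `sagemaker`, and the helper names `major_version` and `check_prerequisites` are hypothetical.

```python
import sys
from importlib import metadata


def major_version(version: str) -> int:
    """Return the leading major component of a version string such as '2.0.0rc1'."""
    return int(version.split(".")[0])


def check_prerequisites(dist_names=("stepfunctions", "sagemaker")):
    """Collect human-readable problems; an empty list means every check passed.

    The default distribution names are assumptions about how the two SDKs
    are published on PyPI.
    """
    problems = []
    if sys.version_info.major < 3:
        problems.append("Python 3 is required")
    for name in dist_names:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if major_version(installed) < 2:
            problems.append(f"{name} {installed} is older than 2.x")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

An empty result only confirms that the major versions look right; it does not verify that your own code has been migrated.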

SageMaker SDK:

If your project uses the SageMaker Python SDK, it must be upgraded to version 2.x.

See the official documentation for upgrading to SageMaker Python SDK version 2.x: https://sagemaker.readthedocs.io/en/stable/v2.html#breaking-changes

StepFunctions SDK:

Breaking changes were introduced to the interfaces of the following classes:

TrainingStep and TuningStep (https://github.com/aws/aws-step-functions-data-science-sdk-python/blob/master/src/stepfunctions/steps/sagemaker.py#L36-L50)

The “data” parameter for TrainingStep and TuningStep classes was changed from:

data: Information about the training data. Please refer to the ``fit()`` method of the associated estimator, as this can take any of the following forms:
    * (str) - The S3 location where training data is saved.
    * (dict[str, str] or dict[str, sagemaker.session.s3_input]) - If using multiple
        channels for training data, you can specify a dict mapping channel names to
        strings or :func:`~sagemaker.session.s3_input` objects.
    * (sagemaker.session.s3_input) - Channel configuration for S3 data sources that can
        provide additional information about the training dataset. See
        :func:`sagemaker.session.s3_input` for full details.
    * (sagemaker.amazon.amazon_estimator.RecordSet) - A collection of
        Amazon :class:`Record` objects serialized and stored in S3.
        For use with an estimator for an Amazon algorithm.
    * (list[sagemaker.amazon.amazon_estimator.RecordSet]) - A list of
        :class:`sagemaker.amazon.amazon_estimator.RecordSet` objects,
        where each instance is a different channel of training data.

to

data: Information about the training data. Please refer to the ``fit()`` method of the associated estimator, as this can take any of the following forms:
    * (str) - The S3 location where training data is saved.
    * (dict[str, str] or dict[str, sagemaker.inputs.TrainingInput]) - If using multiple
        channels for training data, you can specify a dict mapping channel names to
        strings or :func:`~sagemaker.inputs.TrainingInput` objects.
    * (sagemaker.inputs.TrainingInput) - Channel configuration for S3 data sources that can
        provide additional information about the training dataset. See
        :func:`sagemaker.inputs.TrainingInput` for full details.
    * (sagemaker.amazon.amazon_estimator.RecordSet) - A collection of
        Amazon :class:`Record` objects serialized and stored in S3.
        For use with an estimator for an Amazon algorithm.
    * (list[sagemaker.amazon.amazon_estimator.RecordSet]) - A list of
        :class:`sagemaker.amazon.amazon_estimator.RecordSet` objects,
        where each instance is a different channel of training data.

sagemaker.session.s3_input has been renamed to sagemaker.inputs.TrainingInput in SageMaker Python SDK 2.x.
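
For a TrainingStep, the change usually amounts to swapping the class used to build each data channel. The sketch below assumes an already configured SageMaker 2.x estimator is passed in; the function name `build_training_step`, the state name, the channel name `train`, and the CSV content type are illustrative choices, not values taken from this issue.

```python
def build_training_step(estimator, train_s3_uri, job_name):
    """Build a TrainingStep whose data channel uses the 2.x TrainingInput class."""
    # Imported inside the function so this module loads even where the SDKs
    # are not installed; both imports require the v2 packages at call time.
    from sagemaker.inputs import TrainingInput  # sagemaker.session.s3_input in 1.x
    from stepfunctions.steps import TrainingStep

    # In 1.x this line would have constructed an s3_input object instead.
    train_channel = TrainingInput(s3_data=train_s3_uri, content_type="text/csv")

    return TrainingStep(
        "Train model",  # hypothetical state name
        estimator=estimator,
        data={"train": train_channel},
        job_name=job_name,
    )
```

Calling the function requires valid AWS credentials and an estimator, so it is shown here without an invocation. A TuningStep takes its `data` argument in the same forms.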

TrainingPipeline and InferencePipeline (https://github.com/aws/aws-step-functions-data-science-sdk-python/blob/master/src/stepfunctions/template/pipeline/train.py#L43-L49)

The “inputs” parameter for TrainingPipeline and InferencePipeline classes was changed from:

inputs: Information about the training data. Please refer to the `fit()` method of the associated estimator, as this can take any of the following forms:
    * (str) - The S3 location where training data is saved.
    * (dict[str, str] or dict[str, `sagemaker.session.s3_input`]) - If using multiple channels for training data, you can specify a dict mapping channel names to strings or `sagemaker.session.s3_input` objects.
    * (`sagemaker.session.s3_input`) - Channel configuration for S3 data sources that can provide additional information about the training dataset. See `sagemaker.session.s3_input` for full details.
    * (`sagemaker.amazon.amazon_estimator.RecordSet`) - A collection of Amazon `Record` objects serialized and stored in S3. For use with an estimator for an Amazon algorithm.
    * (list[`sagemaker.amazon.amazon_estimator.RecordSet`]) - A list of `sagemaker.amazon.amazon_estimator.RecordSet` objects, where each instance is a different channel of training data.

to

inputs: Information about the training data. Please refer to the `fit()` method of the associated estimator, as this can take any of the following forms:
    * (str) - The S3 location where training data is saved.
    * (dict[str, str] or dict[str, `sagemaker.inputs.TrainingInput`]) - If using multiple channels for training data, you can specify a dict mapping channel names to strings or `sagemaker.inputs.TrainingInput` objects.
    * (`sagemaker.inputs.TrainingInput`) - Channel configuration for S3 data sources that can provide additional information about the training dataset. See `sagemaker.inputs.TrainingInput` for full details.
    * (`sagemaker.amazon.amazon_estimator.RecordSet`) - A collection of Amazon `Record` objects serialized and stored in S3. For use with an estimator for an Amazon algorithm.
    * (list[`sagemaker.amazon.amazon_estimator.RecordSet`]) - A list of `sagemaker.amazon.amazon_estimator.RecordSet` objects, where each instance is a different channel of training data.

sagemaker.session.s3_input has been renamed to sagemaker.inputs.TrainingInput in SageMaker Python SDK 2.x.
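
Since the same rename affects every class listed above, one way to locate code that still needs updating is to search your sources for the old name. The helper below is a hypothetical, standard-library-only sketch; it does a plain text match on `s3_input` and does not parse Python, so it may also flag comments or strings.

```python
import re

# Whole-word match, so names such as "s3_input_mode" are not flagged.
OLD_NAME = re.compile(r"\bs3_input\b")


def find_s3_input_references(source: str):
    """Return (line_number, stripped_line) pairs for lines that mention s3_input."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if OLD_NAME.search(line)
    ]


if __name__ == "__main__":
    sample = (
        "from sagemaker.session import s3_input\n"
        "from sagemaker.inputs import TrainingInput\n"
        "inputs = s3_input('s3://my-bucket/train')\n"
    )
    for number, line in find_s3_input_references(sample):
        print(f"line {number}: {line}")
```

Each reported line is a candidate for replacing `s3_input` with `sagemaker.inputs.TrainingInput`; the surrounding arguments should still be checked against the SageMaker upgrade guide.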

yoodan93, Dec 19 '20 01:12

Really keen to see SageMaker SDKv2 support soon as it's now quite rare to find samples using SDKv1!

athewsey, Jan 08 '21 08:01