aws-step-functions-data-science-sdk-python
v2 Release plans and migration instructions
Release of StepFunctions Python SDK Version 2 and support timeline for Version 1
With Python 2 having reached End of Life on January 1, 2020 and the release of SageMaker Python SDK V2, we will be releasing V2 of the AWS Step Functions Data Science SDK. This issue describes the changes, release timeline for V2, and end-of-life support for V1.
This major version bump will include the following breaking changes:
- Drop Python 2 support from the StepFunctions Python SDK
- Upgrade the sagemaker dependency from 1.x to 2.x
If you would like to try V2 of the SDK today, you can install the 2.0.0rc1 pre-release candidate from PyPI. Instructions for migrating from v1 to v2 of the Step Functions Data Science SDK are below.
Timeline
We are targeting the official release of V2 for February 2021. The 2.0.0rc1 pre-release was made available on September 23, 2020.
Timeframe | Milestone |
---|---|
Done | Upgrade to sagemaker v2 https://github.com/aws/aws-step-functions-data-science-sdk-python/pull/76 |
Done | Remove Python 2 support from v2 https://github.com/aws/aws-step-functions-data-science-sdk-python/pull/91 |
Done | Release pre-release candidate to PyPI |
Done | Create new branch for v1. The master branch will be used for v2 |
Done | Set up build and release automation for v2 and v1 |
Done | Update documentation for v2 |
Jan 2021 | Add deprecation warnings in v1 |
Done | Update Jupyter notebook examples to v2 |
Feb 2021 | Last v1 release |
Done | v2.0.0 release |
Support timeline for V1
The last v1 release will be made in February 2021. After the February release, updates to v1 will be limited to critical bug fixes until August 2021.
Migration Instructions
Prerequisites:
- Install Python 3
- Install aws-step-functions-data-science-sdk Version 2.x
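Before migrating, it can help to confirm which SDK versions are actually installed. The sketch below assumes the two SDKs are installed under the PyPI distribution names `stepfunctions` and `sagemaker`, and it uses `importlib.metadata`, so it needs Python 3.8 or later. `major_version` and `check_v2_prerequisites` are hypothetical helpers for illustration, not part of either SDK.

```python
from importlib import metadata  # standard library in Python 3.8+


def major_version(version: str) -> int:
    """Return the leading integer of a version string, e.g. '2.0.0rc1' -> 2."""
    head = version.split(".", 1)[0]
    digits = ""
    for ch in head:
        if not ch.isdigit():
            break
        digits += ch
    if not digits:
        raise ValueError(f"cannot parse a major version from {version!r}")
    return int(digits)


def check_v2_prerequisites(distributions=("stepfunctions", "sagemaker")) -> dict:
    """Map each distribution name to (installed version, is it 2.x?).

    The distribution names are an assumption; a distribution that is not
    installed maps to (None, False).
    """
    report = {}
    for name in distributions:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (version, major_version(version) == 2)
    return report


if __name__ == "__main__":
    for name, (version, ok) in check_v2_prerequisites().items():
        status = "ok" if ok else "needs 2.x"
        print(f"{name}: {version or 'not installed'} ({status})")
```

Note that a pre-release such as 2.0.0rc1 counts as 2.x here, which matches the pre-release instructions above.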
SageMaker SDK:
If your project uses the SageMaker Python SDK, it must be upgraded to version 2.x.
The official guide for upgrading to SageMaker Python SDK version 2.x is here: https://sagemaker.readthedocs.io/en/stable/v2.html#breaking-changes
StepFunctions SDK:
Breaking changes were introduced to the interfaces of the following classes:
TrainingStep and TuningStep (https://github.com/aws/aws-step-functions-data-science-sdk-python/blob/master/src/stepfunctions/steps/sagemaker.py#L36-L50)
The `data` parameter of the TrainingStep and TuningStep classes was changed from:
data: Information about the training data. Please refer to the ``fit()`` method of the associated estimator, as this can take any of the following forms:
* (str) - The S3 location where training data is saved.
* (dict[str, str] or dict[str, sagemaker.inputs.TrainingInput]) - If using multiple
channels for training data, you can specify a dict mapping channel names to
strings or :func:`~sagemaker.inputs.TrainingInput` objects.
* (sagemaker.session.s3_input) - Channel configuration for S3 data sources that can
provide additional information about the training dataset. See
:func:`sagemaker.session.s3_input` for full details.
* (sagemaker.amazon.amazon_estimator.RecordSet) - A collection of
Amazon :class:`Record` objects serialized and stored in S3.
For use with an estimator for an Amazon algorithm.
* (list[sagemaker.amazon.amazon_estimator.RecordSet]) - A list of
:class:`sagemaker.amazon.amazon_estimator.RecordSet` objects,
where each instance is a different channel of training data.
to
data: Information about the training data. Please refer to the ``fit()`` method of the associated estimator, as this can take any of the following forms:
* (str) - The S3 location where training data is saved.
* (dict[str, str] or dict[str, sagemaker.inputs.TrainingInput]) - If using multiple
channels for training data, you can specify a dict mapping channel names to
strings or :func:`~sagemaker.inputs.TrainingInput` objects.
* (sagemaker.inputs.TrainingInput) - Channel configuration for S3 data sources that can
provide additional information about the training dataset. See
:func:`sagemaker.inputs.TrainingInput` for full details.
* (sagemaker.amazon.amazon_estimator.RecordSet) - A collection of
Amazon :class:`Record` objects serialized and stored in S3.
For use with an estimator for an Amazon algorithm.
* (list[sagemaker.amazon.amazon_estimator.RecordSet]) - A list of
:class:`sagemaker.amazon.amazon_estimator.RecordSet` objects,
where each instance is a different channel of training data.
`sagemaker.session.s3_input` has been renamed to `sagemaker.inputs.TrainingInput` in SageMaker Python SDK 2.x.
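For code that builds `data` with the old class, the rename can be applied mechanically. The sketch below is a hypothetical text-level helper, not a tool shipped with either SDK: it rewrites fully qualified `sagemaker.session.s3_input` references to `sagemaker.inputs.TrainingInput` and reports the line numbers of any remaining bare `s3_input` references (for example, after `from sagemaker.session import s3_input`), which still need a manual fix. It only renames; constructor arguments should still be checked against the SageMaker Python SDK 2.x documentation.

```python
import re

# Fully qualified v1 name, renamed in SageMaker Python SDK 2.x.
_QUALIFIED = re.compile(r"\bsagemaker\.session\.s3_input\b")
# Any other reference to the old name, e.g. via a `from ... import s3_input`.
_BARE = re.compile(r"\bs3_input\b")


def migrate_s3_input(source: str):
    """Return (rewritten source, 1-based line numbers still referencing s3_input)."""
    rewritten = _QUALIFIED.sub("sagemaker.inputs.TrainingInput", source)
    leftovers = [
        lineno
        for lineno, line in enumerate(rewritten.splitlines(), start=1)
        if _BARE.search(line)
    ]
    return rewritten, leftovers


if __name__ == "__main__":
    old = "train = sagemaker.session.s3_input('s3://my-bucket/train')\n"
    new, todo = migrate_s3_input(old)
    print(new, todo)
```

Flagged lines are left untouched on purpose, since an aliased import such as `from sagemaker.session import s3_input` needs to become `from sagemaker.inputs import TrainingInput` together with every call site.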
TrainingPipeline and InferencePipeline (https://github.com/aws/aws-step-functions-data-science-sdk-python/blob/master/src/stepfunctions/template/pipeline/train.py#L43-L49)
The `inputs` parameter of the TrainingPipeline and InferencePipeline classes was changed from:
inputs: Information about the training data. Please refer to the `fit()` method of the associated estimator, as this can take any of the following forms:
* (str) - The S3 location where training data is saved.
* (dict[str, str] or dict[str, `sagemaker.inputs.TrainingInput`]) - If using multiple channels for training data, you can specify a dict mapping channel names to strings or `sagemaker.inputs.TrainingInput` objects.
* (`sagemaker.session.s3_input`) - Channel configuration for S3 data sources that can provide additional information about the training dataset. See `sagemaker.session.s3_input` for full details.
* (`sagemaker.amazon.amazon_estimator.RecordSet`) - A collection of Amazon `Record` objects serialized and stored in S3. For use with an estimator for an Amazon algorithm.
* (list[`sagemaker.amazon.amazon_estimator.RecordSet`]) - A list of `sagemaker.amazon.amazon_estimator.RecordSet` objects, where each instance is a different channel of training data.
to
inputs: Information about the training data. Please refer to the `fit()` method of the associated estimator, as this can take any of the following forms:
* (str) - The S3 location where training data is saved.
* (dict[str, str] or dict[str, `sagemaker.inputs.TrainingInput`]) - If using multiple channels for training data, you can specify a dict mapping channel names to strings or `sagemaker.inputs.TrainingInput` objects.
* (`sagemaker.inputs.TrainingInput`) - Channel configuration for S3 data sources that can provide additional information about the training dataset. See `sagemaker.inputs.TrainingInput` for full details.
* (`sagemaker.amazon.amazon_estimator.RecordSet`) - A collection of Amazon `Record` objects serialized and stored in S3. For use with an estimator for an Amazon algorithm.
* (list[`sagemaker.amazon.amazon_estimator.RecordSet`]) - A list of `sagemaker.amazon.amazon_estimator.RecordSet` objects, where each instance is a different channel of training data.
`sagemaker.session.s3_input` has been renamed to `sagemaker.inputs.TrainingInput` in SageMaker Python SDK 2.x.
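Per the docstrings above, the `(str)` and `dict[str, str]` forms are unchanged between v1 and v2, so code that passes plain S3 URI strings needs no migration for this parameter. The sketch below is a hypothetical helper (not part of the SDK) that builds such a channel dict; the bucket, prefix, and channel names are placeholders.

```python
def s3_channels(bucket: str, prefix: str, channels) -> dict:
    """Map each channel name to an S3 URI string under s3://bucket/prefix/."""
    base = prefix.strip("/")
    uris = {}
    for name in channels:
        key = f"{base}/{name}" if base else name
        uris[name] = f"s3://{bucket}/{key}"
    return uris


if __name__ == "__main__":
    print(s3_channels("my-bucket", "data/", ["train", "validation"]))
```

The resulting dict can be passed as the `inputs` argument of TrainingPipeline or InferencePipeline, or as the `data` argument of TrainingStep or TuningStep, under either SDK version.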
Really keen to see SageMaker SDK v2 support soon, as it's now quite rare to find samples using SDK v1!