[SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module

Open junyuc25 opened this issue 2 years ago • 3 comments

What changes were proposed in this pull request?

As Spark moves to 4.0, one of the major improvements is upgrading the AWS SDK to v2.

Currently, besides using AWS SDKv1 code directly, the Spark Kinesis connector also relies on the following libraries, which themselves depend on SDKv1:

  • Kinesis Client Library (KCL) allows users to easily consume and process data from Amazon Kinesis
  • Kinesis Producer Library (KPL) allows users to create reliable and efficient message producers for Amazon Kinesis

The main purpose of this PR is to upgrade the AWS SDK to v2 for the Spark Kinesis connector. While the changes include upgrading the AWS SDK and KCL to v2, we will not upgrade KPL because it has not yet been migrated to SDKv2.

  • Parent Jira: https://issues.apache.org/jira/browse/SPARK-44124
  • Previous PR to set up Kinesis tests in GitHub Actions: https://github.com/apache/spark/pull/43736
  • Previous stale PR: https://github.com/apache/spark/pull/42581

Why are the changes needed?

Since AWS SDK v2 became generally available (GA), SDKv1 has entered maintenance mode, in which future releases are limited to critical bug fixes and security issues. More details about the SDK maintenance policy can be found here. To keep Spark’s dependencies up to date, we should consider upgrading the SDK to v2. These changes would keep the Spark Kinesis connector current and let users continue to receive support from the above libraries.

Does this PR introduce any user-facing change?

Yes. With this change, the Spark Kinesis connector will no longer work with SDKv1. Any application running with a previous version of the Spark Kinesis connector will need to be updated before migrating to Spark 4.0.
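As a rough aid for spotting code that needs updating, the sketch below classifies fully qualified class names by library generation. It is not part of this PR; the `SdkImportAudit` class and its `classify` helper are hypothetical names used only for illustration. It relies on the published package layout of the AWS libraries: SDKv1, KCLv1, and KPL live under `com.amazonaws`, SDKv2 under `software.amazon.awssdk`, and KCLv2 under `software.amazon.kinesis`.

```java
import java.util.List;

// Hypothetical helper, not part of Spark or this PR: classifies a fully
// qualified class name by the AWS library generation its package belongs to.
class SdkImportAudit {

    static String classify(String className) {
        // AWS SDK for Java v2 classes live under software.amazon.awssdk.
        if (className.startsWith("software.amazon.awssdk.")) {
            return "SDKv2";
        }
        // KCL 2.x classes live under software.amazon.kinesis.
        if (className.startsWith("software.amazon.kinesis.")) {
            return "KCLv2";
        }
        // SDKv1, KCL 1.x, and KPL all live under com.amazonaws.
        // KPL stays on this prefix because this PR does not upgrade it.
        if (className.startsWith("com.amazonaws.")) {
            return "v1";
        }
        return "other";
    }

    public static void main(String[] args) {
        // Sample names only; a real audit would read them from source files.
        List<String> imports = List.of(
            "com.amazonaws.services.kinesis.AmazonKinesisClient",
            "software.amazon.awssdk.services.kinesis.KinesisClient",
            "software.amazon.kinesis.coordinator.Scheduler",
            "org.apache.spark.streaming.kinesis.KinesisInputDStream");
        for (String name : imports) {
            System.out.println(classify(name) + "  " + name);
        }
    }
}
```

A name reported as `v1` is only a candidate for review: KPL classes share the `com.amazonaws` prefix and are expected to remain as they are.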

AWS SDKv2 and KCLv2 contain several major changes that are not backward compatible with their previous versions, and some public classes in the module (e.g. KinesisInputDStream) rely on APIs affected by these breaking changes. These user-facing classes therefore require updates as well.

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

No

junyuc25 avatar Dec 06 '23 13:12 junyuc25

Anyway, thank you so much for working on this area, @junyuc25.

dongjoon-hyun avatar Dec 06 '23 20:12 dongjoon-hyun

@junyuc25 why did you close this PR? Also, you should remove the [WIP] from the title when your PR is ready for review; otherwise committers cannot know when to start reviewing.

LantaoJin avatar Jan 22 '24 11:01 LantaoJin

Looks like I deleted the branch by accident. Updated the title and reopened the PR.

junyuc25 avatar Jan 23 '24 10:01 junyuc25