Upgrade AWS SDK to V2
Which Delta project/connector is this regarding?
- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [x] Storage
- [x] storageS3DynamoDB
Description
The AWS SDK for Java 1.x is being deprecated. It will enter maintenance mode on July 31, 2024, and end-of-support is effective December 31, 2025. To address this deprecation, we need to upgrade Delta from AWS SDK for Java 1.x to AWS SDK for Java 2.x. SDK 2.x is a major rewrite of the 1.x code base. For detailed differences, please refer to "What's different between the AWS SDK for Java 1.x and 2.x".
Files in the Delta main branch that currently use AWS SDK v1 APIs (i.e., that reference com.amazonaws): https://github.com/search?q=repo%3Adelta-io%2Fdelta%20com.amazonaws&type=code. These are the files that need to be updated for this upgrade.
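The GitHub search above is a plain-text match on com.amazonaws. A rough local equivalent, useful for re-checking a branch before review, could look like the sketch below. SdkV1UsageScanner is a hypothetical helper (not part of Delta or this PR); it assumes Java 11+ and, like the GitHub search, it also flags mentions in comments and string literals.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of Delta or this PR): a local, plain-text
// equivalent of the GitHub code search for "com.amazonaws".
final class SdkV1UsageScanner {
    // Package prefix shared by all AWS SDK for Java 1.x classes.
    private static final String V1_PACKAGE_PREFIX = "com.amazonaws";

    private SdkV1UsageScanner() {}

    /** Returns the sorted .java/.scala files under root whose text mentions com.amazonaws. */
    static List<Path> findV1Usages(Path root) {
        // Files.walk keeps directory handles open, so close the stream when done.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(SdkV1UsageScanner::isJvmSource)
                .filter(SdkV1UsageScanner::mentionsV1)
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Only Java and Scala sources are considered; other files are ignored.
    private static boolean isJvmSource(Path file) {
        String name = file.getFileName().toString();
        return name.endsWith(".java") || name.endsWith(".scala");
    }

    // Plain substring match, so comments and string literals also count.
    private static boolean mentionsV1(Path file) {
        try {
            return Files.readString(file).contains(V1_PACKAGE_PREFIX);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Files that reference only software.amazon.awssdk classes are not reported, so an empty result is a quick sign that no SDK v1 imports remain in the scanned tree.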
Note: Part of this patch is based upon another open PR: https://github.com/delta-io/delta/pull/2408/files.
How was this patch tested?
Unit Test
build/sbt storageS3DynamoDB/test: passing
build/sbt storage/test: passing
Integration Test
run-integration-tests.py --s3-log-store-util-only
[info] - setup empty delta log
[info] - empty
[info] - small
[info] - medium
[info] - large
[info] S3LogStoreUtilTest:
[info] Run completed in 22 seconds, 503 milliseconds.
[info] Total number of tests run: 5
[info] Suites: completed 3, aborted 0
[info] Tests: succeeded 5, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 24 s, completed Apr 23, 2024, 9:36:04 AM
Manual Testing
spark-sql \
--conf spark.delta.logStore.s3a.impl=io.delta.storage.S3DynamoDBLogStore \
--conf spark.io.delta.storage.S3DynamoDBLogStore.ddb.tableName=delta_log1 \
--conf spark.io.delta.storage.S3DynamoDBLogStore.ddb.region=us-east-1 \
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
--jars /usr/share/aws/delta/lib/delta-storage-s3-dynamodb.jar
CREATE TABLE my_delta_table_1 (
id INT,
value INT
) USING delta;
INSERT INTO my_delta_table_1
VALUES
(1, 100),
(2, 200),
(3, 300),
(4, 400),
(5, 500),
(6, 600),
(7, 700),
(8, 800),
(9, 900),
(10, 1000);
select * from my_delta_table_1;
6 600
7 700
3 300
4 400
5 500
6 600
7 700
8 800
9 900
10 1000
3 300
4 400
5 500
8 800
9 900
10 1000
1 100
2 200
1 100
2 200
Time taken: 1.175 seconds, Fetched 20 row(s)
Does this PR introduce any user-facing changes?
Yes. Users will need to specify an SDK v2 credential provider instead of an SDK v1 provider in Delta storage configurations.
Ex: io.delta.storage.credentials.provider=com.amazonaws.auth.profile.ProfileCredentialsProvider -> software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider
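For deployments with many existing configurations, the class-name rewrite shown above could be automated. The following is a minimal sketch, assuming the settings are available as a plain key/value map. CredentialsProviderMigration is a hypothetical helper, not part of Delta, and its table is deliberately small. The ProfileCredentialsProvider pair comes from this PR. The other three pairs are the usual SDK v2 counterparts (for instance, DefaultAWSCredentialsProviderChain corresponds to DefaultCredentialsProvider) and should be double-checked against the AWS migration guide. The sketch only rewrites class names. It does not check how Delta instantiates the configured provider.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper (not part of Delta): maps a few AWS SDK for Java 1.x
// credential provider class names to their SDK 2.x counterparts.
final class CredentialsProviderMigration {
    static final String PROVIDER_KEY = "io.delta.storage.credentials.provider";

    // Small, non-exhaustive v1 -> v2 mapping; extend as needed.
    private static final Map<String, String> V1_TO_V2 = Map.of(
        "com.amazonaws.auth.profile.ProfileCredentialsProvider",
            "software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider",
        "com.amazonaws.auth.DefaultAWSCredentialsProviderChain",
            "software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider",
        "com.amazonaws.auth.EnvironmentVariableCredentialsProvider",
            "software.amazon.awssdk.auth.credentials.EnvironmentVariableCredentialsProvider",
        "com.amazonaws.auth.InstanceProfileCredentialsProvider",
            "software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider");

    private CredentialsProviderMigration() {}

    /** Returns the v2 class name for a known v1 provider; names already in the v2 package pass through. */
    static Optional<String> toV2(String className) {
        String trimmed = className.trim();
        if (trimmed.startsWith("software.amazon.awssdk.")) {
            return Optional.of(trimmed);
        }
        return Optional.ofNullable(V1_TO_V2.get(trimmed));
    }

    /** Returns a copy of conf with the provider setting rewritten; throws if it has no known v2 equivalent. */
    static Map<String, String> migrate(Map<String, String> conf) {
        Map<String, String> out = new HashMap<>(conf);
        String current = conf.get(PROVIDER_KEY);
        if (current != null) {
            String v2 = toV2(current).orElseThrow(() ->
                new IllegalArgumentException("No known SDK v2 equivalent for " + current));
            out.put(PROVIDER_KEY, v2);
        }
        return out;
    }
}
```

Unknown class names make migrate throw instead of passing an SDK v1 class through unchanged, since this PR expects a v2 provider in that setting.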