
[Spark] Managed Commits: add a DynamoDB-based commit owner

Open dhruvarya-db opened this issue 9 months ago • 0 comments

Which Delta project/connector is this regarding?

  • [X] Spark
  • [ ] Standalone
  • [ ] Flink
  • [ ] Kernel
  • [ ] Other (fill in here)

Description

Taking inspiration from https://github.com/delta-io/delta/pull/339, this PR adds a Commit Owner Client which uses DynamoDB as the backend. Each Delta table managed by a DynamoDB instance will have one corresponding entry in a DynamoDB table. The table schema is as follows:

  • tableId: String --- The unique identifier for the entry. This is a UUID.
  • path: String --- The fully qualified path of the table in the file system. e.g. s3://bucket/path.
  • acceptingCommits: Boolean --- Whether the commit owner is accepting new commits. This will only be set to false when the table is converted from managed commits to file system commits.
  • tableVersion: Number --- The version of the latest commit.
  • tableTimestamp: Number --- The inCommitTimestamp of the latest commit.
  • schemaVersion: Number --- The version of the schema used to store the data.
  • commits: --- The list of unbackfilled commits.
    • version: Number --- The version of the commit.
    • inCommitTimestamp: Number --- The inCommitTimestamp of the commit.
    • fsName: String --- The name of the unbackfilled file.
    • fsLength: Number --- The length of the unbackfilled file.
    • fsTimestamp: Number --- The modification time of the unbackfilled file.
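
To make the layout concrete, here is a minimal Python sketch of what one entry could look like, following the field names and types listed above. The class names TableEntry and UnbackfilledCommit, the sample values, and the file name are illustrative assumptions, not taken from the PR.

```python
import uuid
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class UnbackfilledCommit:
    """One element of the `commits` list (hypothetical class name)."""
    version: int            # version of the commit
    inCommitTimestamp: int  # inCommitTimestamp of the commit
    fsName: str             # name of the unbackfilled file
    fsLength: int           # length of the unbackfilled file
    fsTimestamp: int        # modification time of the unbackfilled file


@dataclass
class TableEntry:
    """One entry per managed Delta table (hypothetical class name)."""
    tableId: str            # UUID identifying the entry
    path: str               # fully qualified table path
    acceptingCommits: bool  # false only after conversion to file system commits
    tableVersion: int       # version of the latest commit
    tableTimestamp: int     # inCommitTimestamp of the latest commit
    schemaVersion: int      # version of this storage schema
    commits: List[UnbackfilledCommit] = field(default_factory=list)


# All values below are made up for illustration.
entry = TableEntry(
    tableId=str(uuid.uuid4()),
    path="s3://bucket/path",
    acceptingCommits=True,
    tableVersion=3,
    tableTimestamp=1715893500000,
    schemaVersion=1,
    commits=[
        UnbackfilledCommit(
            version=3,
            inCommitTimestamp=1715893500000,
            fsName="3.example-uuid.json",  # hypothetical file name
            fsLength=1024,
            fsTimestamp=1715893501000,
        )
    ],
)

# Plain nested dict, roughly the shape a stored item would take.
item = asdict(entry)
```

In this sketch tableVersion and tableTimestamp mirror the newest element of commits, which matches the descriptions above ("the latest commit") but is not something the PR text states as an enforced invariant.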

For a table to be managed by DynamoDB, registerTable must be called for that Delta table. This creates a new entry in the DynamoDB table for this Delta table. Every commit invocation appends the file status of the UUID delta file to the commits list in the table entry. The commit itself is performed through a conditional write in DynamoDB.
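
The register-then-commit flow can be sketched with an in-memory stand-in for DynamoDB. This is only a rough illustration under stated assumptions: InMemoryStore, ConditionalCheckFailed, the function signatures, and in particular the exact condition (the entry is accepting commits and its tableVersion is one less than the attempted version) are hypothetical and not confirmed by the PR description. Against a real DynamoDB table, the same idea would typically be expressed as a condition expression on an update request.

```python
class ConditionalCheckFailed(Exception):
    """Raised when the write condition does not hold (hypothetical name)."""


class InMemoryStore:
    """Stand-in for a DynamoDB table keyed by tableId (hypothetical)."""

    def __init__(self):
        self.items = {}

    def conditional_update(self, table_id, condition, update):
        # Apply `update` only if `condition` holds for the current item.
        item = self.items[table_id]
        if not condition(item):
            raise ConditionalCheckFailed(table_id)
        update(item)


def register_table(store, table_id, path, table_version, table_timestamp=0):
    # Create the entry for a Delta table; the starting version is an assumption.
    store.items[table_id] = {
        "tableId": table_id,
        "path": path,
        "acceptingCommits": True,
        "tableVersion": table_version,
        "tableTimestamp": table_timestamp,
        "schemaVersion": 1,
        "commits": [],
    }


def commit(store, table_id, version, in_commit_ts, fs_name, fs_length, fs_ts):
    # Assumed condition: table accepts commits and nobody else has
    # already written this version.
    def condition(item):
        return item["acceptingCommits"] and item["tableVersion"] == version - 1

    def update(item):
        # Append the unbackfilled file's status and advance the table state.
        item["commits"].append({
            "version": version,
            "inCommitTimestamp": in_commit_ts,
            "fsName": fs_name,
            "fsLength": fs_length,
            "fsTimestamp": fs_ts,
        })
        item["tableVersion"] = version
        item["tableTimestamp"] = in_commit_ts

    store.conditional_update(table_id, condition, update)
```

With this shape, two writers racing to commit the same version cannot both succeed: the first advances tableVersion, and the second fails the condition and has to retry against the new state.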

How was this patch tested?

Added a new suite called DynamoDBCommitOwnerClient5BackfillSuite, which uses a mock DynamoDB client, plus manual testing against a DynamoDB instance.

Does this PR introduce any user-facing changes?

dhruvarya-db, May 16 '24 21:05