flink-connector-aws icon indicating copy to clipboard operation
flink-connector-aws copied to clipboard

[FLINK-35500][Connectors/DynamoDB] DynamoDb Table API Sink fails to delete elements due to key not found

Open robg-eb opened this issue 1 year ago • 7 comments

Purpose of the change

When DynamoDbSink is used with CDC sources, it fails to process DELETE records and throws

    org.apache.flink.connector.dynamodb.shaded.software.amazon.awssdk.services.dynamodb.model.DynamoDbException: The provided key element does not match the schema

This is due to DynamoDbSinkWriter passing the whole DynamoDb Item as key instead of the constructed primary key alone.

Verifying this change

This change added tests and can be verified as follows:

  • Added tests to RowDataToAttributeValueConverterTest.java:

    • testDeleteOnlyPrimaryKey - Ensures that for a DELETE request, only the (single) PK field is included
    • testDeleteOnlyPrimaryKeys - Ensures that for a DELETE request with a composite PK, both PK fields are included.
    • testPKIgnoredForInsert - Ensures that PK is ignored when an INSERT request is done, and all fields continue to be included as they have been in the past.
    • testPKIgnoredForUpdateAfter - Ensures that PK is ignored when an UPDATE_AFTER request is done, and all fields continue to be included as they have been in the past.
  • Ran manual tests following the steps noted in https://issues.apache.org/jira/browse/FLINK-35500 under "Steps To Reproduce". Running the SQL statement as described in Step 6 now properly runs a DELETE in DynamoDB.

Significant changes

Previously, the PRIMARY KEY field had no significance for a DynamoDB Sink via Table API. Now, the PRIMARY KEY is required when processing a CDC stream that contains DELETES. This is not a 'breaking change' because the previous behavior for processing a CDC stream containing DELETES was already a failure (The provided key element does not match the schema). This change now provides a clear exception informing users to specify a Primary Key to avoid that failure. To clarify this change, the PR contains updates to the Connector documentation.

robg-eb avatar Jul 24 '24 20:07 robg-eb

Thanks for opening this pull request! Please check out our contributing guidelines. (https://flink.apache.org/contributing/how-to-contribute.html)

boring-cyborg[bot] avatar Jul 24 '24 20:07 boring-cyborg[bot]

There are some style violations, Could you run mvn spotless:apply ahead of the PR please?

vahmed-hamdy avatar Jul 28 '24 15:07 vahmed-hamdy

@vahmed-hamdy -

There are some style violations, Could you run mvn spotless:apply ahead of the PR please?

Done, I've applied updated formatting now!

robg-eb avatar Aug 01 '24 14:08 robg-eb

Thanks @robg-eb , @hlteoh37 could you please take a look

vahmed-hamdy avatar Aug 12 '24 18:08 vahmed-hamdy

I addressed all the open comments from @z3d1k now. Thank you all for the suggestions! Sounds like @hlteoh37 is the next who should take a look?

robg-eb avatar Aug 13 '24 21:08 robg-eb

@robg-eb Thank you for working on this. Can I check if you plan to address the remaining comments?

hlteoh37 avatar Nov 04 '24 09:11 hlteoh37

It's too bad this bugfix is withering on the vine, and really makes me wonder how many folks are actually using Dynamodb as a sink via SQL/Table API 😬.

My team is a bit blocked by this bug, it'd be nice to pick it back up.

If core AWS maintainers aren't triaging this as a priority issue (which it should be in my unsolicited opinion 😄 )- and @robg-eb no longer has time/energy to get this over the finish line, I'd be ok with giving it a shot.

bradodarb avatar Sep 25 '25 22:09 bradodarb