
Potential data-loss bug in BatchWriter flushing function

Open · rudaporto opened this issue 3 years ago · 1 comment

  • Async AWS SDK for Python version: aioboto3==9.2.2 and aiobotocore==1.4.2
  • Python version: 3.8.11
  • Operating System: Amazon Linux 2

Description

This bug report replicates one already reported against boto3:

  • https://github.com/boto/boto3/issues/1424

Since aioboto3.dynamodb.table.BatchWriter._flush is the same as the boto3 implementation, it has the same bug:

https://github.com/terrycain/aioboto3/blob/c413766fe9cd95a0889d80eb5099d4296ebeaa1a/aioboto3/dynamodb/table.py#L98-L112
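
For context, the flushing logic at those lines boils down to roughly the following (a condensed paraphrase of the linked code, not the verbatim source):

    async def _flush(self):
        items_to_send = self._items_buffer[:self._flush_amount]
        self._items_buffer = self._items_buffer[self._flush_amount:]
        response = await self._client.batch_write_item(
            RequestItems={self._table_name: items_to_send}
        )
        unprocessed_items = response['UnprocessedItems']
        if unprocessed_items and unprocessed_items[self._table_name]:
            # Re-queue whatever DynamoDB did not process.
            self._items_buffer.extend(unprocessed_items[self._table_name])
        else:
            # Problem: this resets the buffer even though other coroutines
            # may have appended items while batch_write_item was awaited.
            self._items_buffer = []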

There is a PR open against boto3 that fixes two issues:

  • avoid clearing the items buffer a second time in the else branch: the buffer has already been trimmed at the top of the method, so resetting it to an empty list drops any items appended in the meantime (see the sketch after this list).
  • fix the logging of unprocessed items.
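
Applied here, the proposed fix would amount to something like this (a sketch of the idea under the same simplifications as above, not the exact patch; `logger` is assumed to be the module-level logger):

    async def _flush(self):
        items_to_send = self._items_buffer[:self._flush_amount]
        self._items_buffer = self._items_buffer[self._flush_amount:]
        response = await self._client.batch_write_item(
            RequestItems={self._table_name: items_to_send}
        )
        unprocessed = response['UnprocessedItems'].get(self._table_name, [])
        if unprocessed:
            # Only re-queue the unprocessed items; never reset the buffer,
            # so items appended by other coroutines in the meantime survive.
            self._items_buffer.extend(unprocessed)
        logger.debug('Batch write sent %s, unprocessed: %s',
                     len(items_to_send), len(unprocessed))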

We bumped into this issue because we use a single batch writer shared between multiple coroutines that add items concurrently.

By comparing our old implementation with the new one, we discovered that the new one, which uses the aioboto3 BatchWriter, loses data in some cases: some records were never persisted to DynamoDB.
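
For illustration, the failure mode can be triggered with a pattern like the following (a hypothetical repro: table name and item shape are made up, and credentials/region are assumed to be configured):

    import asyncio
    import aioboto3

    async def producer(writer, start):
        # Items put while another coroutine's _flush() is awaiting
        # batch_write_item can be wiped by `self._items_buffer = []`.
        for i in range(start, start + 100):
            await writer.put_item(Item={'pk': str(i)})

    async def main():
        session = aioboto3.Session()
        async with session.resource('dynamodb') as dynamodb:
            table = await dynamodb.Table('example-table')
            async with table.batch_writer() as writer:
                await asyncio.gather(
                    *(producer(writer, n * 100) for n in range(5))
                )

    asyncio.run(main())

With the buggy else branch, some of the 500 items may never reach the table.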

Fix

It looks like the solution proposed in the boto3 PR could easily be backported here. I volunteer to work on it by creating a PR with the fix and new unit tests that reproduce the issues.

rudaporto avatar Nov 05 '21 13:11 rudaporto

Yeah, that definitely is a bug. I'd also put the unprocessed items at the start of the items buffer.
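
Roughly, that would mean something like this (a sketch, not the merged change):

    # Retry unprocessed items before anything newly enqueued:
    self._items_buffer[:0] = unprocessed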

I'm in two minds about this. I'd prefer not to diverge too much from the boto3 codebase, but then again, this is clearly a bug.

Yeah, feel free to raise a PR. Can you stick a comment in that function linking to this issue, so there is a record of why the code diverges slightly from the boto3 logic?

terricain avatar Mar 13 '22 11:03 terricain

Should this be closed? It seems that a fix has been merged.

tibbe avatar Mar 02 '23 14:03 tibbe