Potential data-loss bug in BatchWriter flushing function
- Async AWS SDK for Python version: aioboto3==9.2.2 and aiobotocore==1.4.2
- Python version: 3.8.11
- Operating System: Amazon Linux 2
Description
This bug report is a replica of the same bug reported on boto3:
- https://github.com/boto/boto3/issues/1424
Since aioboto3.dynamodb.table.BatchWriter._flush is the same as the one in boto3, it has the same bug:
https://github.com/terrycain/aioboto3/blob/c413766fe9cd95a0889d80eb5099d4296ebeaa1a/aioboto3/dynamodb/table.py#L98-L112
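To make the failure mode concrete, here is a simplified, self-contained simulation of that flush flow. This is a paraphrase, not the real aioboto3 code: the names (fake_batch_write_item, TABLE_NAME, the item shapes) are stand-ins invented for the illustration. The point it shows is that the buffer is sliced, the write is awaited, and the buffer is then reset when nothing came back unprocessed, which drops anything another coroutine appended while the call was in flight.

```python
import asyncio

FLUSH_AMOUNT = 25
TABLE_NAME = "my-table"   # stand-in name, not from the real code
items_buffer = []
persisted = []


async def fake_batch_write_item(items):
    # Stand-in for the DynamoDB BatchWriteItem call: it "persists"
    # everything, reports nothing unprocessed, and yields control so
    # other coroutines can run while the call is in flight.
    await asyncio.sleep(0.01)
    persisted.extend(items)
    return {"UnprocessedItems": {}}


async def flush():
    # Paraphrase of the flush flow linked above.
    global items_buffer
    items_to_send = items_buffer[:FLUSH_AMOUNT]
    items_buffer = items_buffer[FLUSH_AMOUNT:]
    response = await fake_batch_write_item(items_to_send)
    unprocessed = response["UnprocessedItems"]
    if unprocessed and unprocessed.get(TABLE_NAME):
        items_buffer.extend(unprocessed[TABLE_NAME])
    else:
        # The problematic reset: anything appended to the buffer while
        # the await above was pending is silently dropped here.
        items_buffer = []


async def late_producer():
    # Simulates another coroutine adding an item while a flush is in flight.
    await asyncio.sleep(0.005)
    items_buffer.append({"pk": "added-during-flush"})


async def main():
    items_buffer.extend({"pk": str(i)} for i in range(FLUSH_AMOUNT))
    await asyncio.gather(flush(), late_producer())
    print("persisted:", len(persisted))          # 25
    print("still buffered:", len(items_buffer))  # 0 -> the late item is lost


asyncio.run(main())
```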
There is a PR open against boto3 to fix two issues (both sketched below):
- avoid cleaning up the buffer again on this line, since it was already updated on this line
- fix the logging of the unprocessed items
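For illustration only, here is roughly what those two changes amount to when applied to the simulation above; this is a hedged paraphrase using the same stand-ins (items_buffer, FLUSH_AMOUNT, TABLE_NAME, fake_batch_write_item), not the actual PR diff. The unconditional buffer reset goes away because the slice at the top already removed the items being sent, and the log reports the number of unprocessed items rather than the buffer length.

```python
import logging

logger = logging.getLogger(__name__)


async def flush_fixed():
    # Uses the stand-ins defined in the simulation above.
    global items_buffer
    items_to_send = items_buffer[:FLUSH_AMOUNT]
    items_buffer = items_buffer[FLUSH_AMOUNT:]
    response = await fake_batch_write_item(items_to_send)
    unprocessed = response["UnprocessedItems"].get(TABLE_NAME, [])
    if unprocessed:
        items_buffer.extend(unprocessed)
    # No "else: items_buffer = []": items appended by other coroutines
    # during the await stay in the buffer and go out on a later flush.
    logger.debug("Batch write sent %s, unprocessed: %s",
                 len(items_to_send), len(unprocessed))
```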
We bumped into this issue because we are using a batch writer shared between multiple coroutines that add items. By comparing our old implementation with the new one, we discovered that the new one, which uses the aioboto3 BatchWriter, is losing data in some cases: some records were never persisted in DynamoDB.
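For context, the usage pattern under which we observed the loss looks roughly like the sketch below. The table name, item shape, and counts are made up for the example, and credentials/region setup is omitted; the real production code differs, but the shape is the same: several coroutines writing through one shared batch writer.

```python
import asyncio
import aioboto3


async def producer(batch, start, count):
    # Each producer appends its own range of items through the shared writer.
    for i in range(start, start + count):
        await batch.put_item(Item={"pk": str(i), "payload": "x" * 100})


async def main():
    session = aioboto3.Session()
    async with session.resource("dynamodb") as dynamodb:
        table = await dynamodb.Table("my-test-table")
        async with table.batch_writer() as batch:
            # Several coroutines share the same BatchWriter; flushes
            # triggered by one coroutine overlap with put_item calls
            # from the others.
            await asyncio.gather(*(
                producer(batch, i * 1000, 1000) for i in range(10)
            ))


asyncio.run(main())
```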
Fix
It looks like the solution proposed in the boto3 PR could easily be backported here.
I volunteer to work on it by creating a PR with the fix and new unit tests to simulate the issues.
Yeah, that definitely is a bug. I'd also put the unprocessed items at the start of the items buffer.
I'm in two minds about this. I'd prefer not to diverge too much from the boto3 codebase, but then again, this is clearly a bug.
Yeah, feel free to raise a PR. Can you stick a comment in that function linking to this issue, so there is a record of why the code diverges slightly from the boto3 logic?
Should this be closed? It seems that a fix has been merged.