data-prepper

[BUG] Empty DLQ Objects and DLQ objects with data even though data is loaded correctly

Open • amitkirdatt opened this issue 1 year ago • 1 comment

Describe the bug

  • A pipeline with DynamoDB as the source and an OpenSearch Serverless sink is creating empty DLQ objects: {"dlqObjects":[]}
  • Non-empty DLQ objects are also created even though the data is loaded into OpenSearch. Their entries contain messages like: "status":0,"message":"Number of retries reached the limit of max retries (configured value 10)"

To Reproduce

Steps to reproduce the behavior:

  1. Define a pipeline with a DynamoDB table as the source (ideally with at least 10M records)
  2. Define an OpenSearch Serverless sink
  3. Define an S3 bucket and prefix for the DLQ
  4. Run the pipeline
  5. The DLQ S3 bucket will contain several empty S3 objects that are 17 bytes in size ({"dlqObjects":[]})
  6. Some DLQ S3 objects contain data, but those items are also loaded into OpenSearch
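
To triage the objects from steps 5 and 6, something like the following could separate the empty envelopes from the ones that carry entries. This is only a minimal sketch, assuming the DLQ object bodies have already been downloaded as bytes. `classify_dlq_object` and `EMPTY_DLQ_BODY` are hypothetical names; only the `dlqObjects` key comes from the payloads quoted in this report.

```python
import json

# The empty payload observed in step 5; it is exactly 17 bytes long.
EMPTY_DLQ_BODY = b'{"dlqObjects":[]}'


def classify_dlq_object(body: bytes) -> str:
    """Return 'empty' or 'non-empty' for one DLQ object body.

    ASSUMPTION: the body is a JSON object with a top-level
    "dlqObjects" list, as in the payloads quoted in this issue.
    """
    payload = json.loads(body)
    entries = payload.get("dlqObjects", [])
    return "empty" if not entries else "non-empty"


if __name__ == "__main__":
    print(len(EMPTY_DLQ_BODY))
    print(classify_dlq_object(EMPTY_DLQ_BODY))
```

Counting how many objects fall into each bucket would show how much of the DLQ prefix is made up of empty envelopes.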

Expected behavior

  • No DLQ objects are created if the data has been loaded successfully.
  • If the data load is not successful and a DLQ S3 object is created, then dlqObjects should be populated with the relevant data.
  • If a document is ingested into OpenSearch, a DLQ object with that document's id should not be created.
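
The expectations above could be checked mechanically against a batch of downloaded DLQ objects. The sketch below is illustrative rather than a description of how Data Prepper works: `find_dlq_violations` is a hypothetical helper, the set of indexed ids is assumed to have been collected from OpenSearch separately, and the `failedData.indexId` path used to read the document id is an assumption that this issue does not confirm.

```python
import json
from typing import Iterable, List, Set


def find_dlq_violations(bodies: Iterable[bytes], indexed_ids: Set[str]) -> List[str]:
    """List the ways a batch of DLQ object bodies breaks the expected behavior.

    ASSUMPTION: each entry stores the document id at
    entry["failedData"]["indexId"]; adjust the path to the real layout.
    """
    violations = []
    for n, body in enumerate(bodies):
        entries = json.loads(body).get("dlqObjects", [])
        if not entries:
            # A DLQ object should never be written with nothing in it.
            violations.append(f"object {n}: empty dlqObjects")
            continue
        for entry in entries:
            doc_id = (entry.get("failedData") or {}).get("indexId")
            if doc_id is not None and doc_id in indexed_ids:
                # The document reached OpenSearch, so it should not be in the DLQ.
                violations.append(f"object {n}: id {doc_id} is already in OpenSearch")
    return violations
```

An empty result would mean that every DLQ object is populated and that none of its entries refer to a document that was actually indexed.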

Screenshots

If applicable, add screenshots to help explain your problem.

Environment (please complete the following information):

  • OS: [e.g. Ubuntu 20.04 LTS]
  • Version [e.g. 22]

Additional context

  • max_retries is set to 10
  • Pipeline has min 1 OCU and max 20 OCU
  • DynamoDB table has ~100M records
  • OpenSearch Serverless sink

amitkirdatt, Mar 20 '24 17:03

@amitkirdatt, we are releasing Data Prepper 2.8.0 today with a fix that may resolve this. See #4301.

dlvenable, May 16 '24 15:05