kinesis-sql

Data Loss when extracting data from Kinesis

Open success-m opened this issue 1 year ago • 3 comments

We are seeing data loss. We logged the data sent to Kinesis from our producers and compared it with the data in our sinks. We are 100% sure that the data was pushed to Kinesis, but about one minute of data never reached the sinks. What could cause this?
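For anyone trying to reproduce this kind of comparison, here is a minimal sketch of how the producer log could be diffed against the sink to pin down the missing window. The record shape (an id plus a send timestamp) and the function names `find_missing` and `missing_window` are hypothetical and not part of kinesis-sql; the sample data is made up for illustration.

```python
from datetime import datetime


def find_missing(produced, sink_ids):
    """Return produced (id, sent_at) pairs whose id never reached the sink, oldest first.

    `produced` is an iterable of (record_id, sent_at) tuples taken from the
    producer-side log; `sink_ids` is an iterable of record ids found in the sink.
    Both shapes are assumptions made for this sketch.
    """
    seen = set(sink_ids)
    missing = [(rid, sent_at) for rid, sent_at in produced if rid not in seen]
    return sorted(missing, key=lambda pair: pair[1])


def missing_window(missing):
    """Return (first_sent_at, last_sent_at) of the missing records, or None if nothing is missing."""
    if not missing:
        return None
    return missing[0][1], missing[-1][1]


if __name__ == "__main__":
    # Hypothetical sample data: three records produced, one absent from the sink.
    produced = [
        ("a", datetime(2023, 4, 3, 12, 0, 0)),
        ("b", datetime(2023, 4, 3, 12, 0, 30)),
        ("c", datetime(2023, 4, 3, 12, 1, 0)),
    ]
    lost = find_missing(produced, ["a", "c"])
    print(lost, missing_window(lost))
```

Knowing the exact start and end of the gap makes it easier to line it up with stream events such as a resharding operation or a job restart.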

PS: I know the active repo is https://github.com/roncemer/spark-sql-kinesis, but I could not post issues there.

Please help

success-m avatar Apr 03 '23 12:04 success-m

@roncemer - Any idea why this is happening? My initial guess is Kinesis resharding, so I have added the option `.option("kinesis.client.describeShardInterval", "500ms")`, but I don't know if this will fix it.
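For context, a rough sketch of where that option would sit in a PySpark streaming read. The helper `kinesis_source_options` is hypothetical; the `streamName` and `endpointUrl` option names and the `kinesis` format name are assumed from the connector's README and should be checked against the version in use. Whether a shorter describe-shard interval actually prevents the loss has not been confirmed in this thread.

```python
def kinesis_source_options(stream_name, endpoint_url, describe_shard_interval="500ms"):
    """Build the option map for the Kinesis source (hypothetical helper).

    Only "kinesis.client.describeShardInterval" comes from this thread; the
    other option names are assumptions based on the connector's README.
    """
    return {
        "streamName": stream_name,
        "endpointUrl": endpoint_url,
        # How often the source re-lists shards; a shorter interval should
        # notice resharding sooner, at the cost of more DescribeStream calls.
        "kinesis.client.describeShardInterval": describe_shard_interval,
    }


if __name__ == "__main__":
    opts = kinesis_source_options(
        "my-stream",  # placeholder stream name
        "https://kinesis.us-east-1.amazonaws.com",  # placeholder endpoint
    )
    print(opts)
    # With a SparkSession and the connector on the classpath (not run here):
    # df = spark.readStream.format("kinesis").options(**opts).load()
```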

success-m avatar Apr 03 '23 12:04 success-m

@success-m I accidentally had issues disabled on my repo. I enabled that feature. If you have a change you'd like to submit, feel free to issue a pull request against https://github.com/roncemer/spark-sql-kinesis and I will merge it and drop a new release as soon as I can get to it.

I am currently not using this project for anything (I switched to using Kinesis Firehose Delivery Streams with AWS Lambda functions, as it's cheaper and doesn't require any explicit checkpointing mechanism), so if you're interested in taking over the project, I would be happy to add you as a maintainer and provide instructions for packaging and publishing updated versions.

roncemer avatar Apr 04 '23 03:04 roncemer

@roncemer - I don't have any changes that need to be pushed yet.

But yes, please do add me. I would like to contribute to the library.

success-m avatar Apr 04 '23 12:04 success-m