kinesis-sql
Data Loss when extracting data from Kinesis
We were having issues with data loss. We logged the data our producers sent to Kinesis and compared it with the data in our sinks. We are 100% sure the data was pushed to Kinesis, but one minute's worth of it was lost. What could be the reason for this?
PS: I know the active repo is https://github.com/roncemer/spark-sql-kinesis, but I could not post issues there.
Please help.
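The producer-versus-sink comparison described above can be sketched as follows. This is a minimal illustration, not the poster's actual tooling. It assumes each producer log entry carries a unique record ID and a send timestamp. Both fields and the function name `missing_by_minute` are hypothetical, because the issue doesn't say what was logged. Grouping the missing records by minute makes a contiguous gap like the one reported easy to spot.

```python
from collections import Counter
from datetime import datetime


def missing_by_minute(produced, sunk):
    """Count records that were produced but never reached the sink, per minute.

    produced: iterable of (record_id, send_timestamp) pairs from the producer logs
              (hypothetical format, assumed for illustration)
    sunk:     iterable of record_ids found in the sink
    """
    sunk_ids = set(sunk)
    gaps = Counter()
    for record_id, sent_at in produced:
        if record_id not in sunk_ids:
            # Truncate the send time to the minute so gaps cluster into buckets.
            gaps[sent_at.replace(second=0, microsecond=0)] += 1
    # Return buckets in chronological order.
    return dict(sorted(gaps.items()))


produced = [
    ("a", datetime(2020, 1, 1, 10, 0, 5)),
    ("b", datetime(2020, 1, 1, 10, 1, 10)),
    ("c", datetime(2020, 1, 1, 10, 1, 50)),
    ("d", datetime(2020, 1, 1, 10, 2, 0)),
]
sunk = ["a", "d"]
print(missing_by_minute(produced, sunk))
# {datetime.datetime(2020, 1, 1, 10, 1): 2}
```

If every missing record falls into one or two adjacent buckets, the loss is more likely a reader-side gap, such as a window around a shard change, than scattered producer failures.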
@roncemer - Any idea why this is happening? My initial guess is Kinesis re-sharding, so I have added the option .option("kinesis.client.describeShardInterval", "500ms"), but I don't know whether this will fix it.
@success-m I accidentally had issues disabled on my repo. I enabled that feature. If you have a change you'd like to submit, feel free to issue a pull request against https://github.com/roncemer/spark-sql-kinesis and I will merge it and drop a new release as soon as I can get to it.
I am currently not using this project for anything (I switched to using Kinesis Firehose Delivery Streams with AWS Lambda functions, as it's cheaper and doesn't require any explicit checkpointing mechanism), so if you're interested in taking over the project, I would be happy to add you as a maintainer and provide instructions for packaging and publishing updated versions.
@roncemer - I don't have any changes that need to be pushed yet.
But yes, please do add me. I would like to contribute to the library.