scylla-cdc-go
scylla-cdc-go copied to clipboard
Stream_bath - pollWindow size in case of restart client application
Issue description
We are trying to implement storing progress in our cdc-consumer-service using the inbuilt progress manager(provided by scylla-cdc-go). But after the cdc-consumer-service and restarting it, we are unable to receive any more cdc updates. However, according to the logs the progress is loaded for the last_timestamp(check screenshot attached).
On further checking the library code, we noticed that this could be happening because inside the getPollWindow() function in stream_batch.go, the queryWindowRightEnd and confidenceWindowStart are very far apart. On printing the queryWindowRightEnd, we found this value to be equal to the current_generation value(2021-08-04 05:42:32.878000+0000 - progress table screenshot attached) and confidenceWindowStart= time.Now(). Because of this huge size window we are not receiving cdc updates.
On changing this window value to a small one(say 1second - by hard coding), it is working fine. So I believe the value of queryWindowRightEnd in getPollWindow() function is not getting set to the appropriate value. Are we missing something here?###hat this could be happening because inside the getPollWindow() function in stream_batch.go, the queryWindowRightEnd and confidenceWindowStart are very far apart. On printing the queryWindowRightEnd, we found this value to be equal to the current_generation value(2021-08-04 05:42:32.878000+0000 - progress table screenshot attached) and confidenceWindowStart= time.Now(). Because of this huge size window we are not receiving cdc updates. On changing this window value to a small one(say 1second - by hard coding), it is working fine.
So I believe the value of queryWindowRightEnd in getPollWindow() function is not getting set to the appropriate value. Are we missing something here?
We have seen a similar behavior reported before https://scylladb-users.slack.com/archives/C2NLNBXLN/p1632379210266400
In short, after we restart a consumer with progresstable, unprocessed stream_ids will start polling from the generation's start and, with very small QueryTimeWindowSize
values, this may cause it to take a VERY LONG TIME to catch up, specially with a very old stream generation.
A way to "workaround" it is to increase QueryTimeWindowSize
to something like 24h, but it may not be an ideal approach for everybody.
I think the issue/report should be a programmatic way to allow specifying the start Poll time (at user own's risk), instead of polling from the starting generation for unprocessed stream_ids. This will help development/non-write-heavy workloads to catch up faster when polling unprocessed stream IDs.
FYI @piodul @haaawk
FYI @avelanarius.
@piodul and @avelanarius let's have a meeting to find a common solution for both Golang and Java
Please note I've addressed the issue with persisting CDC log reader state + restart behaviour in https://github.com/scylladb/scylla-cdc-go/pull/10.
And I also would be interested to help with Java lib + scylla-cdc-source-connector where I'm suspecting similar issues (but even harder to debug/analyse...).
This should be fixed by https://github.com/scylladb/scylla-cdc-go/pull/13. Now, when the library starts reading for the first time (without any saved progress), the default progress manager saves the timestamp it originally started reading from and, when restarting, makes sure not to read older changes than that when restarting.
Users can also implement the Empty()
method in their consumers so that saving progress during the time when there are no rows on a stream is possible.
@AdamStawarz , could you please update if your issue was resolved by #13