tiflow
Updating the changefeed configuration changefeed-error-stuck-duration does not take effect
What did you do?
- Create a Kafka changefeed with the default configuration (changefeed-error-stuck-duration defaults to 30m)
- Pause and update the changefeed, setting changefeed-error-stuck-duration to 90m
- Resume the changefeed
- Inject network partition between cdc and kafka for 60m
What did you expect to see?
CDC should tolerate a network partition between cdc and kafka for up to 90m. Until then, the changefeed should not enter the failed state.
What did you see instead?
The CDC changefeed entered the failed state after 30m.
Versions of the cluster
master
/severity moderate
The root cause is that feedStateManager.changefeedErrorStuckDuration is set when the changefeed is created and is not refreshed when the changefeed is updated.
A possible solution is to set feedStateManager.changefeedErrorStuckDuration to the latest value stored in changefeedInfo on every feedStateManager.Tick call.
/assign @wk989898
@asddongmen: GitHub didn't allow me to assign the following users: wk989898.
Note that only pingcap members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide