cache: [WIP] implement MVP watch demux (Code Review Only)
This is a WIP draft PR created to start the code review process. Feedback welcome!
What’s in this PR
- implements the MVP Watch demultiplexer only, per earlier discussion with mentors
- defers the storage & indexing layer to next stage, so we can focus on correctness and robustness of the core demux
What's next
- continue writing unit tests for all demux/watch behaviors (catch-up, backoff, drop threshold, etc.)
Tagging mentors for code review: @serathius @MadhavJivrajani
Hi @apullo777. Thanks for your PR.
I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Overall note on architecture. You have two separate synchronization methods, first one sends events available in ringBuffer upon the watch creation, and second broadcasts new events to synchronized watchers. If a synchronized watcher don't handle the event, the only option you have is to drop it (count missed events is not correct). With watcher buffer 128 and shared buffer 2048 if watch just requests old revision, the per watcher buffer will be filled and broadcast will immediately fail. Fix to that is to keep a list of unsychronized watchers and periodically attempt to synchronize them.
This looks much better, good job!
Expect maybe 1-2 more iterations and this will be mergable.
Note, please close comments that you have addressed.
Just couple of last comments, and this code is in mergeable. Please follow https://github.com/etcd-io/etcd/pull/20160/checks?check_run_id=44807170862 instructions to fix the DCO and remove the draft.
Please also remove the conflict
/ok-to-test
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 65.85%. Comparing base (
4b98469) to head (2c311cc). Report is 2 commits behind head on main.
Additional details and impacted files
see 54 files with indirect coverage changes
@@ Coverage Diff @@
## main #20160 +/- ##
==========================================
- Coverage 69.00% 65.85% -3.16%
==========================================
Files 415 412 -3
Lines 34595 34460 -135
==========================================
- Hits 23873 22693 -1180
- Misses 9328 10379 +1051
+ Partials 1394 1388 -6
Continue to review full report in Codecov by Sentry.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update 4b98469...2c311cc. Read the comment docs.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
Please fix:
cache_test.go:89:28: context.Background() could be replaced by t.Context() in TestCacheWatcherSeesEntireKeyspace (usetesting)
if err := cache.WaitReady(context.Background()); err != nil {
/retest pull-etcd-release-tests
@serathius: The /retest command does not accept any targets.
The following commands are available to trigger required jobs:
/test pull-etcd-build
/test pull-etcd-contrib-mixin
/test pull-etcd-coverage-report
/test pull-etcd-e2e-386
/test pull-etcd-e2e-amd64
/test pull-etcd-e2e-arm64
/test pull-etcd-fuzzing-v3rpc
/test pull-etcd-govulncheck-main
/test pull-etcd-grpcproxy-e2e-amd64
/test pull-etcd-grpcproxy-e2e-arm64
/test pull-etcd-grpcproxy-integration-amd64
/test pull-etcd-grpcproxy-integration-arm64
/test pull-etcd-integration-1-cpu-amd64
/test pull-etcd-integration-1-cpu-arm64
/test pull-etcd-integration-2-cpu-amd64
/test pull-etcd-integration-2-cpu-arm64
/test pull-etcd-integration-4-cpu-amd64
/test pull-etcd-integration-4-cpu-arm64
/test pull-etcd-markdown-lint
/test pull-etcd-release-tests
/test pull-etcd-robustness-amd64
/test pull-etcd-robustness-arm64
/test pull-etcd-unit-test-386
/test pull-etcd-unit-test-amd64
/test pull-etcd-unit-test-arm64
/test pull-etcd-verify
Use /test all to run the following jobs that were automatically triggered:
pull-etcd-build
pull-etcd-contrib-mixin
pull-etcd-coverage-report
pull-etcd-e2e-386
pull-etcd-e2e-amd64
pull-etcd-e2e-arm64
pull-etcd-fuzzing-v3rpc
pull-etcd-govulncheck-main
pull-etcd-grpcproxy-e2e-amd64
pull-etcd-grpcproxy-e2e-arm64
pull-etcd-grpcproxy-integration-amd64
pull-etcd-grpcproxy-integration-arm64
pull-etcd-integration-1-cpu-amd64
pull-etcd-integration-1-cpu-arm64
pull-etcd-integration-2-cpu-amd64
pull-etcd-integration-2-cpu-arm64
pull-etcd-integration-4-cpu-amd64
pull-etcd-integration-4-cpu-arm64
pull-etcd-release-tests
pull-etcd-robustness-amd64
pull-etcd-robustness-arm64
pull-etcd-unit-test-386
pull-etcd-unit-test-amd64
pull-etcd-unit-test-arm64
pull-etcd-verify
In response to this:
/retest pull-etcd-release-tests
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/retest pull-etcd-release-tests
Missclick
/retest
pull-etcd-release-tests fails, looks like release script doesn't handle packages that were never released?
cc @ivanvc any guesses?
go: updates to go.mod needed, disabled by -mod=readonly; to update it:
go mod tidy
Please run go mod tidy in root and cache directories.
To confirm that it worked you can run make verify-gofmt
go: updates to go.mod needed, disabled by -mod=readonly; to update it: go mod tidyPlease run
go mod tidyin root and cache directories.To confirm that it worked you can run
make verify-gofmt
I’ve run go mod tidy in both the root and the cache directories, but looks like the “updates to go.mod needed” error still appear. make verify-gofmt also showed a PASSED and SUCCESS. Any idea of what is happening? Perhaps go.mod discrepancy?
I’ve run go mod tidy in both the root and the cache directories
You need to rebase on origin/main first. The CI doesn't test your exact commit, but a rebased on latest branch state.
-release
@serathius, I had a hard time finding this message in this pull request; it has a lot of activity. I see that you suggested a change that will fix the release tests, but they are still failing. Before I investigate, is this still an issue? Please feel free to contact me on Slack.
@serathius Sorry, I couldn't solve the rebase issue, and it now turns into a weird situation. I tried to do git rebase origin/main but it hit a strange issue: even with a clean working tree (both git status and git stash report no local edits), my git rebase aborts with merge conflicts with go.mod/sum files. The weird part is that Git seems to think those files are already up-to-date, so I cannot resolve the "invisible" conflict.
Stuck with this issue, I tried to force pushed a brand new branch onto the old PR branch name. That fixed the CI-side rebase error, but in doing so it also pulled in the root repo’s latest bump-Go commit (and its file changes) into my cache-mvp feature branch. So now those upstream modifications are being pushed as part of the commit, while they do not appear on my local, so I cannot remove them to update the PR commit.
At this point I can’t seem to remove them or get a clean history. Should I start over a new PR with a new branch?
-release
@serathius, I had a hard time finding this message in this pull request; it has a lot of activity. I see that you suggested a change that will fix the release tests, but they are still failing. Before I investigate, is this still an issue? Please feel free to contact me on Slack.
@ivanvc Thanks for taking a look! It appears the release test is now passing, so that part is resolved. But now my feature work are polluted I will try to figure out how to solve it. Sorry for the confusion...
Should I start over a new PR with a new branch?
No
@serathius Sorry, I couldn't solve the rebase issue, and it now turns into a weird situation. I tried to do
git rebase origin/mainbut it hit a strange issue: even with a clean working tree (bothgit statusandgit stashreport no local edits), mygit rebaseaborts with merge conflicts with go.mod/sum files. The weird part is that Git seems to think those files are already up-to-date, so I cannot resolve the "invisible" conflict.
Yea, you broke git history. Somehow you changed the commit type to merge commit. I fixed it and pushed to your fork. Before making any more changes please fetch the branch locally.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: apullo777, serathius
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [serathius]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
There are still a lot of things we can improve, but this looks like a good start. The most important thing next is fixing the Atomic watch property. The current implementation assumes that revision is a unique number, this is not true.
TXN can have multiple operations (Put/Delete) that are executed within one transactions, all this operations will share revision. Atomic watch guarantee ensures that events that share revisions will always be send in single watch response and not broken up. https://etcd.io/docs/v3.5/learning/api_guarantees/#watch-apis
Prepared PR to show what I mean https://github.com/etcd-io/etcd/pull/20272