kv/kvserver: TestFlowControlRaftMembershipV2 failed
kv/kvserver.TestFlowControlRaftMembershipV2 failed on master @ 5c4a24e928c792bc3408aff56252a88ca65256fd:
=== RUN TestFlowControlRaftMembershipV2
test_log_scope.go:165: test logs captured to: outputs.zip/logTestFlowControlRaftMembershipV21397668456
test_log_scope.go:76: use -show-logs to present logs inline
flow_control_integration_test.go:1024: -- test log scope end --
test logs left over in: outputs.zip/logTestFlowControlRaftMembershipV21397668456
--- FAIL: TestFlowControlRaftMembershipV2 (30.93s)
=== RUN TestFlowControlRaftMembershipV2/kvadmission.flow_control.mode=apply_to_all
flow_control_integration_test.go:3970:
/var/lib/engflow/worker/work/0/exec/bazel-out/k8-fastbuild/bin/pkg/kv/kvserver/kvserver_test_/kvserver_test.runfiles/com_github_cockroachdb_cockroach/pkg/kv/kvserver/testdata/flow_control_integration_v2/raft_membership_v2_encoding_apply_to_all:1:
echo
output didn't match expected:
@@ -152,23 +152,23 @@
kvflowcontrol.tokens.eval.elastic.deducted | 11 MiB
kvflowcontrol.tokens.eval.elastic.returned | 11 MiB
kvflowcontrol.tokens.eval.elastic.returned.disconnect | 2.0 MiB
kvflowcontrol.tokens.eval.elastic.unaccounted | 0 B
kvflowcontrol.tokens.eval.regular.available | 80 MiB
- kvflowcontrol.tokens.eval.regular.deducted | 11 MiB
- kvflowcontrol.tokens.eval.regular.returned | 11 MiB
+ kvflowcontrol.tokens.eval.regular.deducted | 10 MiB
+ kvflowcontrol.tokens.eval.regular.returned | 10 MiB
kvflowcontrol.tokens.eval.regular.returned.disconnect | 2.0 MiB
kvflowcontrol.tokens.eval.regular.unaccounted | 0 B
kvflowcontrol.tokens.send.elastic.available | 40 MiB
kvflowcontrol.tokens.send.elastic.deducted | 11 MiB
kvflowcontrol.tokens.send.elastic.deducted.force_flush_send_queue | 0 B
kvflowcontrol.tokens.send.elastic.deducted.prevent_send_queue | 0 B
kvflowcontrol.tokens.send.elastic.returned | 11 MiB
kvflowcontrol.tokens.send.elastic.returned.disconnect | 2.0 MiB
kvflowcontrol.tokens.send.elastic.unaccounted | 0 B
kvflowcontrol.tokens.send.regular.available | 80 MiB
- kvflowcontrol.tokens.send.regular.deducted | 11 MiB
+ kvflowcontrol.tokens.send.regular.deducted | 10 MiB
kvflowcontrol.tokens.send.regular.deducted.prevent_send_queue | 0 B
- kvflowcontrol.tokens.send.regular.returned | 11 MiB
+ kvflowcontrol.tokens.send.regular.returned | 10 MiB
kvflowcontrol.tokens.send.regular.returned.disconnect | 2.0 MiB
kvflowcontrol.tokens.send.regular.unaccounted | 0 B
--- FAIL: TestFlowControlRaftMembershipV2/kvadmission.flow_control.mode=apply_to_all (15.71s)
Parameters:
attempt=1race=truerun=1shard=18
Same failure on other branches
- #141912 kv/kvserver: TestFlowControlRaftMembershipV2 failed [A-testing C-bug C-test-failure O-robot P-3 T-kv X-nostale branch-release-25.1]
This test on roachdash | Improve this report!
Jira issue: CRDB-49418
This is a duplicate of https://github.com/cockroachdb/cockroach/issues/141912#issuecomment-2688996459. Applying the corresponding labels and marking as duplicate.
kv/kvserver.TestFlowControlRaftMembershipV2 failed on master @ 70bb4fafea5534d34fab9440975da5526933c955:
=== RUN TestFlowControlRaftMembershipV2
test_log_scope.go:165: test logs captured to: outputs.zip/logTestFlowControlRaftMembershipV23784194160
test_log_scope.go:76: use -show-logs to present logs inline
flow_control_integration_test.go:1024: -- test log scope end --
test logs left over in: outputs.zip/logTestFlowControlRaftMembershipV23784194160
--- FAIL: TestFlowControlRaftMembershipV2 (31.58s)
=== RUN TestFlowControlRaftMembershipV2/kvadmission.flow_control.mode=apply_to_all
flow_control_integration_test.go:3970:
/var/lib/engflow/worker/work/0/exec/bazel-out/k8-fastbuild/bin/pkg/kv/kvserver/kvserver_test_/kvserver_test.runfiles/com_github_cockroachdb_cockroach/pkg/kv/kvserver/testdata/flow_control_integration_v2/raft_membership_v2_encoding_apply_to_all:1:
echo
output didn't match expected:
@@ -152,23 +152,23 @@
kvflowcontrol.tokens.eval.elastic.deducted | 11 MiB
kvflowcontrol.tokens.eval.elastic.returned | 11 MiB
kvflowcontrol.tokens.eval.elastic.returned.disconnect | 2.0 MiB
kvflowcontrol.tokens.eval.elastic.unaccounted | 0 B
kvflowcontrol.tokens.eval.regular.available | 80 MiB
- kvflowcontrol.tokens.eval.regular.deducted | 11 MiB
- kvflowcontrol.tokens.eval.regular.returned | 11 MiB
+ kvflowcontrol.tokens.eval.regular.deducted | 10 MiB
+ kvflowcontrol.tokens.eval.regular.returned | 10 MiB
kvflowcontrol.tokens.eval.regular.returned.disconnect | 2.0 MiB
kvflowcontrol.tokens.eval.regular.unaccounted | 0 B
kvflowcontrol.tokens.send.elastic.available | 40 MiB
kvflowcontrol.tokens.send.elastic.deducted | 11 MiB
kvflowcontrol.tokens.send.elastic.deducted.force_flush_send_queue | 0 B
kvflowcontrol.tokens.send.elastic.deducted.prevent_send_queue | 0 B
kvflowcontrol.tokens.send.elastic.returned | 11 MiB
kvflowcontrol.tokens.send.elastic.returned.disconnect | 2.0 MiB
kvflowcontrol.tokens.send.elastic.unaccounted | 0 B
kvflowcontrol.tokens.send.regular.available | 80 MiB
- kvflowcontrol.tokens.send.regular.deducted | 11 MiB
+ kvflowcontrol.tokens.send.regular.deducted | 10 MiB
kvflowcontrol.tokens.send.regular.deducted.prevent_send_queue | 0 B
- kvflowcontrol.tokens.send.regular.returned | 11 MiB
+ kvflowcontrol.tokens.send.regular.returned | 10 MiB
kvflowcontrol.tokens.send.regular.returned.disconnect | 2.0 MiB
kvflowcontrol.tokens.send.regular.unaccounted | 0 B
--- FAIL: TestFlowControlRaftMembershipV2/kvadmission.flow_control.mode=apply_to_all (15.70s)
Parameters:
attempt=1race=truerun=3shard=18
Same failure on other branches
- #141912 kv/kvserver: TestFlowControlRaftMembershipV2 failed [A-testing C-bug C-test-failure O-robot P-3 T-kv X-nostale branch-release-25.1]
In the failing run, we get this:
I250618 08:27:51.238514 87 kv/kvserver_test/flow_control_integration_test.go:3780 [-] 452 -- (Issuing 1x1MiB, 4x replicated write (w/ one non-voter) that's not admitted.
I250618 08:27:51.238808 2709 kv/kvserver/kvflowcontrol/rac2/token_counter.go:682 [T1,Vsystem,n1,s1,r75/1:/{Table/Max-Max},raft] 453 adjusted send flow tokens (wc=regular stream=t1/s4 delta=-1.0 MiB flag=normal): regular=+14 MiB elastic=+6.0 MiB
I250618 08:27:51.238875 2709 kv/kvserver/kvflowcontrol/rac2/token_counter.go:682 [T1,Vsystem,n1,s1,r75/1:/{Table/Max-Max},raft] 454 adjusted eval flow tokens (wc=regular stream=t1/s4 delta=-1.0 MiB flag=normal): regular=+14 MiB elastic=+6.0 MiB
I250618 08:27:51.238924 2709 kv/kvserver/kvflowcontrol/rac2/token_counter.go:682 [T1,Vsystem,n1,s1,r75/1:/{Table/Max-Max},raft] 455 adjusted send flow tokens (wc=regular stream=t1/s1 delta=-1.0 MiB flag=normal): regular=+13 MiB elastic=+5.0 MiB
I250618 08:27:51.238958 2709 kv/kvserver/kvflowcontrol/rac2/token_counter.go:682 [T1,Vsystem,n1,s1,r75/1:/{Table/Max-Max},raft] 456 adjusted eval flow tokens (wc=regular stream=t1/s1 delta=-1.0 MiB flag=normal): regular=+13 MiB elastic=+5.0 MiB
I250618 08:27:51.238997 2709 kv/kvserver/kvflowcontrol/rac2/token_counter.go:682 [T1,Vsystem,n1,s1,r75/1:/{Table/Max-Max},raft] 457 adjusted send flow tokens (wc=regular stream=t1/s2 delta=-1.0 MiB flag=normal): regular=+13 MiB elastic=+5.0 MiB
I250618 08:27:51.239027 2709 kv/kvserver/kvflowcontrol/rac2/token_counter.go:682 [T1,Vsystem,n1,s1,r75/1:/{Table/Max-Max},raft] 458 adjusted eval flow tokens (wc=regular stream=t1/s2 delta=-1.0 MiB flag=normal): regular=+13 MiB elastic=+5.0 MiB
I250618 08:27:51.239062 2709 kv/kvserver/kvflowcontrol/rac2/token_counter.go:682 [T1,Vsystem,n1,s1,r75/1:/{Table/Max-Max},raft] 459 adjusted eval flow tokens (wc=elastic stream=t1/s5 delta=-1.0 MiB flag=normal): regular=+16 MiB elastic=+7.0 MiB
Note the wc=elastic for s5. This should be regular if the test is to pass. Additionally, we see an errant 103kb elastic adjustment (which, for the send flow tokens, is bunched up with the 1mb above for an 1.1mb deduction, but then is quickly returned):
I250618 08:27:51.239169 9941 kv/kvserver/kvflowcontrol/rac2/token_counter.go:682 [-] 460 adjusted send flow tokens (wc=elastic stream=t1/s5 delta=-1.1 MiB flag=normal): regular=+16 MiB elastic=+6.9 MiB
I250618 08:27:51.239233 2710 kv/kvserver/kvflowcontrol/rac2/token_counter.go:682 [T1,Vsystem,n1] 461 adjusted send flow tokens (wc=elastic stream=t1/s5 delta=+103 KiB flag=normal): regular=+16 MiB elastic=+7.0 MiB
I've seen this in all repros so far. In a passing run, this blip is not there and s5 is deducted as regular, same as the other nodes.
This seems to confirm what @kvoli was seeing here:
More than likely we queued the entries then deducted at an elastic priority. This seems benign and a testing issue more than anything else. Marking as P-3 and a testing issue.
I don't understand why this is benign, or why it's considered a testing issue. Will read up some more.
I added this custom logging
I250618 09:09:08.791748 2733 kv/kvserver/kvflowcontrol/rac2/range_controller.go:2885 [T1,Vsystem,n1,s1,r75/1:/{Table/Max-Max},raft] 460 using low priority for send queue entry in pull mode
here:
https://github.com/cockroachdb/cockroach/blob/93519aef3385e7dca7b2cda8c528e6fa3027f6e9/pkg/kv/kvserver/kvflowcontrol/rac2/range_controller.go#L2883-L2885
and it fires in the repro (but not otherwise). So for some reason, sometimes, for s5 we have a send queue when we enqueue the request, and this causes tokens to be deducted at elastic. I still need to understand why a send queue can sometimes form here, and whether the right workaround is "simply" to add checks to the test that wait for it to resolve, which it likely does rapidly ("the blip").
Based on the specified backports for linked PR #149661, I applied the following new label(s) to this issue: branch-release-25.1, branch-release-25.2, branch-release-25.3. Please adjust the labels as needed to match the branches actually affected by this issue, including adding any known older branches.
:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.