cockroach icon indicating copy to clipboard operation
cockroach copied to clipboard

kv/kvserver: TestFlowControlRaftMembershipV2 failed

Open cockroach-teamcity opened this issue 8 months ago • 4 comments

kv/kvserver.TestFlowControlRaftMembershipV2 failed on master @ 5c4a24e928c792bc3408aff56252a88ca65256fd:

=== RUN   TestFlowControlRaftMembershipV2
    test_log_scope.go:165: test logs captured to: outputs.zip/logTestFlowControlRaftMembershipV21397668456
    test_log_scope.go:76: use -show-logs to present logs inline
    flow_control_integration_test.go:1024: -- test log scope end --
test logs left over in: outputs.zip/logTestFlowControlRaftMembershipV21397668456
--- FAIL: TestFlowControlRaftMembershipV2 (30.93s)
=== RUN   TestFlowControlRaftMembershipV2/kvadmission.flow_control.mode=apply_to_all
    flow_control_integration_test.go:3970: 
          /var/lib/engflow/worker/work/0/exec/bazel-out/k8-fastbuild/bin/pkg/kv/kvserver/kvserver_test_/kvserver_test.runfiles/com_github_cockroachdb_cockroach/pkg/kv/kvserver/testdata/flow_control_integration_v2/raft_membership_v2_encoding_apply_to_all:1:
          echo
        output didn't match expected:
          @@ -152,23 +152,23 @@
             kvflowcontrol.tokens.eval.elastic.deducted                        | 11 MiB   
             kvflowcontrol.tokens.eval.elastic.returned                        | 11 MiB   
             kvflowcontrol.tokens.eval.elastic.returned.disconnect             | 2.0 MiB  
             kvflowcontrol.tokens.eval.elastic.unaccounted                     | 0 B      
             kvflowcontrol.tokens.eval.regular.available                       | 80 MiB   
          -  kvflowcontrol.tokens.eval.regular.deducted                        | 11 MiB   
          -  kvflowcontrol.tokens.eval.regular.returned                        | 11 MiB   
          +  kvflowcontrol.tokens.eval.regular.deducted                        | 10 MiB   
          +  kvflowcontrol.tokens.eval.regular.returned                        | 10 MiB   
             kvflowcontrol.tokens.eval.regular.returned.disconnect             | 2.0 MiB  
             kvflowcontrol.tokens.eval.regular.unaccounted                     | 0 B      
             kvflowcontrol.tokens.send.elastic.available                       | 40 MiB   
             kvflowcontrol.tokens.send.elastic.deducted                        | 11 MiB   
             kvflowcontrol.tokens.send.elastic.deducted.force_flush_send_queue | 0 B      
             kvflowcontrol.tokens.send.elastic.deducted.prevent_send_queue     | 0 B      
             kvflowcontrol.tokens.send.elastic.returned                        | 11 MiB   
             kvflowcontrol.tokens.send.elastic.returned.disconnect             | 2.0 MiB  
             kvflowcontrol.tokens.send.elastic.unaccounted                     | 0 B      
             kvflowcontrol.tokens.send.regular.available                       | 80 MiB   
          -  kvflowcontrol.tokens.send.regular.deducted                        | 11 MiB   
          +  kvflowcontrol.tokens.send.regular.deducted                        | 10 MiB   
             kvflowcontrol.tokens.send.regular.deducted.prevent_send_queue     | 0 B      
          -  kvflowcontrol.tokens.send.regular.returned                        | 11 MiB   
          +  kvflowcontrol.tokens.send.regular.returned                        | 10 MiB   
             kvflowcontrol.tokens.send.regular.returned.disconnect             | 2.0 MiB  
             kvflowcontrol.tokens.send.regular.unaccounted                     | 0 B      
           
--- FAIL: TestFlowControlRaftMembershipV2/kvadmission.flow_control.mode=apply_to_all (15.71s)

Parameters:

  • attempt=1
  • race=true
  • run=1
  • shard=18
Help

See also: How To Investigate a Go Test Failure (internal)

Same failure on other branches

  • #141912 kv/kvserver: TestFlowControlRaftMembershipV2 failed [A-testing C-bug C-test-failure O-robot P-3 T-kv X-nostale branch-release-25.1]
/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

Jira issue: CRDB-49418

cockroach-teamcity avatar Apr 16 '25 13:04 cockroach-teamcity

This is a duplicate of https://github.com/cockroachdb/cockroach/issues/141912#issuecomment-2688996459. Applying the corresponding labels and marking as duplicate.

tbg avatar Apr 17 '25 08:04 tbg

kv/kvserver.TestFlowControlRaftMembershipV2 failed on master @ 70bb4fafea5534d34fab9440975da5526933c955:

=== RUN   TestFlowControlRaftMembershipV2
    test_log_scope.go:165: test logs captured to: outputs.zip/logTestFlowControlRaftMembershipV23784194160
    test_log_scope.go:76: use -show-logs to present logs inline
    flow_control_integration_test.go:1024: -- test log scope end --
test logs left over in: outputs.zip/logTestFlowControlRaftMembershipV23784194160
--- FAIL: TestFlowControlRaftMembershipV2 (31.58s)
=== RUN   TestFlowControlRaftMembershipV2/kvadmission.flow_control.mode=apply_to_all
    flow_control_integration_test.go:3970: 
          /var/lib/engflow/worker/work/0/exec/bazel-out/k8-fastbuild/bin/pkg/kv/kvserver/kvserver_test_/kvserver_test.runfiles/com_github_cockroachdb_cockroach/pkg/kv/kvserver/testdata/flow_control_integration_v2/raft_membership_v2_encoding_apply_to_all:1:
          echo
        output didn't match expected:
          @@ -152,23 +152,23 @@
             kvflowcontrol.tokens.eval.elastic.deducted                        | 11 MiB   
             kvflowcontrol.tokens.eval.elastic.returned                        | 11 MiB   
             kvflowcontrol.tokens.eval.elastic.returned.disconnect             | 2.0 MiB  
             kvflowcontrol.tokens.eval.elastic.unaccounted                     | 0 B      
             kvflowcontrol.tokens.eval.regular.available                       | 80 MiB   
          -  kvflowcontrol.tokens.eval.regular.deducted                        | 11 MiB   
          -  kvflowcontrol.tokens.eval.regular.returned                        | 11 MiB   
          +  kvflowcontrol.tokens.eval.regular.deducted                        | 10 MiB   
          +  kvflowcontrol.tokens.eval.regular.returned                        | 10 MiB   
             kvflowcontrol.tokens.eval.regular.returned.disconnect             | 2.0 MiB  
             kvflowcontrol.tokens.eval.regular.unaccounted                     | 0 B      
             kvflowcontrol.tokens.send.elastic.available                       | 40 MiB   
             kvflowcontrol.tokens.send.elastic.deducted                        | 11 MiB   
             kvflowcontrol.tokens.send.elastic.deducted.force_flush_send_queue | 0 B      
             kvflowcontrol.tokens.send.elastic.deducted.prevent_send_queue     | 0 B      
             kvflowcontrol.tokens.send.elastic.returned                        | 11 MiB   
             kvflowcontrol.tokens.send.elastic.returned.disconnect             | 2.0 MiB  
             kvflowcontrol.tokens.send.elastic.unaccounted                     | 0 B      
             kvflowcontrol.tokens.send.regular.available                       | 80 MiB   
          -  kvflowcontrol.tokens.send.regular.deducted                        | 11 MiB   
          +  kvflowcontrol.tokens.send.regular.deducted                        | 10 MiB   
             kvflowcontrol.tokens.send.regular.deducted.prevent_send_queue     | 0 B      
          -  kvflowcontrol.tokens.send.regular.returned                        | 11 MiB   
          +  kvflowcontrol.tokens.send.regular.returned                        | 10 MiB   
             kvflowcontrol.tokens.send.regular.returned.disconnect             | 2.0 MiB  
             kvflowcontrol.tokens.send.regular.unaccounted                     | 0 B      
           
--- FAIL: TestFlowControlRaftMembershipV2/kvadmission.flow_control.mode=apply_to_all (15.70s)

Parameters:

  • attempt=1
  • race=true
  • run=3
  • shard=18
Help

See also: How To Investigate a Go Test Failure (internal)

Same failure on other branches

  • #141912 kv/kvserver: TestFlowControlRaftMembershipV2 failed [A-testing C-bug C-test-failure O-robot P-3 T-kv X-nostale branch-release-25.1]

This test on roachdash | Improve this report!

cockroach-teamcity avatar Apr 23 '25 13:04 cockroach-teamcity

In the failing run, we get this:

I250618 08:27:51.238514 87 kv/kvserver_test/flow_control_integration_test.go:3780  [-] 452  -- (Issuing 1x1MiB, 4x replicated write (w/ one non-voter) that's not admitted.
I250618 08:27:51.238808 2709 kv/kvserver/kvflowcontrol/rac2/token_counter.go:682  [T1,Vsystem,n1,s1,r75/1:/{Table/Max-Max},raft] 453  adjusted send flow tokens (wc=regular stream=t1/s4 delta=-1.0 MiB flag=normal): regular=+14 MiB elastic=+6.0 MiB
I250618 08:27:51.238875 2709 kv/kvserver/kvflowcontrol/rac2/token_counter.go:682  [T1,Vsystem,n1,s1,r75/1:/{Table/Max-Max},raft] 454  adjusted eval flow tokens (wc=regular stream=t1/s4 delta=-1.0 MiB flag=normal): regular=+14 MiB elastic=+6.0 MiB
I250618 08:27:51.238924 2709 kv/kvserver/kvflowcontrol/rac2/token_counter.go:682  [T1,Vsystem,n1,s1,r75/1:/{Table/Max-Max},raft] 455  adjusted send flow tokens (wc=regular stream=t1/s1 delta=-1.0 MiB flag=normal): regular=+13 MiB elastic=+5.0 MiB
I250618 08:27:51.238958 2709 kv/kvserver/kvflowcontrol/rac2/token_counter.go:682  [T1,Vsystem,n1,s1,r75/1:/{Table/Max-Max},raft] 456  adjusted eval flow tokens (wc=regular stream=t1/s1 delta=-1.0 MiB flag=normal): regular=+13 MiB elastic=+5.0 MiB
I250618 08:27:51.238997 2709 kv/kvserver/kvflowcontrol/rac2/token_counter.go:682  [T1,Vsystem,n1,s1,r75/1:/{Table/Max-Max},raft] 457  adjusted send flow tokens (wc=regular stream=t1/s2 delta=-1.0 MiB flag=normal): regular=+13 MiB elastic=+5.0 MiB
I250618 08:27:51.239027 2709 kv/kvserver/kvflowcontrol/rac2/token_counter.go:682  [T1,Vsystem,n1,s1,r75/1:/{Table/Max-Max},raft] 458  adjusted eval flow tokens (wc=regular stream=t1/s2 delta=-1.0 MiB flag=normal): regular=+13 MiB elastic=+5.0 MiB
I250618 08:27:51.239062 2709 kv/kvserver/kvflowcontrol/rac2/token_counter.go:682  [T1,Vsystem,n1,s1,r75/1:/{Table/Max-Max},raft] 459  adjusted eval flow tokens (wc=elastic stream=t1/s5 delta=-1.0 MiB flag=normal): regular=+16 MiB elastic=+7.0 MiB

Note the wc=elastic for s5. This should be regular if the test is to pass. Additionally, we see an errant 103kb elastic adjustment (which, for the send flow tokens, is bunched up with the 1mb above for an 1.1mb deduction, but then is quickly returned):

I250618 08:27:51.239169 9941 kv/kvserver/kvflowcontrol/rac2/token_counter.go:682  [-] 460  adjusted send flow tokens (wc=elastic stream=t1/s5 delta=-1.1 MiB flag=normal): regular=+16 MiB elastic=+6.9 MiB
I250618 08:27:51.239233 2710 kv/kvserver/kvflowcontrol/rac2/token_counter.go:682  [T1,Vsystem,n1] 461  adjusted send flow tokens (wc=elastic stream=t1/s5 delta=+103 KiB flag=normal): regular=+16 MiB elastic=+7.0 MiB

I've seen this in all repros so far. In a passing run, this blip is not there and s5 is deducted as regular, same as the other nodes.

This seems to confirm what @kvoli was seeing here:

More than likely we queued the entries then deducted at an elastic priority. This seems benign and a testing issue more than anything else. Marking as P-3 and a testing issue.

I don't understand why this is benign, or why it's considered a testing issue. Will read up some more.

tbg avatar Jun 18 '25 08:06 tbg

I added this custom logging

I250618 09:09:08.791748 2733 kv/kvserver/kvflowcontrol/rac2/range_controller.go:2885  [T1,Vsystem,n1,s1,r75/1:/{Table/Max-Max},raft] 460  using low priority for send queue entry in pull mode

here:

https://github.com/cockroachdb/cockroach/blob/93519aef3385e7dca7b2cda8c528e6fa3027f6e9/pkg/kv/kvserver/kvflowcontrol/rac2/range_controller.go#L2883-L2885

and it fires in the repro (but not otherwise). So for some reason, sometimes, for s5 we have a send queue when we enqueue the request, and this causes tokens to be deducted at elastic. I still need to understand why a send queue can sometimes form here, and whether the right workaround is "simply" to add checks to the test that wait for it to resolve, which it likely does rapidly ("the blip").

tbg avatar Jun 18 '25 09:06 tbg

Based on the specified backports for linked PR #149661, I applied the following new label(s) to this issue: branch-release-25.1, branch-release-25.2, branch-release-25.3. Please adjust the labels as needed to match the branches actually affected by this issue, including adding any known older branches.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl[bot] avatar Jul 08 '25 13:07 blathers-crl[bot]