redpanda icon indicating copy to clipboard operation
redpanda copied to clipboard

CI Failure (`TimeoutError: Node docker-rp-xx draining leaderships`) in `BasicAuthUpgradeTest.test_upgrade_and_enable_basic_auth`

Open dotnwat opened this issue 2 years ago • 14 comments

https://buildkite.com/redpanda/redpanda/builds/27277#01878dc4-9515-414e-88a6-89d717325c82

Module: rptest.tests.pandaproxy_test
Class:  BasicAuthUpgradeTest
Method: test_upgrade_and_enable_basic_auth
Arguments:
{
  "base_release": [
    22,
    3
  ],
  "next_release": [
    23,
    1
  ]
}
====================================================================================================
test_id:    rptest.tests.pandaproxy_test.BasicAuthUpgradeTest.test_upgrade_and_enable_basic_auth.base_release=.22.3.next_release=.23.1
status:     FAIL
run time:   54.194 seconds


    TimeoutError('Node docker-rp-23 draining leaderships')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/root/tests/rptest/services/cluster.py", line 49, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/pandaproxy_test.py", line 2127, in test_upgrade_and_enable_basic_auth
    self.redpanda.rolling_restart_nodes(self.redpanda.nodes)
  File "/root/tests/rptest/services/redpanda.py", line 2289, in rolling_restart_nodes
    restarter.restart_nodes(nodes,
  File "/root/tests/rptest/services/rolling_restarter.py", line 88, in restart_nodes
    wait_until(lambda: has_drained_leaders(node),
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 57, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError: Node docker-rp-23 draining leaderships

dotnwat avatar Apr 17 '23 19:04 dotnwat

Probably related https://github.com/redpanda-data/redpanda/issues/9049

dotnwat avatar Apr 17 '23 19:04 dotnwat

on (amd64, container) in job https://buildkite.com/redpanda/redpanda/builds/29125#018818fc-3f05-4301-a16c-3031c6b5d247

dlex avatar May 15 '23 19:05 dlex

A very much similar failure in https://buildkite.com/redpanda/redpanda/builds/29432#01883193-77a7-4f46-bac1-be8fac94ff52

test_id:    rptest.tests.cluster_bootstrap_test.ClusterBootstrapUpgrade.test_change_bootstrap_configs_during_upgrade.empty_seed_starts_cluster=False
status:     FAIL
run time:   1 minute 31.113 seconds


    TimeoutError('Node docker-rp-23 draining leaderships')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/root/tests/rptest/services/cluster.py", line 49, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/cluster_bootstrap_test.py", line 137, in test_change_bootstrap_configs_during_upgrade
    self.redpanda.rolling_restart_nodes(self.redpanda.nodes,
  File "/root/tests/rptest/services/redpanda.py", line 953, in rolling_restart_nodes
    restarter.restart_nodes(nodes,
  File "/root/tests/rptest/services/rolling_restarter.py", line 88, in restart_nodes
    wait_until(lambda: has_drained_leaders(node),
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 57, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError: Node docker-rp-23 draining leaderships

Log excerpts from docker-rp-23 node_id 1.

DEBUG - 2023-05-19 02:04:33,890 - admin - _request - lineno:332]: Dispatching put http://docker-rp-23:9644/v1/brokers/1/maintenance
DEBUG 2023-05-19 02:04:33,893 [shard 0] admin_api_server - admin_server.cc:365 - [admin] PUT http://docker-rp-23:9644/v1/brokers/1/maintenance 
INFO  2023-05-19 02:04:33,894 [shard 0] cluster - members_table.cc:221 - marking node 1 in maintenance state 
INFO  2023-05-19 02:04:33,894 [shard 1] cluster - members_table.cc:221 - marking node 1 in maintenance state    
...
INFO  2023-05-19 02:04:33,894 [shard 0] cluster - drain_manager.cc:49 - Node draining is starting                                                                                                                              
INFO  2023-05-19 02:04:33,894 [shard 0] cluster - drain_manager.cc:143 - Node draining has started                                                                                                                             
INFO  2023-05-19 02:04:33,894 [shard 0] raft - [group_id:0, {redpanda/controller/0}] consensus.cc:2946 - Starting leadership transfer from {id: {1}, revision: {0}} to {id: {3}, revision: {0}} in term 1                      
TRACE 2023-05-19 02:04:33,894 [shard 0] raft - [group_id:0, {redpanda/controller/0}] consensus.cc:2801 - transfer leadership: preparing target={id: {3}, revision: {0}}, dirty_offset=21                                       
TRACE 2023-05-19 02:04:33,894 [shard 0] raft - [group_id:0, {redpanda/controller/0}] consensus.cc:2807 - transfer leadership: cleared oplock                                                                                   
DEBUG 2023-05-19 02:04:33,894 [shard 0] raft - [group_id:0, {redpanda/controller/0}] consensus.cc:2839 - transfer leadership: node {id: {3}, revision: {0}} doesn't need recovery or is already recovering (is_recovering false dirty offset 21)
DEBUG 2023-05-19 02:04:33,894 [shard 0] raft - [group_id:0, {redpanda/controller/0}] consensus.cc:2872 - transfer leadership: node {id: {3}, revision: {0}} is not recovering, proceeding (dirty offset 21)                    
INFO  2023-05-19 02:04:33,894 [shard 0] cluster - drain_manager.cc:207 - Draining leadership from 1 groups                                                                                                                     
INFO  2023-05-19 02:04:33,894 [shard 1] cluster - drain_manager.cc:49 - Node draining is starting                                                                                                                              
INFO  2023-05-19 02:04:33,894 [shard 1] cluster - drain_manager.cc:143 - Node draining has started                                                                                                                             
INFO  2023-05-19 02:04:33,894 [shard 1] cluster - drain_manager.cc:256 - Node draining has completed on shard 1                                                                                                                
TRACE 2023-05-19 02:04:33,894 [shard 0] storage - readers_cache.cc:328 - {redpanda/controller/0} - removing reader: [0,21] lower_bound: 22                                                                                     
[DEBUG - 2023-05-19 02:04:33,895 - admin - _request - lineno:361]: Response OK                                                                                                                                                 
[INFO  - 2023-05-19 02:04:33,895 - rolling_restarter - restart_nodes - lineno:86]: Waiting for node docker-rp-23 leadership drain                                                                                              
[DEBUG - 2023-05-19 02:04:33,895 - admin - _request - lineno:332]: Dispatching get http://docker-rp-23:9644/v1/brokers/1                                                                                                       
INFO  2023-05-19 02:04:33,895 [shard 0] raft - [group_id:0, {redpanda/controller/0}] consensus.cc:187 - [leadership_transfer] Stepping down as leader in term 1, dirty offset 21                                 
DEBUG 2023-05-19 02:04:33,895 [shard 0] raft - [group_id:0, {redpanda/controller/0}] consensus.cc:2618 - triggering leadership notification with term: 1, new leader: {nullopt}                                  
DEBUG 2023-05-19 02:04:33,895 [shard 0] cluster - health_monitor_backend.cc:395 - aborting current refresh request to 1                                                                                          
DEBUG 2023-05-19 02:04:33,895 [shard 0] cluster - feature_manager.cc:117 - Controller leader notification term 1                                                                                                 
TRACE 2023-05-19 02:04:33,895 [shard 0] cluster - partition_leaders_table.cc:160 - updated partition: {redpanda/controller/0} leader: {term: 1, current leader: {nullopt}, previous leader: {1}, revision: 0}    
INFO  2023-05-19 02:04:33,895 [shard 0] cluster - drain_manager.cc:239 - Draining leadership from 1 groups 1 succeeded                                                                                           
INFO  2023-05-19 02:04:33,895 [shard 0] cluster - drain_manager.cc:256 - Node draining has completed on shard 0                                                                                                  
TRACE 2023-05-19 02:04:33,895 [shard 1] cluster - partition_leaders_table.cc:160 - updated partition: {redpanda/controller/0} leader: {term: 1, current leader: {nullopt}, previous leader: {1}, revision: 0}    
TRACE 2023-05-19 02:04:33,898 [shard 0] request_auth - request_auth.cc:126 - Authenticated user {admin}                                                                                                          
DEBUG 2023-05-19 02:04:33,898 [shard 0] admin_api_server - admin_server.cc:365 - [admin] GET http://docker-rp-23:9644/v1/brokers/1                                                                               
[DEBUG - 2023-05-19 02:04:33,899 - admin - _request - lineno:355]: Response OK, JSON: {                 
    'node_id': 1,                       
    'num_cores': 2,                     
    'membership_status': 'active',      
    'maintenance_status': {             
        'draining': False,              
        'finished': False,              
        'errors': False,                
        'partitions': 0,                
        'eligible': 0,                  
        'transferring': 0,              
        'failed': 0                     
    }                                   
}                                       

Whereas

        def has_drained_leaders(node):
            try:
                node_id = self.redpanda.idx(node)
                broker_resp = admin.get_broker(node_id, node=node)
                maintenance_status = broker_resp["maintenance_status"]
                return maintenance_status["draining"] and maintenance_status["finished"]

So it looks like node draining has finished before the test gets the draining status for the 1st time. The test wants to see the draining status before moving forward but misses it, and timeouts.

dlex avatar May 20 '23 02:05 dlex

on (amd64, container) in job https://buildkite.com/redpanda/redpanda/builds/29360#01882d69-f998-49ce-a0c8-4f6bde5a6463

dlex avatar May 22 '23 04:05 dlex

Sounds like more a test failure and hence sev/low to me. @michael-redpanda could you keep me honest, please?

piyushredpanda avatar Jun 22 '23 06:06 piyushredpanda

https://buildkite.com/redpanda/redpanda/builds/37651#018ad049-05dd-4d29-a1b5-b6fc8bba64c4

bharathv avatar Sep 27 '23 02:09 bharathv

reported as stale by pandatriage

michael-redpanda avatar Oct 30 '23 17:10 michael-redpanda

*https://buildkite.com/redpanda/redpanda/builds/40369

vbotbuildovich avatar Nov 03 '23 16:11 vbotbuildovich

*https://buildkite.com/redpanda/vtools/builds/10446

vbotbuildovich avatar Nov 05 '23 00:11 vbotbuildovich

*https://buildkite.com/redpanda/vtools/builds/11146

vbotbuildovich avatar Dec 15 '23 01:12 vbotbuildovich

*https://buildkite.com/redpanda/vtools/builds/11190

vbotbuildovich avatar Dec 19 '23 00:12 vbotbuildovich

*https://buildkite.com/redpanda/redpanda/builds/43078

vbotbuildovich avatar Dec 21 '23 00:12 vbotbuildovich

*https://buildkite.com/redpanda/vtools/builds/11257

vbotbuildovich avatar Dec 23 '23 00:12 vbotbuildovich

*https://buildkite.com/redpanda/redpanda/builds/44935

vbotbuildovich avatar Feb 13 '24 20:02 vbotbuildovich

*https://buildkite.com/redpanda/redpanda/builds/46544

vbotbuildovich avatar Mar 21 '24 21:03 vbotbuildovich

*https://buildkite.com/redpanda/vtools/builds/12656

vbotbuildovich avatar Apr 03 '24 21:04 vbotbuildovich

*https://buildkite.com/redpanda/vtools/builds/13139

vbotbuildovich avatar Apr 24 '24 21:04 vbotbuildovich

*https://buildkite.com/redpanda/vtools/builds/13519

vbotbuildovich avatar May 07 '24 21:05 vbotbuildovich

*https://buildkite.com/redpanda/vtools/builds/13656

vbotbuildovich avatar May 12 '24 21:05 vbotbuildovich

*https://buildkite.com/redpanda/redpanda/builds/49573

vbotbuildovich avatar May 27 '24 21:05 vbotbuildovich

*https://buildkite.com/redpanda/vtools/builds/14105

vbotbuildovich avatar May 28 '24 21:05 vbotbuildovich

*https://buildkite.com/redpanda/vtools/builds/13139 *https://buildkite.com/redpanda/vtools/builds/13519 *https://buildkite.com/redpanda/vtools/builds/13656 *https://buildkite.com/redpanda/redpanda/builds/49573 *https://buildkite.com/redpanda/vtools/builds/14105

vbotbuildovich avatar Jun 11 '24 21:06 vbotbuildovich

*https://buildkite.com/redpanda/vtools/builds/13139 *https://buildkite.com/redpanda/vtools/builds/13519 *https://buildkite.com/redpanda/vtools/builds/13656 *https://buildkite.com/redpanda/redpanda/builds/49573 *https://buildkite.com/redpanda/vtools/builds/14105

vbotbuildovich avatar Jun 11 '24 21:06 vbotbuildovich

*https://buildkite.com/redpanda/vtools/builds/13139 *https://buildkite.com/redpanda/vtools/builds/13519 *https://buildkite.com/redpanda/vtools/builds/13656 *https://buildkite.com/redpanda/redpanda/builds/49573 *https://buildkite.com/redpanda/vtools/builds/14105

vbotbuildovich avatar Jun 12 '24 21:06 vbotbuildovich

*https://buildkite.com/redpanda/redpanda/builds/51106

vbotbuildovich avatar Jul 04 '24 23:07 vbotbuildovich

*https://buildkite.com/redpanda/redpanda/builds/51106

vbotbuildovich avatar Jul 05 '24 00:07 vbotbuildovich

*https://buildkite.com/redpanda/redpanda/builds/51183

vbotbuildovich avatar Jul 06 '24 21:07 vbotbuildovich

*https://buildkite.com/redpanda/vtools/builds/15844

vbotbuildovich avatar Jul 22 '24 21:07 vbotbuildovich

*https://buildkite.com/redpanda/vtools/builds/16241

vbotbuildovich avatar Aug 07 '24 21:08 vbotbuildovich

*https://buildkite.com/redpanda/vtools/builds/16399

vbotbuildovich avatar Aug 10 '24 21:08 vbotbuildovich