
TimeoutError in `PartitionBalancerScaleTest.test_node_operations_at_scale`

NyaliaLui opened this issue.

Version & Environment

Redpanda version: dev on CDT nightly:

https://buildkite.com/redpanda/vtools/builds/4182#01845f80-c176-4aa4-b55c-816b614deec6/6-8067

   TimeoutError('')
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/mark/_mark.py", line 476, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 35, in wrapped
    r = f(self, *args, **kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/scale_tests/partition_balancer_scale_test.py", line 278, in test_node_operations_at_scale
    wait_until(partitions_moved_to_new_node, timeout, 5)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/utils/util.py", line 58, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError
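The empty message in `TimeoutError('')` follows from the last frame above: `wait_until` raises `TimeoutError(err_msg() if callable(err_msg) else err_msg)`, and the call at `partition_balancer_scale_test.py` line 278 passes no `err_msg`. The sketch below mimics that polling behavior to show the effect. `wait_until_sketch` and `WaitTimeoutError` are hypothetical stand-ins, not ducktape's actual source, and retrying when the condition itself raises is an assumption inferred from the `from last_exception` clause.

```python
import time


class WaitTimeoutError(Exception):
    """Hypothetical stand-in for ducktape.errors.TimeoutError."""


def wait_until_sketch(condition, timeout_sec, backoff_sec=0.1, err_msg=""):
    """Poll `condition` every `backoff_sec` seconds until it is truthy.

    Hypothetical sketch of the behavior visible in the traceback, not
    ducktape's implementation.
    """
    deadline = time.monotonic() + timeout_sec
    last_exception = None
    while time.monotonic() < deadline:
        try:
            if condition():
                return
        except Exception as e:  # assumption: condition errors are retried
            last_exception = e
        time.sleep(backoff_sec)
    # With the default err_msg="" the exception message is empty,
    # which is why the CI report shows only TimeoutError('').
    raise WaitTimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception


if __name__ == "__main__":
    try:
        # Mirrors wait_until(partitions_moved_to_new_node, timeout, 5) with no err_msg.
        wait_until_sketch(lambda: False, timeout_sec=0.2, backoff_sec=0.05)
    except WaitTimeoutError as e:
        print(repr(e))  # WaitTimeoutError('')

    try:
        # Hypothetical message; a real test could report which partitions are still moving.
        wait_until_sketch(lambda: False, 0.2, 0.05,
                          err_msg=lambda: "partitions not yet moved to new node")
    except WaitTimeoutError as e:
        print(repr(e))
```

Passing a callable `err_msg` that describes the remaining work would likely make failures like the ones reported in this thread easier to triage than a bare `TimeoutError('')`.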

NyaliaLui, Nov 10 '22

Another one, but for type=big_partitions: https://buildkite.com/redpanda/vtools/builds/4182#01845f80-c176-4aa4-b55c-816b614deec6/6-7958

NyaliaLui, Nov 10 '22

https://buildkite.com/redpanda/vtools/builds/4201#018463f2-bed8-437b-af13-fa9cc677ecaf/6-7516
https://buildkite.com/redpanda/vtools/builds/4201#018463f2-bed8-437b-af13-fa9cc677ecaf/6-7494
https://buildkite.com/redpanda/vtools/builds/4196#01846250-d53f-473f-83be-eb6536cfa36f/6-7482
https://buildkite.com/redpanda/vtools/builds/4196#01846250-d53f-473f-83be-eb6536cfa36f/6-7460

NyaliaLui, Nov 11 '22

This build also has an occurrence of this failure: https://buildkite.com/redpanda/vtools/builds/4269#01847bb7-c885-4b7c-bfd2-857879309e1d

rishabh96b, Nov 16 '22

Another two from CDT:

FAIL test: PartitionBalancerScaleTest.test_node_operations_at_scale.type=big_partitions (1/1 runs)
  failure at 2022-11-18T14:42:22.575Z: TimeoutError('')
  in job https://buildkite.com/redpanda/vtools/builds/4312#018489c4-806d-40af-b364-fd8e3cd49a22

FAIL test: PartitionBalancerScaleTest.test_node_operations_at_scale.type=many_partitions (1/1 runs)
  failure at 2022-11-18T14:42:22.575Z: TimeoutError('')
  in job https://buildkite.com/redpanda/vtools/builds/4312#018489c4-806d-40af-b364-fd8e3cd49a22

andijcr, Nov 18 '22

FAIL test: PartitionBalancerScaleTest.test_node_operations_at_scale.type=big_partitions (4/4 runs)
  failure at 2022-11-23T14:46:52.114Z: TimeoutError('')
    on (amd64, VM) in job https://buildkite.com/redpanda/vtools/builds/4361#0184a383-6120-4cfe-849f-20fd922fc70f
  failure at 2022-11-22T15:12:22.822Z: TimeoutError('')
    on (amd64, VM) in job https://buildkite.com/redpanda/vtools/builds/4345#01849e5e-c25f-42d3-9369-b80ed245db57

jcsp, Nov 23 '22

Continues to fail in nightly CDT amd64: https://buildkite.com/redpanda/vtools/builds/4388#0184a8aa-0aae-49b9-8b42-bb218d98cce4

jcsp, Nov 25 '22

I know, working on a fix.

mmaslankaprv, Nov 25 '22

Again on (amd64, VM) in job https://buildkite.com/redpanda/vtools/builds/6598#0186c040-c711-4340-8607-9a50d99e9457

dlex, Mar 09 '23

Again on (amd64, VM) in job https://buildkite.com/redpanda/vtools/builds/6628#0186c565-cedc-4dae-8ae5-4f4c9aa4908f

dlex, Mar 10 '23

FAIL test: PartitionBalancerScaleTest.test_node_operations_at_scale.type=many_partitions (1/1 runs)
  failure at 2023-03-20T16:02:20.498Z: TimeoutError('')
    on (amd64, VM) in job https://buildkite.com/redpanda/vtools/builds/6783#0186fe0f-9711-45d3-939e-c37f01c486c3

VladLazar, Mar 21 '23

https://buildkite.com/redpanda/vtools/builds/6874#01871a62-cf38-4df6-93cd-a31411c0fa62

Module: rptest.scale_tests.partition_balancer_scale_test
Class:  PartitionBalancerScaleTest
Method: test_node_operations_at_scale
Arguments:
{
  "type": "big_partitions"
}
test_id:    rptest.scale_tests.partition_balancer_scale_test.PartitionBalancerScaleTest.test_node_operations_at_scale.type=big_partitions
status:     FAIL
run time:   12 minutes 10.008 seconds

    TimeoutError('')
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 49, in wrapped
    r = f(self, *args, **kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/scale_tests/partition_balancer_scale_test.py", line 280, in test_node_operations_at_scale
    wait_until(partitions_moved_to_new_node, timeout, 5)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/utils/util.py", line 57, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError

rystsov, Mar 27 '23

https://buildkite.com/redpanda/vtools/builds/6936#01873a54-1d8a-4e3e-b4bd-7dc2f6a0dd2e

FAIL test: PartitionBalancerScaleTest.test_node_operations_at_scale.type=big_partitions (1/2 runs)
  failure at 2023-04-01T08:52:09.156Z: TimeoutError('')
    on (amd64, VM) in job https://buildkite.com/redpanda/vtools/builds/6936#01873a54-1d8a-4e3e-b4bd-7dc2f6a0dd2e

test_id:    rptest.scale_tests.partition_balancer_scale_test.PartitionBalancerScaleTest.test_node_operations_at_scale.type=big_partitions
status:     FAIL
run time:   14 minutes 3.956 seconds


    TimeoutError('')
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 35, in wrapped
    r = f(self, *args, **kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/scale_tests/partition_balancer_scale_test.py", line 278, in test_node_operations_at_scale
    wait_until(all_reconfigurations_done, timeout, 5)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/utils/util.py", line 57, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError

https://buildkite.com/redpanda/vtools/builds/6944#018740ff-b761-4948-8d65-bb95c274a4f4

FAIL test: PartitionBalancerScaleTest.test_node_operations_at_scale.type=many_partitions (1/2 runs)
  failure at 2023-04-02T16:16:24.377Z: TimeoutError('')
    on (amd64, VM) in job https://buildkite.com/redpanda/vtools/builds/6944#018740ff-b761-4948-8d65-bb95c274a4f4

test_id:    rptest.scale_tests.partition_balancer_scale_test.PartitionBalancerScaleTest.test_node_operations_at_scale.type=many_partitions
status:     FAIL
run time:   15 minutes 34.593 seconds


    TimeoutError('')
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 49, in wrapped
    r = f(self, *args, **kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/scale_tests/partition_balancer_scale_test.py", line 280, in test_node_operations_at_scale
    wait_until(partitions_moved_to_new_node, timeout, 5)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/utils/util.py", line 57, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError

andijcr, Apr 03 '23

Another variant, this time in `test_partition_balancer_with_many_partitions`: https://buildkite.com/redpanda/vtools/builds/7369#0187e0a5-2d6d-43cf-9c6f-abb06aa07a27

test_id:    rptest.scale_tests.partition_balancer_scale_test.PartitionBalancerScaleTest.test_partition_balancer_with_many_partitions.type=many_partitions
status:     FAIL
run time:   10 minutes 31.802 seconds


    TimeoutError('')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 49, in wrapped
    r = f(self, *args, **kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/scale_tests/partition_balancer_scale_test.py", line 159, in test_partition_balancer_with_many_partitions
    wait_until(stopped_node_is_empty, timeout, 5)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 57, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError

bharathv, May 03 '23

https://buildkite.com/redpanda/vtools/builds/7919#01888802-8884-46fd-ae0e-980a77943ce5

andijcr, Jun 05 '23

https://buildkite.com/redpanda/vtools/builds/8068#0188b3ca-909c-4892-85ab-722d7087cbd5

ztlpn, Jun 14 '23

I'll chase this.

ztlpn, Jun 14 '23

https://buildkite.com/redpanda/vtools/builds/8114#0188c00a-baf8-4b4a-a04c-ab7b2e7dc2d9

vshtokman, Jun 16 '23

https://buildkite.com/redpanda/vtools/builds/8122#0188c33b-58db-45ef-a128-9325b9c2db58

ztlpn, Jun 16 '23

Failures stopped after 2023-06-17; closing.

ztlpn, Jul 19 '23