
CI Failure (key symptom) in `RollingRestartTest.test_rolling_restart`

Open vbotbuildovich opened this issue 9 months ago • 0 comments

https://buildkite.com/redpanda/vtools/builds/13738

Module: rptest.redpanda_cloud_tests.rolling_restart_test
Class: RollingRestartTest
Method: test_rolling_restart
test_id:    RollingRestartTest.test_rolling_restart
status:     FAIL
run time:   1423.022 seconds

CalledProcessError(1, ['tsh', 'ssh', '--proxy=proxy.tp.redpanda.com:443', '--auth=okta', '--identity=/tmp/machine-id/identity', 'redpanda@cp24867ek221n77ef6tg-agent', 'kubectl', 'get', 'pods', '-n', 'redpanda', '-o', 'json'], '', '\x1b[31mERROR: \x1b[0mfailed connecting to host cp24867ek221n77ef6tg-agent:0: failed to receive cluster details response\n\tfailed to dial target host\n\tTeleport proxy failed to connect to "node" agent "@local-node" over reverse tunnel:\n\n  ssh: unexpected packet in response to channel open: <nil>\n\nThis usually means that the agent is offline or has disconnected. Check the\nagent logs and, if the issue persists, try restarting it or re-registering it\nwith the cluster.\n\n')
Traceback (most recent call last):
  File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 197, in _local_cmd
    s_out, s_err = process.communicate(timeout=timeout)
  File "/usr/lib/python3.10/subprocess.py", line 1154, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib/python3.10/subprocess.py", line 2022, in _communicate
    self._check_timeout(endtime, orig_timeout, stdout, stderr)
  File "/usr/lib/python3.10/subprocess.py", line 1198, in _check_timeout
    raise TimeoutExpired(
subprocess.TimeoutExpired: Command '['tsh', 'ssh', '--proxy=proxy.tp.redpanda.com:443', '--auth=okta', '--identity=/tmp/machine-id/identity', 'redpanda@cp24867ek221n77ef6tg-agent', 'kubectl', 'delete', 'pod', 'rp-cp24867ek221n77ef6tg-3', '-n=redpanda']' timed out after 900 seconds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 103, in wrapped
    r = f(self, *args, **kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/redpanda_cloud_tests/rolling_restart_test.py", line 35, in test_rolling_restart
    self.redpanda.rolling_restart_pods()
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1778, in rolling_restart_pods
    self.restart_pod(pod_name, pod_timeout)
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1749, in restart_pod
    self.kubectl.cmd(delete_cmd)
  File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 256, in cmd
    return self._ssh_cmd(cmd, capture=capture)
  File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 232, in _ssh_cmd
    return self._local_cmd(local_cmd)
  File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 205, in _local_cmd
    raise subprocess.TimeoutExpired(cmd, timeout, s_out, s_err)
subprocess.TimeoutExpired: Command '['tsh', 'ssh', '--proxy=proxy.tp.redpanda.com:443', '--auth=okta', '--identity=/tmp/machine-id/identity', 'redpanda@cp24867ek221n77ef6tg-agent', 'kubectl', 'delete', 'pod', 'rp-cp24867ek221n77ef6tg-3', '-n=redpanda']' timed out after 900 seconds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 276, in run_test
    return self.test_context.function(self.test)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 126, in wrapped
    redpanda.raise_on_crash(log_allow_list=log_allow_list)
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 2084, in raise_on_crash
    active, _, _ = self.get_redpanda_pods_presorted()
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1669, in get_redpanda_pods_presorted
    all_pods = self.get_redpanda_pods()
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1697, in get_redpanda_pods
    pods = json.loads(self.kubectl.cmd('get pods -n redpanda -o json'))
  File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 256, in cmd
    return self._ssh_cmd(cmd, capture=capture)
  File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 232, in _ssh_cmd
    return self._local_cmd(local_cmd)
  File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 215, in _local_cmd
    raise subprocess.CalledProcessError(process.returncode, cmd, s_out,
subprocess.CalledProcessError: Command '['tsh', 'ssh', '--proxy=proxy.tp.redpanda.com:443', '--auth=okta', '--identity=/tmp/machine-id/identity', 'redpanda@cp24867ek221n77ef6tg-agent', 'kubectl', 'get', 'pods', '-n', 'redpanda', '-o', 'json']' returned non-zero exit status 1.
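
For triage, the two exception types in the tracebacks above (`subprocess.TimeoutExpired` on the `kubectl delete pod` call, then `subprocess.CalledProcessError` on the follow-up `kubectl get pods` call) can be reproduced with a small wrapper around `subprocess.Popen`. The sketch below is an assumption about the general shape of such a wrapper, not the actual `rptest/clients/kubectl.py` code; the name `local_cmd` and the kill-on-timeout behavior are hypothetical.

```python
import subprocess
import sys


def local_cmd(cmd, timeout=900):
    """Hypothetical stand-in for a `_local_cmd`-style wrapper (not the rptest code)."""
    process = subprocess.Popen(cmd,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE,
                               text=True)
    try:
        s_out, s_err = process.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Assumption: kill the child and collect partial output before re-raising.
        # Raising inside the except block yields the chained
        # "During handling of the above exception..." traceback.
        process.kill()
        s_out, s_err = process.communicate()
        raise subprocess.TimeoutExpired(cmd, timeout, s_out, s_err)
    if process.returncode != 0:
        # A non-zero exit (e.g. an unreachable agent) surfaces as CalledProcessError.
        raise subprocess.CalledProcessError(process.returncode, cmd, s_out, s_err)
    return s_out
```

Calling `local_cmd([sys.executable, "-c", "import time; time.sleep(5)"], timeout=0.5)` raises `TimeoutExpired`, while a command that exits with status 1 raises `CalledProcessError` with its stderr attached. In the failure above, the stderr of the second call is what carries the Teleport "agent is offline or has disconnected" message.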

JIRA Link: CORE-2975

vbotbuildovich, May 15 '24 21:05