
CI Failure (key symptom) in `RedpandaUpgradeTest.test_workloads_through_releases`

Open vbotbuildovich opened this issue 1 year ago • 1 comments

https://buildkite.com/redpanda/vtools/builds/13676

Module: rptest.tests.workload_upgrade_runner_test
Class: RedpandaUpgradeTest
Method: test_workloads_through_releases
Arguments: {
    "cloud_storage_type": 1
}
test_id:    RedpandaUpgradeTest.test_workloads_through_releases
status:     FAIL
run time:   537.213 seconds

RemoteCommandError({'ssh_config': {'host': 'ip-172-31-11-46', 'hostname': '172.31.11.46', 'user': 'root', 'port': 22, 'password': None, 'identityfile': '/home/ubuntu/.ssh/id_rsa'}, 'hostname': 'ip-172-31-11-46', 'ssh_hostname': '172.31.11.46', 'user': 'root', 'externally_routable_ip': '35.162.166.49', '_logger': <Logger rptest.tests.workload_upgrade_runner_test.RedpandaUpgradeTest.test_workloads_through_releases.cloud_storage_type=CloudStorageType.S3-820 (DEBUG)>, 'os': 'linux', '_ssh_client': <paramiko.client.SSHClient object at 0x7fd8a5a22710>, '_sftp_client': <paramiko.sftp_client.SFTPClient object at 0x7fd8a5a4f100>, '_custom_ssh_exception_checks': None}, 'curl -fsSL https://vectorized-public.s3.us-west-2.amazonaws.com/releases/redpanda/23.3.15/redpanda-23.3.15-amd64.tar.gz --retry 3 --retry-connrefused --retry-delay 2 --create-dir -o /opt/redpanda_installs/v23.3.15/redpanda.tar.gz && gunzip -c /opt/redpanda_installs/v23.3.15/redpanda.tar.gz | tar -xf - -C /opt/redpanda_installs/v23.3.15 && rm /opt/redpanda_installs/v23.3.15/redpanda.tar.gz', 35, b'')
Traceback (most recent call last):
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 276, in run_test
    return self.test_context.function(self.test)
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/mark/_mark.py", line 535, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 103, in wrapped
    r = f(self, *args, **kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/tests/workload_upgrade_runner_test.py", line 278, in test_workloads_through_releases
    for current_version in self.upgrade_through_versions(
  File "/home/ubuntu/redpanda/tests/rptest/tests/redpanda_test.py", line 247, in upgrade_through_versions
    current_version = install_next()
  File "/home/ubuntu/redpanda/tests/rptest/tests/redpanda_test.py", line 174, in install_next
    self.redpanda._installer.install(self.redpanda.nodes, v)
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda_installer.py", line 609, in install
    self._install_unlocked(nodes, install_target)
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda_installer.py", line 658, in _install_unlocked
    raise e
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda_installer.py", line 638, in _install_unlocked
    self.wait_for_async_ssh(self._redpanda.logger,
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda_installer.py", line 165, in wait_for_async_ssh
    for l in ssh_out_per_node[node]:
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/cluster/remoteaccount.py", line 687, in next
    return next(self.iter_obj)
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/cluster/remoteaccount.py", line 354, in output_generator
    raise RemoteCommandError(self, cmd, exit_status, stderr.read())
ducktape.cluster.remoteaccount.RemoteCommandError: root@ip-172-31-11-46: Command 'curl -fsSL https://vectorized-public.s3.us-west-2.amazonaws.com/releases/redpanda/23.3.15/redpanda-23.3.15-amd64.tar.gz --retry 3 --retry-connrefused --retry-delay 2 --create-dir -o /opt/redpanda_installs/v23.3.15/redpanda.tar.gz && gunzip -c /opt/redpanda_installs/v23.3.15/redpanda.tar.gz | tar -xf - -C /opt/redpanda_installs/v23.3.15 && rm /opt/redpanda_installs/v23.3.15/redpanda.tar.gz' returned non-zero exit status 35.

JIRA Link: CORE-2940

vbotbuildovich avatar May 13 '24 21:05 vbotbuildovich

This one doesn't look like an infra issue. Originally that is what I thought, since it shows an SSH issue and a timeout of 20 seconds.

But I looked at this test code, and the logic has not changed in almost two years.

It is related to the upgrade: we wait for the service to come up.

Error message is: ducktape.errors.TimeoutError: Redpanda service ip-172-31-6-14 failed to start within 20 sec

We could increase the timeout and see what happens, but since this code is not new, it looks like there may be a degradation in the product. I will assign it to myself and investigate more. Let's see if we can reproduce it.

rpdevmp avatar May 14 '24 04:05 rpdevmp

Duplicate of https://github.com/redpanda-data/redpanda/issues/13306

rpdevmp avatar May 16 '24 20:05 rpdevmp

* https://buildkite.com/redpanda/vtools/builds/13866
* https://buildkite.com/redpanda/vtools/builds/14212

vbotbuildovich avatar May 31 '24 21:05 vbotbuildovich

* https://buildkite.com/redpanda/vtools/builds/14267
* https://buildkite.com/redpanda/vtools/builds/14280

vbotbuildovich avatar Jun 05 '24 20:06 vbotbuildovich

* https://buildkite.com/redpanda/vtools/builds/14463

vbotbuildovich avatar Jun 06 '24 21:06 vbotbuildovich

* https://buildkite.com/redpanda/vtools/builds/14463

vbotbuildovich avatar Jun 11 '24 20:06 vbotbuildovich

@rpdevmp wrote:

Error message is: ducktape.errors.TimeoutError: Redpanda service ip-172-31-6-14 failed to start within 20 sec

... but the error message from the stack in the top comment is:

ducktape.cluster.remoteaccount.RemoteCommandError: root@ip-172-31-11-46: Command 'curl -fsSL https://vectorized-public.s3.us-west-2.amazonaws.com/releases/redpanda/23.3.15/redpanda-23.3.15-amd64.tar.gz --retry 3 --retry-connrefused --retry-delay 2 --create-dir -o /opt/redpanda_installs/v23.3.15/redpanda.tar.gz && gunzip -c /opt/redpanda_installs/v23.3.15/redpanda.tar.gz | tar -xf - -C /opt/redpanda_installs/v23.3.15 && rm /opt/redpanda_installs/v23.3.15/redpanda.tar.gz' returned non-zero exit status 35.

This clearly looks like curl failing to connect to the S3 bucket. It's hard to tell for sure because of the long pipeline, but based on errors we see elsewhere, the 35 is almost certainly from curl.

So I don't understand the comment about Redpanda failing to start. Where do you see that?

I suspect these are duplicates of https://github.com/redpanda-data/redpanda/issues/18607. We are using curl here rather than requests, but the endpoint is the same and curl error 35 is an SSL-related error just like we got in Python.
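
To make the ambiguity above concrete, here is a minimal Python sketch (not the installer's actual code) that runs the download as its own step, so a non-zero exit status can be attributed to curl rather than to `gunzip`, `tar`, or `rm`, and that retries when curl exits with 35. As far as I know, curl's `--retry` only covers errors it considers transient, so an exit status 35 may not be retried by the flags in the failing command. The names `run_step` and `download_with_retry`, and the retry counts, are hypothetical and chosen for illustration only.

```python
import subprocess
import time
from typing import Callable, Sequence

# curl exit status 35 means "SSL connect error" (the TLS handshake failed).
CURL_SSL_CONNECT_ERROR = 35


def run_step(argv: Sequence[str]) -> int:
    """Run a single command and return its exit status."""
    return subprocess.run(list(argv)).returncode


def download_with_retry(
    url: str,
    dest: str,
    attempts: int = 3,          # hypothetical default, not taken from the test code
    delay_s: float = 2.0,       # hypothetical default, not taken from the test code
    run: Callable[[Sequence[str]], int] = run_step,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Download url to dest with curl, retrying only on exit status 35.

    The download is a separate step, so the returned status always
    belongs to curl and never to a later stage of a shell pipeline.
    """
    argv = ["curl", "-fsSL", url, "--create-dirs", "-o", dest]
    rc = CURL_SSL_CONNECT_ERROR
    for attempt in range(1, attempts + 1):
        rc = run(argv)
        if rc != CURL_SSL_CONNECT_ERROR:
            return rc  # success (0) or an error we do not retry here
        if attempt < attempts:
            sleep(delay_s)
    return rc
```

With the download separated like this, the extract and cleanup steps would run only after a status of 0, and any failure there would be reported under its own command rather than folded into one long `&&` chain.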

travisdowns avatar Jun 23 '24 05:06 travisdowns

* https://buildkite.com/redpanda/vtools/builds/15156

vbotbuildovich avatar Jul 04 '24 23:07 vbotbuildovich

* https://buildkite.com/redpanda/vtools/builds/15156

vbotbuildovich avatar Jul 05 '24 00:07 vbotbuildovich

Automatically closing issue to match current state of CORE-2940

michael-redpanda avatar Jul 26 '24 03:07 michael-redpanda