                        CI Failure (key symptom) in `RedpandaUpgradeTest.test_workloads_through_releases`
https://buildkite.com/redpanda/vtools/builds/13676
Module: rptest.tests.workload_upgrade_runner_test
Class: RedpandaUpgradeTest
Method: test_workloads_through_releases
Arguments: {
    "cloud_storage_type": 1
}
test_id:    RedpandaUpgradeTest.test_workloads_through_releases
status:     FAIL
run time:   537.213 seconds
RemoteCommandError({'ssh_config': {'host': 'ip-172-31-11-46', 'hostname': '172.31.11.46', 'user': 'root', 'port': 22, 'password': None, 'identityfile': '/home/ubuntu/.ssh/id_rsa'}, 'hostname': 'ip-172-31-11-46', 'ssh_hostname': '172.31.11.46', 'user': 'root', 'externally_routable_ip': '35.162.166.49', '_logger': <Logger rptest.tests.workload_upgrade_runner_test.RedpandaUpgradeTest.test_workloads_through_releases.cloud_storage_type=CloudStorageType.S3-820 (DEBUG)>, 'os': 'linux', '_ssh_client': <paramiko.client.SSHClient object at 0x7fd8a5a22710>, '_sftp_client': <paramiko.sftp_client.SFTPClient object at 0x7fd8a5a4f100>, '_custom_ssh_exception_checks': None}, 'curl -fsSL https://vectorized-public.s3.us-west-2.amazonaws.com/releases/redpanda/23.3.15/redpanda-23.3.15-amd64.tar.gz --retry 3 --retry-connrefused --retry-delay 2 --create-dir -o /opt/redpanda_installs/v23.3.15/redpanda.tar.gz && gunzip -c /opt/redpanda_installs/v23.3.15/redpanda.tar.gz | tar -xf - -C /opt/redpanda_installs/v23.3.15 && rm /opt/redpanda_installs/v23.3.15/redpanda.tar.gz', 35, b'')
Traceback (most recent call last):
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 276, in run_test
    return self.test_context.function(self.test)
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/mark/_mark.py", line 535, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 103, in wrapped
    r = f(self, *args, **kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/tests/workload_upgrade_runner_test.py", line 278, in test_workloads_through_releases
    for current_version in self.upgrade_through_versions(
  File "/home/ubuntu/redpanda/tests/rptest/tests/redpanda_test.py", line 247, in upgrade_through_versions
    current_version = install_next()
  File "/home/ubuntu/redpanda/tests/rptest/tests/redpanda_test.py", line 174, in install_next
    self.redpanda._installer.install(self.redpanda.nodes, v)
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda_installer.py", line 609, in install
    self._install_unlocked(nodes, install_target)
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda_installer.py", line 658, in _install_unlocked
    raise e
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda_installer.py", line 638, in _install_unlocked
    self.wait_for_async_ssh(self._redpanda.logger,
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda_installer.py", line 165, in wait_for_async_ssh
    for l in ssh_out_per_node[node]:
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/cluster/remoteaccount.py", line 687, in next
    return next(self.iter_obj)
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/cluster/remoteaccount.py", line 354, in output_generator
    raise RemoteCommandError(self, cmd, exit_status, stderr.read())
ducktape.cluster.remoteaccount.RemoteCommandError: root@ip-172-31-11-46: Command 'curl -fsSL https://vectorized-public.s3.us-west-2.amazonaws.com/releases/redpanda/23.3.15/redpanda-23.3.15-amd64.tar.gz --retry 3 --retry-connrefused --retry-delay 2 --create-dir -o /opt/redpanda_installs/v23.3.15/redpanda.tar.gz && gunzip -c /opt/redpanda_installs/v23.3.15/redpanda.tar.gz | tar -xf - -C /opt/redpanda_installs/v23.3.15 && rm /opt/redpanda_installs/v23.3.15/redpanda.tar.gz' returned non-zero exit status 35.
JIRA Link: CORE-2940
This one doesn't look like an infra issue. Originally that is what I thought, since it shows an SSH issue and a timeout of 20 seconds.
But I looked at the test code, and the logic has not changed in almost two years.
It is related to the upgrade: we wait for the service to come up.
The error message is:
ducktape.errors.TimeoutError: Redpanda service ip-172-31-6-14 failed to start within 20 sec
We could increase the timeout and see what happens, but since this code is not new, it looks like a possible degradation in the product. I will assign it to myself and investigate more. Let's see if we can reproduce it.
Duplicate of https://github.com/redpanda-data/redpanda/issues/13306
* https://buildkite.com/redpanda/vtools/builds/13866
* https://buildkite.com/redpanda/vtools/builds/14212
* https://buildkite.com/redpanda/vtools/builds/14267
* https://buildkite.com/redpanda/vtools/builds/14280
* https://buildkite.com/redpanda/vtools/builds/14463
@rpdevmp wrote:
Error message is: ducktape.errors.TimeoutError: Redpanda service ip-172-31-6-14 failed to start within 20 sec
... but the error message from the stack in the top comment is:
ducktape.cluster.remoteaccount.RemoteCommandError: root@ip-172-31-11-46: Command 'curl -fsSL https://vectorized-public.s3.us-west-2.amazonaws.com/releases/redpanda/23.3.15/redpanda-23.3.15-amd64.tar.gz --retry 3 --retry-connrefused --retry-delay 2 --create-dir -o /opt/redpanda_installs/v23.3.15/redpanda.tar.gz && gunzip -c /opt/redpanda_installs/v23.3.15/redpanda.tar.gz | tar -xf - -C /opt/redpanda_installs/v23.3.15 && rm /opt/redpanda_installs/v23.3.15/redpanda.tar.gz' returned non-zero exit status 35.
This clearly looks like curl failing to connect to the S3 bucket. It's hard to tell for sure because of the long pipeline, but based on errors we see elsewhere, the 35 almost certainly comes from curl.
So I don't understand the comment about Redpanda failing to start. Where do you see that?
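The ambiguity mentioned above comes from the install command chaining download, extraction, and cleanup into one shell line, so the single exit status could in principle belong to curl, tar, or rm. A minimal sketch of running the steps separately so that a failure is attributed to a named step might look like the following. `run_steps` and `exit_with` are hypothetical helpers for illustration and are not part of rptest or ducktape; the demo uses stand-in commands instead of the real curl and tar invocations.

```python
import subprocess
import sys
from typing import Optional, Sequence, Tuple

# A step is a human-readable name plus the argv to execute.
Step = Tuple[str, Sequence[str]]


def run_steps(steps: Sequence[Step]) -> Optional[Tuple[str, int]]:
    """Run steps in order and stop at the first failure.

    Returns (step_name, exit_status) for the first failing step,
    or None when every step exits with status 0.
    """
    for name, argv in steps:
        result = subprocess.run(
            argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            return name, result.returncode
    return None


def exit_with(code: int) -> Sequence[str]:
    """Hypothetical stand-in command that simply exits with `code`."""
    return [sys.executable, "-c", f"import sys; sys.exit({code})"]


# Stand-ins for the download / extract / cleanup steps of the installer.
demo_steps = [
    ("download", exit_with(35)),  # simulates curl exiting with status 35
    ("extract", exit_with(0)),
    ("cleanup", exit_with(0)),
]

failure = run_steps(demo_steps)
print(failure)
```

With the steps split this way, the failing step's name travels with the exit status, so a 35 no longer has to be attributed by inference.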
I suspect these are duplicates of https://github.com/redpanda-data/redpanda/issues/18607. We are using curl here rather than requests, but the endpoint is the same and curl error 35 is an SSL-related error just like we got in Python.
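For context, the curl man page lists exit code 35 as an SSL connect error (the handshake failed). It also describes `--retry` as covering only a fixed set of transient errors (timeouts and certain HTTP response codes), with `--retry-connrefused` adding refused connections, so a handshake failure may not be retried by the flags in the command above. That is worth verifying against the curl version on the test nodes. As a rough sketch, a caller-side retry keyed on curl exit codes could look like this. `retry_on_transient` and the choice of which codes count as transient are assumptions for illustration, not existing rptest behavior.

```python
import time
from typing import Callable

# curl exit codes treated as transient here (meanings per the curl man page).
# Which codes to include is a judgment call made for this sketch.
TRANSIENT_CURL_EXIT_CODES = {
    6: "could not resolve host",
    7: "failed to connect to host",
    28: "operation timeout",
    35: "SSL connect error (handshake failed)",
    56: "failure receiving network data",
}


def retry_on_transient(
    run: Callable[[], int],
    attempts: int = 3,
    delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `run` up to `attempts` times and return the last exit status.

    Retries only while the status is one of TRANSIENT_CURL_EXIT_CODES;
    success (0) or any other failure is returned immediately.
    """
    status = run()
    for _ in range(attempts - 1):
        if status == 0 or status not in TRANSIENT_CURL_EXIT_CODES:
            break
        sleep(delay_s)
        status = run()
    return status


# Demo with a fake runner: two simulated SSL failures, then success.
statuses = iter([35, 35, 0])
final = retry_on_transient(lambda: next(statuses), sleep=lambda _s: None)
print(final)
```

In real use, `run` would wrap the actual download command and return its exit status; the fake runner here only keeps the example free of network access.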
* https://buildkite.com/redpanda/vtools/builds/15156
Automatically closing issue to match current state of CORE-2940