wazuh-qa
DTT1 - Iteration 3 - Allocation module - Handling SSH connection errors
Improve error handling for SSH connection problems when running remote commands during macStadium deployments.
Some of the errors observed:
python3 deployability/modules/allocation/main.py --action create --provider vagrant --size large --composite-name macos-highsierra-10.13.6-amd64 --working-dir /tmp/allocatorvm --track-output /tmp/allocatorvm/track.yml --inventory-output /tmp/allocatorvm/inventory.yml --instance-name gha_8941573697_build
[2024-05-03 15:52:35] [DEBUG] SPNEGO._GSS: Python gssapi not available, cannot use any GSSAPIProxy protocols: No module named 'gssapi'
[2024-05-03 15:52:35] [DEBUG] SPNEGO._GSS: Python gssapi IOV extension not available: No module named 'gssapi'
[2024-05-03 15:52:35] [INFO] ALLOCATOR: Creating instance at /tmp/allocatorvm
[2024-05-03 15:52:35] [DEBUG] ALLOCATOR: Creating instance directory on remote host
[2024-05-03 15:52:41] [INFO] ALLOCATOR: Using the macStadium Intel server to deploy.
[2024-05-03 15:52:43] [DEBUG] ALLOCATOR: No config provided. Generating from payload
[2024-05-03 15:52:43] [DEBUG] ALLOCATOR: Generating new key pair
[2024-05-03 15:52:53] [DEBUG] ALLOCATOR: Vagrantfile created. Creating instance.
[2024-05-03 15:53:23] [ERROR] ALLOCATOR: Command failed: Connection reset by 10.10.0.249 port 22
[2024-05-03 15:53:23] [INFO] ALLOCATOR: Instance gha_8941573697_build created.
[2024-05-03 15:55:37] [ERROR] ALLOCATOR: Command failed: ssh: connect to host 10.10.0.249 port 22: Connection timed out
[2024-05-03 15:55:37] [INFO] ALLOCATOR: Instance gha_8941573697_build started.
[2024-05-03 15:55:39] [ERROR] ALLOCATOR: Command failed: sudo: /Users/jenkins/testing/gha_8941573697_build/vagrant_script.sh: command not found
Traceback (most recent call last):
File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/main.py", line 39, in <module>
main()
File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/main.py", line 35, in main
Allocator.run(InputPayload(**vars(parse_arguments())))
File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/allocation.py", line 37, in run
return cls.__create(payload)
File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/allocation.py", line 63, in __create
inventory = cls.__generate_inventory(instance, payload.inventory_output)
File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/allocation.py", line 130, in __generate_inventory
ssh_config = instance.ssh_connection_info()
File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/vagrant/instance.py", line 142, in ssh_connection_info
if not 'running' in self.status():
File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/vagrant/instance.py", line 121, in status
return self.__parse_vagrant_status(output)
File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/vagrant/instance.py", line 238, in __parse_vagrant_status
lines = message.split('\n')
AttributeError: 'NoneType' object has no attribute 'split'
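The `AttributeError` that ends this traceback occurs because the status parser receives `None` when the remote `vagrant status` command fails. Below is a minimal sketch of a guard. The function name `parse_vagrant_status`, the `"unknown"` fallback value, and the assumed line shape (`<name> <state> (<provider>)`) are illustrative assumptions, not the module's actual code.

```python
from typing import Optional


def parse_vagrant_status(message: Optional[str]) -> str:
    """Return the machine state found in `vagrant status` output.

    Returns "unknown" (a hypothetical fallback) when there is no output
    or no line matches the assumed shape.
    """
    if not message:
        # The remote command failed and produced no output to parse.
        return "unknown"
    for line in message.split("\n"):
        parts = line.split()
        # Assumed line shape: "<name> <state...> (<provider>)"
        if len(parts) >= 3 and parts[-1].startswith("(") and parts[-1].endswith(")"):
            return " ".join(parts[1:-1])
    return "unknown"
```

With a fallback string instead of `None`, a caller such as `'running' in status` evaluates to `False` rather than raising.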
Test
A test was run in which the VPN was forced to disconnect, producing a broken pipe. The script recovered after reconnection and the deployment continued:
cbordon@cbordon-MS-7C88:~/Documents/wazuh/repositorios/wazuh-qa$ python3 deployability/modules/allocation/main.py --provider vagrant --size small --instance-name cbordon-test-ssh --composite-name macos-sonoma-14.0-arm64
[2024-05-07 16:15:49] [INFO] ALLOCATOR: Creating instance at /tmp/wazuh-qa
[2024-05-07 16:15:55] [INFO] ALLOCATOR: macStadium server has less than 2 VMs running, deploying in this host.
[2024-05-07 16:15:55] [DEBUG] ALLOCATOR: Checking if instance directory exists on remote host
[2024-05-07 16:15:58] [DEBUG] ALLOCATOR: Creating instance directory on remote host
[2024-05-07 16:16:02] [DEBUG] ALLOCATOR: No config provided. Generating from payload
[2024-05-07 16:16:02] [DEBUG] ALLOCATOR: Generating new key pair
[2024-05-07 16:16:05] [DEBUG] ALLOCATOR: Vagrantfile created. Creating instance.
[2024-05-07 16:16:10] [INFO] ALLOCATOR: Instance cbordon-test-ssh-5543 created.
[2024-05-07 16:18:12] [WARNING] ALLOCATOR: SSH connection error: client_loop: send disconnect: Broken pipe
. Retrying...
[2024-05-07 16:18:47] [INFO] ALLOCATOR: Instance cbordon-test-ssh-5543 started.
[2024-05-07 16:19:06] [INFO] ALLOCATOR: Inventory file generated at /tmp/wazuh-qa/cbordon-test-ssh-5543/inventory.yml
[2024-05-07 16:19:08] [INFO] ALLOCATOR: SSH connection successful.
[2024-05-07 16:19:18] [INFO] ALLOCATOR: Track file generated at /tmp/wazuh-qa/cbordon-test-ssh-5543/track.yml
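The recovery seen in this log can be pictured as a small retry wrapper around the remote command. The sketch below is an assumption about the approach, not the code in the branch: `run_with_retries`, its parameters, and the default retry count are hypothetical. The 30-second delay and the final `ValueError` message mirror the log output that appears elsewhere in this issue.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_retries(
    action: Callable[[], T],
    retries: int = 3,            # assumed value, not taken from the branch
    delay: float = 30.0,         # matches "Retrying in 30 seconds..." in the logs
    retry_on: Tuple[Type[BaseException], ...] = (OSError, EOFError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action`, retrying when a connection-style error is raised.

    TimeoutError, ConnectionResetError and BrokenPipeError are all
    subclasses of OSError, so they are covered by the default `retry_on`.
    """
    last_error: Optional[BaseException] = None
    for attempt in range(1, retries + 1):
        try:
            return action()
        except retry_on as e:
            last_error = e
            if attempt < retries:
                print(f"SSH connection error: {e}. Retrying in {delay:g} seconds...")
                sleep(delay)
    # All attempts failed: surface a single, descriptive error.
    raise ValueError(f"Remote command execution failed: {last_error}") from last_error
```

The `sleep` parameter exists only so the delay can be replaced in tests. In real use, `action` would be a closure that opens the SSH connection and runs the command.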
Update report
Several changes were made to improve error handling in the SSH connection. This branch includes the following improvements:
- Shorter remote command execution times after a VPN failure. Previously, if the VPN dropped, the process could hang for a long time. The timeout is now shorter, and the command is retried to try to reestablish the connection. If the connection cannot be restored, the command fails with an error:
cbordon@cbordon-MS-7C88:~/Documents/wazuh/repositorios/wazuh-qa$ python3 deployability/modules/allocation/main.py --provider vagrant --size small --instance-name cbordon-test-ssh --composite-name macos-sonoma-14.0-arm64
[2024-05-08 16:35:24] [INFO] ALLOCATOR: Creating instance at /tmp/wazuh-qa
[2024-05-08 16:35:30] [INFO] ALLOCATOR: macStadium ARM server has less than 2 VMs running, deploying in this host.
[2024-05-08 16:35:30] [DEBUG] ALLOCATOR: Checking if instance directory exists on remote host
[2024-05-08 16:35:33] [DEBUG] ALLOCATOR: Creating instance directory on remote host
[2024-05-08 16:35:35] [DEBUG] ALLOCATOR: No config provided. Generating from payload
[2024-05-08 16:35:35] [DEBUG] ALLOCATOR: Generating new key pair
[2024-05-08 16:35:39] [DEBUG] ALLOCATOR: Vagrantfile created. Creating instance.
[2024-05-08 16:35:50] [INFO] ALLOCATOR: Instance cbordon-test-ssh-1018 created.
[2024-05-08 16:41:18] [WARNING] ALLOCATOR: SSH connection error: . Retrying in 30 seconds...
[2024-05-08 16:44:03] [WARNING] ALLOCATOR: SSH connection error: [Errno 110] Connection timed out. Retrying in 30 seconds...
Traceback (most recent call last):
File "/home/cbordon/Documents/wazuh/repositorios/wazuh-qa/deployability/modules/allocation/vagrant/utils.py", line 45, in remote_command
ssh.connect(**ssh_parameters)
File "/usr/lib/python3/dist-packages/paramiko/client.py", line 349, in connect
retry_on_signal(lambda: sock.connect(addr))
File "/usr/lib/python3/dist-packages/paramiko/util.py", line 279, in retry_on_signal
return function()
File "/usr/lib/python3/dist-packages/paramiko/client.py", line 349, in <lambda>
retry_on_signal(lambda: sock.connect(addr))
TimeoutError: [Errno 110] Connection timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/cbordon/Documents/wazuh/repositorios/wazuh-qa/deployability/modules/allocation/main.py", line 39, in <module>
main()
File "/home/cbordon/Documents/wazuh/repositorios/wazuh-qa/deployability/modules/allocation/main.py", line 35, in main
Allocator.run(InputPayload(**vars(parse_arguments())))
File "/home/cbordon/Documents/wazuh/repositorios/wazuh-qa/deployability/modules/allocation/allocation.py", line 37, in run
return cls.__create(payload)
File "/home/cbordon/Documents/wazuh/repositorios/wazuh-qa/deployability/modules/allocation/allocation.py", line 60, in __create
instance.start()
File "/home/cbordon/Documents/wazuh/repositorios/wazuh-qa/deployability/modules/allocation/vagrant/instance.py", line 69, in start
self.__run_vagrant_command('up')
File "/home/cbordon/Documents/wazuh/repositorios/wazuh-qa/deployability/modules/allocation/vagrant/instance.py", line 221, in __run_vagrant_command
output = VagrantUtils.remote_command(cmd, self.remote_host_parameters)
File "/home/cbordon/Documents/wazuh/repositorios/wazuh-qa/deployability/modules/allocation/vagrant/utils.py", line 58, in remote_command
raise ValueError(f"Remote command execution failed: {str(e)}")
ValueError: Remote command execution failed: [Errno 110] Connection timed out
- Retries after a possible VPN failure allow a process that had been left pending to continue:
cbordon@cbordon-MS-7C88:~/Documents/wazuh/repositorios/wazuh-qa$ python3 deployability/modules/allocation/main.py --provider vagrant --size small --instance-name cbordon-test-ssh --composite-name macos-sonoma-14.0-arm64
[2024-05-08 16:26:46] [INFO] ALLOCATOR: Creating instance at /tmp/wazuh-qa
[2024-05-08 16:26:51] [INFO] ALLOCATOR: macStadium ARM server has less than 2 VMs running, deploying in this host.
[2024-05-08 16:26:51] [DEBUG] ALLOCATOR: Checking if instance directory exists on remote host
[2024-05-08 16:26:54] [DEBUG] ALLOCATOR: Creating instance directory on remote host
[2024-05-08 16:26:57] [DEBUG] ALLOCATOR: No config provided. Generating from payload
[2024-05-08 16:26:57] [DEBUG] ALLOCATOR: Generating new key pair
[2024-05-08 16:27:00] [DEBUG] ALLOCATOR: Vagrantfile created. Creating instance.
[2024-05-08 16:27:11] [INFO] ALLOCATOR: Instance cbordon-test-ssh-7111 created.
[2024-05-08 16:32:16] [WARNING] ALLOCATOR: SSH connection error: . Retrying in 30 seconds...
[2024-05-08 16:32:49] [ERROR] PARAMIKO.TRANSPORT: Socket exception: Connection reset by peer (104)
[2024-05-08 16:32:51] [INFO] ALLOCATOR: Instance cbordon-test-ssh-7111 started.
[2024-05-08 16:33:05] [INFO] ALLOCATOR: Inventory file generated at /tmp/wazuh-qa/cbordon-test-ssh-7111/inventory.yml
[2024-05-08 16:33:07] [INFO] ALLOCATOR: SSH connection successful.
[2024-05-08 16:33:16] [INFO] ALLOCATOR: Track file generated at /tmp/wazuh-qa/cbordon-test-ssh-7111/track.yml
- Improved assignment of the port to be published. Previously, candidate ports were validated sequentially, which could cause concurrency problems. Ports are now chosen at random, which makes collisions much less likely.
- Improved handling of the vagrant status. Previously, if the command failed and returned None as its message, the result could not be interpreted, and the script ended with an error because it could not split the message.
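
As an illustration of the random port assignment mentioned in the list above, here is a minimal sketch. It is not the branch's implementation: `pick_random_port`, `port_is_free`, the port range, and the attempt limit are all hypothetical, and the availability check runs against the local machine, whereas the real module may check the remote host.

```python
import random
import socket
from typing import Callable, Optional


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if `port` can be bound on `host` (local check only)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def pick_random_port(
    low: int = 20000,            # assumed range, not taken from the branch
    high: int = 30000,
    attempts: int = 50,
    is_free: Callable[[int], bool] = port_is_free,
    rng: Optional[random.Random] = None,
) -> int:
    """Try random candidates in [low, high] until one is free."""
    rng = rng or random.Random()
    for _ in range(attempts):
        candidate = rng.randint(low, high)
        if is_free(candidate):
            return candidate
    raise RuntimeError(f"No free port found in {low}-{high} after {attempts} attempts")
```

Two concurrent runs that scan sequentially tend to test the same lowest free port at the same time. Random candidates make that overlap unlikely, although a check followed by a later bind can still race, so this reduces collisions rather than eliminating them.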