SSH server dies when creating lots of connections
Hello,
We've been using WarpGate happily, but we're facing an issue:
- We use Ansible
- We have an inventory with 14 hosts
- When running a (fairly long) playbook, at some point the SSH server dies. The web server still works, but the SSH server stops listening on its port (running ss -tulpn doesn't show it)
- It keeps the already-open connections alive, but can't accept new ones
- After a restart of the server, it works again
It seems to be related to the number of connections, or to connections being opened and closed rapidly.
How can we further debug this?
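A minimal way to watch for this while the playbook runs, in case it helps anyone reproducing it (a sketch; port 2222 and the process name warpgate are assumptions about the setup):

# Poll the SSH listener and the number of established connections every
# 2 seconds, to catch the moment the listener disappears.
watch -n 2 'ss -tlnp | grep -w 2222; echo established:; ss -tnp | grep -c warpgate'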
Thanks!!
Néfix Estrada, IsardVDI
======
Looking further, there are lots of these errors:
ERROR warpgate_core::logging::database: Failed to store log entry error=Exec(SqlxError(Database(SqliteError { code: 14, message: "unable to open database file" })))
What OS are you running (e.g., RHEL with SELinux enabled)?
ERROR warpgate_core::logging::database: Failed to store log entry error=Exec(SqlxError(Database(SqliteError { code: 14, message: "unable to open database file" })))
Are you seeing this error after you restart the warpgate server?
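Also, SQLite error 14 is SQLITE_CANTOPEN ("unable to open database file"); when it only shows up under heavy connection churn, one common cause is the process running out of file descriptors rather than a problem with the database file itself. It may be worth comparing the open-descriptor count against the limit while the failure is happening (a sketch; the process name warpgate is an assumption about how your service runs):

# Compare open file descriptors against the per-process limit while the issue occurs.
# "pidof warpgate" assumes the binary shows up under that name.
PID=$(pidof warpgate)
ls /proc/$PID/fd | wc -l
grep 'Max open files' /proc/$PID/limits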
@codyro We're running Ubuntu 22.04
This error appears only after a while, when the server is under load. Restarting the server fixes the issue temporarily, but it comes back when there's load again.
I was able to test this (albeit on RHEL 9) but couldn't replicate it. I did run into another issue that may be related to something else; I need to debug it a bit more and will open a separate issue if necessary.
Here is what I did:
Steps to reproduce
- Spin up 20 VMs
- Set up Ansible with some debug tasks/roles and an inventory for the test instances
- Run ansible-playbook with various fork counts (1, 5, 10, 20) using the default strategy [1] (see the command sketch after this list)
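The runs looked roughly like this (a sketch; inventory.ini and test-warpgate.yml are placeholder names):

# Repeat the playbook with increasing fork counts to vary how many
# connections Ansible opens to warpgate at once.
for forks in 1 5 10 20; do
    ansible-playbook -i inventory.ini test-warpgate.yml --forks "$forks"
done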
Playbook(s) used to test
I ran a small playbook to see if everything ran cleanly and quickly; that worked without issues. Since you noted that your playbooks are relatively long, I added another role. This was when I ran into various issues; however, none of them were what you experienced. Warpgate continued to listen normally and accepted new connections without a problem.
---
- name: Test warpgate
  hosts: all
  gather_facts: true
  vars:
    ansible_user: "codyr:{{ inventory_hostname }}"
    ansible_host: warpgate.host.com
    ansible_port: 2222
  tasks:
    - name: Print hostname
      ansible.builtin.debug:
        var: ansible_hostname
    - name: Add sshkey to root user
      ansible.posix.authorized_key:
        user: root
        state: present
        key: 'ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGM0TQe1trzJ4VEsKRhURyJJ7wsr/9UAY2JJdUhaPZfA cody@test'
  ## Make the playbook run longer
  # roles:
  #   - role: geerlingguy.mysql
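For context, the vars above route every Ansible connection through Warpgate's SSH port, so each one is roughly equivalent to this manual command (host1 stands in for an inventory hostname):

# Manual equivalent of a single Ansible connection through Warpgate;
# the username selects the target behind the bastion.
ssh -p 2222 codyr:host1@warpgate.host.com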
What versions of Ansible and Warpgate are you running? I was also using the SQLite backend and did not see the errors you were receiving.
codyr@wp ~ [2]> podman exec -it systemd-warpgate warpgate --version
warpgate 0.11.0
(hawkansible-venv) codyr@Portia ~/D/g/t/s/w/ansible-warpgate (warpgate-testing)> ansible --version
ansible [core 2.17.5]