warpgate SSH server dies when creating lots of connections

Hello,

We've been using WarpGate happily, but we're facing an issue:

We use Ansible
We have an inventory with 14 hosts
When running a (fairly long) playbook, the SSH server dies at some point. The web server still works, but the SSH server stops listening on its port (running ss -tulpn doesn't show it)
It keeps the already open connections alive, but cannot accept new ones
After a restart of the server, it works again
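
To pin down when the listener disappears during a playbook run, a small TCP probe can be polled alongside it. This is only a sketch: the function name is made up for illustration, and the host and port in the usage comment are placeholders (2222 is assumed to be the SSH listen port).

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (refused, timeout, unreachable) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example usage (placeholder address, adjust to your setup):
# print(port_is_open("127.0.0.1", 2222))
```

Logging the result with a timestamp every few seconds should show roughly which task was running when the port stopped accepting connections.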

It seems to be related to the number of connections, or to connections being opened and closed at a high rate.

How can we further debug this?

Thanks!!

Néfix Estrada IsardVDI

======

Looking further, the logs contain many occurrences of this error:

ERROR warpgate_core::logging::database: Failed to store log entry error=Exec(SqlxError(Database(SqliteError { code: 14, message: "unable to open database file" })))

Nov 12 '24 13:11 NefixEstrada

What OS are you running (e.g., RHEL with SELinux enabled)?

ERROR warpgate_core::logging::database: Failed to store log entry error=Exec(SqlxError(Database(SqliteError { code: 14, message: "unable to open database file" })))

Are you seeing this error after you restart the warpgate server?

Nov 12 '24 17:11 codyro

@codyro We're running Ubuntu 22.04

This error appears only after a while, when the server is under load. Restarting the server fixes the issue temporarily, but it comes back as soon as there's load again.
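
SQLite result code 14 is SQLITE_CANTOPEN. Since both that error and the dead listener show up only under load, one possibility worth ruling out is that the warpgate process is hitting its open file descriptor limit. That is an assumption, not a confirmed diagnosis. A minimal Python sketch for comparing a process's open descriptors against its soft limit on Linux follows. The helper names are made up for illustration, and the PID has to be looked up separately.

```python
import os
import re
from typing import Optional, Tuple


def parse_max_open_files(limits_text: str) -> Optional[int]:
    """Extract the soft 'Max open files' limit from /proc/<pid>/limits content.

    Returns None if the line is missing or the limit is 'unlimited'.
    """
    match = re.search(r"^Max open files\s+(\S+)", limits_text, re.MULTILINE)
    if match is None or match.group(1) == "unlimited":
        return None
    return int(match.group(1))


def fd_usage(pid: int) -> Tuple[int, Optional[int]]:
    """Return (number of open descriptors, soft limit) for a Linux process.

    Reading another user's /proc/<pid>/fd usually requires root.
    """
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as fh:
        return open_fds, parse_max_open_files(fh.read())


# Example usage (the PID is a placeholder):
# used, limit = fd_usage(12345)
# print(f"{used} open descriptors, soft limit {limit}")
```

If the count climbs toward the limit while the playbook runs, that would point at descriptor exhaustion. If it stays flat, the cause is somewhere else.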

Nov 12 '24 17:11 NefixEstrada

I was able to test this (albeit on RHEL 9) but couldn't replicate it. I did run into another issue that may be related to something else; I need to debug it a bit more and will submit an issue if necessary.

Here is what I did:

Steps to reproduce

Spin up 20 VMs
Set up Ansible with some debug tasks/roles and an inventory for the test instances
Run ansible-playbook with various fork counts (1, 5, 10, 20) and the default strategy [1]
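
For anyone wanting to repeat the last step, the fork sweep could be scripted roughly like this. It is a sketch, not the exact commands used: the inventory and playbook file names are placeholders, and the helper names are made up for illustration. The only ansible-playbook options it relies on are -i and --forks.

```python
import subprocess
from typing import Dict, List, Sequence


def build_command(forks: int,
                  inventory: str = "inventory.ini",
                  playbook: str = "test-warpgate.yml") -> List[str]:
    """Build the ansible-playbook command line for a given fork count."""
    return ["ansible-playbook", "-i", inventory, "--forks", str(forks), playbook]


def run_sweep(fork_counts: Sequence[int] = (1, 5, 10, 20)) -> Dict[int, int]:
    """Run the playbook once per fork count and collect the exit codes."""
    results = {}
    for forks in fork_counts:
        # check=False: keep going even if one run fails, so all results are recorded
        completed = subprocess.run(build_command(forks), check=False)
        results[forks] = completed.returncode
    return results


# Example usage:
# print(run_sweep())
```

A non-zero exit code at the higher fork counts only would suggest the failure depends on connection concurrency.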

Playbook(s) used to test

I ran a small playbook to see if everything ran cleanly and quickly. That worked without issues. Since you noted that your playbooks are relatively long, I added another role. This was when I ran into various issues; however, none of them were what you experienced. warpgate continued to listen normally and accepted new connections without a problem.

---
- name: Test warpgate
  hosts: all
  gather_facts: true
  vars:
    ansible_user: "codyr:{{ inventory_hostname }}"
    ansible_host: warpgate.host.com
    ansible_port: 2222
  tasks:
    - name: Print hostname
      ansible.builtin.debug:
        var: ansible_hostname

    - name: Add sshkey to root user
      ansible.posix.authorized_key:
        user: root
        state: present
        key: 'ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGM0TQe1trzJ4VEsKRhURyJJ7wsr/9UAY2JJdUhaPZfA  cody@test'
  ## Make the playbook run longer
  # roles:
  #   - role: geerlingguy.mysql

What versions of Ansible and Warpgate are you running? I was also using the SQLite backend and did not see the errors you were receiving.

codyr@wp ~ [2]> podman exec -it systemd-warpgate warpgate --version
warpgate 0.11.0

(hawkansible-venv) codyr@Portia ~/D/g/t/s/w/ansible-warpgate (warpgate-testing)> ansible --version
ansible [core 2.17.5]

Nov 20 '24 15:11 codyro