SSLproxy exits with error after some time, fd leak?
I see SSLproxy (bd43af7a1017fb9ba509e339003c39106cd14004) run well for a while, then exit with an error. The messages suggest it runs out of file descriptors, which points to them being leaked somewhere.
sudo ./src/sslproxy -D -l connections.log -j /tmp/sslsplit/ -S logdir/ -k ../sslsplit/ca.key -c ../sslsplit/ca.crt -Y pcaps/ -oVerifyPeer=no autossl 192.168.0.1 8465 up:8080 autossl 192.168.0.1 8143 up:8080 autossl 192.168.0.1 8110 up:8080
[...]
Error 24 on listener: Too many open files
Error 24 on listener: Too many open files
Error 24 on listener: Too many open files
Error 24 on listener: Too many open files
Error 24 on listener: Too many open files
Error 24 on listener: Too many open files
Error 24 on listener: Too many open files
Error 24 on listener: Too many open files
Error 24 on listener: Too many open files
Error 24 on listener: Too many open files
Error 24 on listener: Too many open files
Error 24 on listener: Too many open files
Error 24 on listener: Too many open files
Error 24 on listener: Too many open files
Error 24 on listener: Too many open files
Error 24 on listener: Too many open files
Error 24 on listener: Too many open files
Error 24 on listener: Too many open files
Error 24 on listener: Too many open files
Error 24 on listener: Too many open files
SSL_free() in state 00000001 = 0001 = SSLOK (SSL negotiation finished successfully) [accept socket]
SSL_free() in state 00000001 = 0001 = SSLOK (SSL negotiation finished successfully) [connect socket]
Garbage collecting caches started.
Garbage collecting caches done.
Connecting to [142.250.102.109]:995
CONN: autossl 192.168.0.5 36392 142.250.102.109 995 user:-
AUTOSSL connected to [142.250.102.109]:995
AUTOSSL connected from [192.168.0.5]:36392
Received privsep req type 01 sz 94 on srvsock 14
Checking for a client hello
Peek found ClientHello
Received privsep req type 01 sz 94 on srvsock 16
Attempt reuse dst SSL session
SSL_free() in state 00000001 = 0001 = SSLOK (SSL negotiation finished successfully) [connect socket]
Garbage collecting caches started.
Garbage collecting caches done.
Connecting to [142.250.102.108]:995
CONN: autossl 192.168.0.5 42568 142.250.102.108 995 user:-
AUTOSSL connected to [142.250.102.108]:995
AUTOSSL connected from [192.168.0.5]:42568
Received privsep req type 01 sz 94 on srvsock 14
Checking for a client hello
Peek found ClientHello
Received privsep req type 01 sz 94 on srvsock 16
Attempt reuse dst SSL session
Opening pcapdir file '/home/victor/build/SSLproxy/pcaps/20251004T152451Z-192.168.0.5,42568-142.250.102.108,995.pcap' failed: Success (0)
Error 0 on logger: Success
Main event loop stopped (reason=-1).
Error building tcp header: (null)Error building packet
Failed to write to pcap log: Success (0)
Error 0 on logger: Success
Received privsep req type 00 sz 1 on srvsock 8
Received privsep req type 00 sz 1 on srvsock 14
Received privsep req type 00 sz 1 on srvsock 12
Received privsep req type 00 sz 1 on srvsock 16
Child pid 75442 exited with status 1
The error message strongly suggests resource exhaustion (running out of available FDs) rather than a simple crash. We try to mitigate this specific type of crash during connection acceptance by comparing the descriptor table count against the available limit, as detailed in this TODO task. Note the difficulties I specifically mention there regarding the logging subsystem.
I suspect the crash you saw is possible with the -Y option, especially if SSLproxy was already very close to the system's open files limit (1024 by default). I've frankly never run the -Y option under actual production load. We clearly need to implement the same resource check mechanism in the logging subsystem that the proxy core uses.
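For anyone following along, here is a minimal sketch of the kind of check meant above: compare the number of descriptors currently in use against the process limit and refuse to open more when too close to it. This is my own illustration, not SSLproxy's actual code; the fd-counting method and the 90% margin are assumptions.

```c
/* Minimal sketch (illustration only, not SSLproxy's actual code) of checking
 * fd usage against the process limit before opening more descriptors.
 * Thresholds and the fd-counting method are assumptions. */
#include <dirent.h>
#include <stdio.h>
#include <sys/resource.h>

/* Count open fds by walking /proc/self/fd (Linux-specific). */
static long count_open_fds(void)
{
	long n = 0;
	DIR *d = opendir("/proc/self/fd");
	if (!d)
		return -1;
	while (readdir(d))
		n++;
	closedir(d);
	return n - 2 /* "." and ".." */ - 1 /* the dirfd itself */;
}

/* Return 1 if it is safe to open more fds, 0 if we should refuse. */
static int fd_usage_ok(void)
{
	struct rlimit rl;
	if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
		return 1; /* cannot tell, do not block */
	long used = count_open_fds();
	if (used < 0)
		return 1;
	/* Keep a safety margin, e.g. refuse above 90% of the soft limit. */
	return (rlim_t)used < rl.rlim_cur - rl.rlim_cur / 10;
}

int main(void)
{
	printf("fd usage ok: %d\n", fd_usage_ok());
	return 0;
}
```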
While we've used static analysis tools and Valgrind, I wouldn't entirely rule out the possibility of a genuine fd leak as you suspect, particularly in the logging subsystem.
To help narrow this down, could you provide some context on the crash?
- Load: Was SSLproxy under heavy load (too many connections relative to the fd limit) when this happened?
- Limit: Do you know if SSLproxy was already very close to the fd limit?
- Reproducibility: Does this only happen when the -Y option is enabled?
For better diagnostics, I'd appreciate it if you could:
- Enable the `DEBUG_PROXY` switch in `Mk/main.mk`, recompile, and start SSLproxy with the -D4 option. This verbose logging should include current fd usage too.
- Provide a coredump, if available (but sslproxy simply exits here, doesn't it?).
In the meantime, I'll review the logging subsystem source code yet again to see if we can prevent these fd-related crashes, similar to the proxy core logic. Until we find a definitive fix, please increase the fd limit on your system using the OpenFilesLimit option.
Thanks for reporting.
@victorjulien I have just pushed a change on the check_fd_usage_before_content_logging branch, which is expected to prevent the issue you have reported. I hope this helps you continue with your work. Can you test and report please?
Btw, this is yet another issue with using the PCAP options in SSLproxy to feed decrypted traffic to an IDS (page 5 of my presentation), which I should have mentioned in the webinar.
Haven't seen the issue in the last 2 days while running the check_fd_usage_before_content_logging branch.
My load is very low. Just a single pop3 poll job every minute or so. I haven't tried your other suggestions yet. Can look into those if that is useful.
Great, if that change helps you continue your work, no need to debug any further. But it's strange that you had this issue with very low load. Btw, I ran the e2e tests with the -Y option enabled and under valgrind; it does not report any fd leak. Still strange though.
I just saw the issue again, while running the check_fd_usage_before_content_logging branch. Will look at your other suggestions when I find some time.
Another possibility for this out-of-fds issue is persistent connections, but I don't think your pop3 connections are persistent (or are they, for some reason?). In any case, to prevent such resource exhaustion, SSLproxy terminates persistent connections that are idle for 120 seconds by default. But if they break their silence even once during that 120-second period, they do not expire; they remain in the connection list and continue consuming fds, which may lead to the out-of-fds issue after a few days. See the sketch below.
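To make the expiry behaviour concrete, here is a minimal sketch (an assumption for illustration, not SSLproxy's actual code) of walking a connection list and terminating entries that have been idle longer than the 120-second timeout; any activity that refreshes `last_active` keeps the connection alive and its fds in use.

```c
/* Minimal sketch (assumption for illustration, not SSLproxy's actual code)
 * of idle-connection expiry: connections silent for longer than the timeout
 * are closed; any activity resets last_active and keeps them alive. */
#include <stdlib.h>
#include <time.h>

#define CONN_IDLE_TIMEOUT 120 /* seconds, assumed default */

struct conn {
	time_t last_active;      /* updated on every read/write event */
	struct conn *next;
};

/* Hypothetical close helper; a real proxy would also close the fds here. */
static void conn_close(struct conn *c) { free(c); }

/* Remove and close connections idle longer than the timeout. */
static void expire_idle_conns(struct conn **head, time_t now)
{
	struct conn **p = head;
	while (*p) {
		if (now - (*p)->last_active > CONN_IDLE_TIMEOUT) {
			struct conn *dead = *p;
			*p = dead->next;
			conn_close(dead);
		} else {
			p = &(*p)->next;
		}
	}
}

int main(void)
{
	struct conn *head = NULL;
	expire_idle_conns(&head, time(NULL));
	return 0;
}
```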
Yet another possibility I have seen with new users is related to how they divert packets to sslproxy, which causes infinite redirection loops. I don't remember the exact details, but it usually happens almost immediately after sslproxy starts. So I don't think this is the case with you.
Have you increased the OpenFilesLimit value in SSLproxy, or is it still 1024? Do you still use the -Y option?
Also, is this the same issue where sslproxy exits, or does sslproxy report that it's out of fds but continue running? The latter is the intended outcome of the change in the check_fd_usage_before_content_logging branch: it is supposed to mitigate the out-of-fds issue, not prevent it.
Yes, please do check the verbose logs with the -D4 option. You are welcome to send them to me, if possible (I'd like to solve this issue, if it's due to sslproxy).
It's effectively the same output as above, with exactly the same config/commandline. Just running the check_fd_usage_before_content_logging branch. Will try the other suggestions next.
Saw the same thing again w/o the -Y pcaps option. Trying w/o -S next.
I haven't seen this anymore after dropping the -S option.
I think I have found the root cause of the fd leak. And I think it's not the logging subsystem (unless there is another fd leak there).
If you run the following command (but make sure you use the pid of the privsep client running as the regular user victor):
sudo lsof -p <PID of sslproxy privsep client>
You will see at least a couple of stale connections like this:
sslproxy 771707 nobody 326u IPv4 22742197 0t0 TCP localhost:51082->localhost:http-alt (ESTABLISHED)
These are orphaned connections from sslproxy to the listening program (lp), and the source of the fd leak.
Now, if you terminate the lp program, you should see that sslproxy crashes too, with signal 11 or an assertion failure.
Can you please confirm the above for me?
Indeed. There are 157 such entries currently.
Also correct on the crash. If I close lp, sslproxy crashes. Since I have ASan enabled, it shows the same ASan error I shared privately before (sadly, my attempt at getting symbols failed).
@victorjulien, can you test the last change I have just pushed please?
I am positive that it fixes the fd leak, because I was able to reliably reproduce the issue here as well.
Just pulled in your latest branch and did a quick test with killing lp. I can confirm that it doesn't crash sslproxy. I will let this run for a while now and let you know how it goes.
I closed my last client ~8 hours ago, but I'm still seeing:
Every 60.0s: lsof -p 24598|grep localhost c2758: Thu Oct 30 07:10:00 2025
sslproxy 24598 nobody 48u IPv4 988306 0t0 TCP localhost:45382->localhost:http-alt (CLOSE_WAIT)
sslproxy 24598 nobody 49u IPv4 989229 0t0 TCP localhost:39324->localhost:http-alt (ESTABLISHED)
sslproxy 24598 nobody 50u IPv4 987520 0t0 TCP localhost:35170->localhost:http-alt (ESTABLISHED)
sslproxy 24598 nobody 51u IPv4 986965 0t0 TCP localhost:39356->localhost:http-alt (ESTABLISHED)
sslproxy 24598 nobody 52u IPv4 990253 0t0 TCP localhost:57308->localhost:http-alt (ESTABLISHED)
sslproxy 24598 nobody 53u IPv4 991335 0t0 TCP localhost:42868->localhost:http-alt (ESTABLISHED)
sslproxy 24598 nobody 54u IPv4 992385 0t0 TCP localhost:58932->localhost:http-alt (ESTABLISHED)
sslproxy 24598 nobody 55u IPv4 990765 0t0 TCP localhost:40636->localhost:http-alt (ESTABLISHED)
sslproxy 24598 nobody 56u IPv4 993745 0t0 TCP localhost:48242->localhost:http-alt (ESTABLISHED)
sslproxy 24598 nobody 57u IPv4 990984 0t0 TCP localhost:48258->localhost:http-alt (ESTABLISHED)
sslproxy 24598 nobody 58u IPv4 991228 0t0 TCP localhost:46272->localhost:http-alt (ESTABLISHED)
sslproxy 24598 nobody 59u IPv4 993107 0t0 TCP localhost:38912->localhost:http-alt (ESTABLISHED)
sslproxy 24598 nobody 60u IPv4 997435 0t0 TCP localhost:38448->localhost:http-alt (ESTABLISHED)
sslproxy 24598 nobody 61u IPv4 994954 0t0 TCP localhost:34914->localhost:http-alt (ESTABLISHED)
sslproxy 24598 nobody 62u IPv4 997900 0t0 TCP localhost:42788->localhost:http-alt (ESTABLISHED)
sslproxy 24598 nobody 63u IPv4 999749 0t0 TCP localhost:55892->localhost:http-alt (ESTABLISHED)
sslproxy 24598 nobody 64u IPv4 999981 0t0 TCP localhost:44068->localhost:http-alt (ESTABLISHED)
sslproxy 24598 nobody 65u IPv4 1000103 0t0 TCP localhost:47062->localhost:http-alt (ESTABLISHED)
sslproxy 24598 nobody 66u IPv4 1002031 0t0 TCP localhost:54732->localhost:http-alt (ESTABLISHED)
If I understand correctly these are connections between sslproxy and lp. I had expected to see 0 here within minutes after closing my email client.
This was not supposed to happen. So:
- The output you have posted shows that sslproxy runs as `nobody`, but your commandline in your first post does not configure for that. Aren't you supposed to have `victor` as the sslproxy user, or have you switched to `nobody` as the sslproxy user?
- You did not say if closing lp crashes sslproxy too, and if that really doesn't happen anymore, then those orphaned connections do not belong to the current instance of sslproxy. Is it possible that you have an old instance of sslproxy running?
- Are you sure you are using my last commit? (Sanity check, sorry.)
Note that closing lp should definitely crash sslproxy when there are socket-based fd leaks, because lp sends final RST/FIN packets to sslproxy, which triggers eof events for the orphaned bufferevents of those sockets, but the memory of the connections that created those bufferevents has already been freed, which causes all sorts of memory access violations.
Since I am pretty sure my last commit fixes that, I would say there is perhaps another similar issue. But I am confused due to my second question above.
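For readers who want to see that failure mode in code, here is a minimal libevent sketch (an illustration under my own assumptions, not SSLproxy's actual code) of why an orphaned socket bufferevent both leaks its fd and crashes the process when lp closes: the event callback fires with a context that has already been freed. The fix pattern is to free the bufferevent together with the connection state.

```c
/* Minimal libevent sketch (illustration only, not SSLproxy code) of the
 * failure mode described above: if a connection struct is freed while its
 * bufferevent to lp is left alive, lp's later FIN/RST triggers an event
 * callback whose ctx now points at freed memory (use-after-free), and the
 * socket fd stays open (leaked). The fix is to tear down the bufferevent
 * together with the connection state. */
#include <stdlib.h>
#include <event2/bufferevent.h>
#include <event2/event.h>

struct conn_ctx {
	struct bufferevent *lp_bev;  /* socket bufferevent to the listening program */
	/* ... other per-connection state ... */
};

static void
lp_eventcb(struct bufferevent *bev, short events, void *arg)
{
	struct conn_ctx *ctx = arg;
	if (events & (BEV_EVENT_EOF | BEV_EVENT_ERROR)) {
		/* If ctx was already freed elsewhere, this is where we crash. */
		bufferevent_free(bev);
		ctx->lp_bev = NULL;
	}
}

/* Correct teardown: free the bufferevent (closing its fd with
 * BEV_OPT_CLOSE_ON_FREE) at the same time as the connection, so no
 * callback can fire on freed memory and the fd to lp is not leaked. */
static void
conn_free(struct conn_ctx *ctx)
{
	if (ctx->lp_bev) {
		bufferevent_free(ctx->lp_bev);
		ctx->lp_bev = NULL;
	}
	free(ctx);
}

int main(void)
{
	struct event_base *base = event_base_new();
	struct conn_ctx *ctx = calloc(1, sizeof(*ctx));
	ctx->lp_bev = bufferevent_socket_new(base, -1, BEV_OPT_CLOSE_ON_FREE);
	bufferevent_setcb(ctx->lp_bev, NULL, NULL, lp_eventcb, ctx);
	conn_free(ctx);              /* frees bufferevent and ctx together */
	event_base_free(base);
	return 0;
}
```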
Apologies, I realize now that I did pull your code but forgot to build it ;-(
Quick update: SSLproxy+lp+my suricata branch have been running great in the last week with check_fd_usage_before_content_logging (SSLproxy v0.9.8-18-gc43ce62 (built 2025-10-30)). No crashes or other issues.
Excuse me, could you share more information about how you run Suricata and sslproxy together?
I know the question is directed to @victorjulien, but here are a couple of links to relevant info, for anyone interested:
- The pull request @victorjulien is working on
- The Pre-Suricon 2025 Webinar on SSLproxy and Snort/Suricata integration
- The slides in the webinar
Note that Suricon 2025 is ongoing at the moment.