cups
cups copied to clipboard
CUPS sometimes hangs for 10 seconds when a client connects to it via TLS
After updating from Fedora 40 to Fedora 41 I noticed the Gnome printer selection window would often hang (the "force quit or wait" dialog is shown) when selecting a printer. This is obviously not a CUPS issue but a Gnome bug since apparently they didn't take in to account querying a printer is a blocking operation and should be handled off the main UI-thread. Other UI toolkits are also affected by this. For example the print dialog in LibreOffice also hangs sometimes.
However this made me investigate why this 10 second hang was sometimes happening when selecting a printer in the print dialog window. It turns out the CUPS server itself (which runs remotely, there is no local CUPS service) sometimes seems to hang for 10 seconds when a client tries to connect with it over TLS. The CUPS server runs on FreeBSD 14.2 and is CUPS 2.4.11. I actually updated to these latest versions. Before that it was running FreeBSD 13.4 with CUPS 2.4.10 and I was experiencing the same issue with Fedora 41 clients.
This is what /etc/cups/client.conf on a Fedora 41 client looks like:
ServerName print.nl2.serviceplanet.nl
AllowAnyRoot No
AllowExpiredCerts No
TrustOnFirstUse No
UserAgentTokens ProductOnly
ValidateCerts Yes
# 'Always' means the CUPS client will immediately do HTTPS instead of trying to upgrade an HTTP
# connection (i.e. STARTTLS). This is required when using "SSLListen :631" on the server (which we do).
Encryption Always
SSLOptions MinTLS1.2
This is what running truss on the cupsd process on the server side looks like when the process hangs:
access("/usr/local/etc/cups/ssl/office-support-1.crt",R_OK) = 0 (0x0)
access("/usr/local/etc/cups/ssl/office-support-1.key",R_OK) = 0 (0x0)
access("/usr/local/etc/cups/ssl/office-support-1.crt",R_OK) = 0 (0x0)
access("/usr/local/etc/cups/ssl/office-support-1.key",R_OK) = 0 (0x0)
open("/usr/local/etc/cups/ssl/office-support-1.key",O_RDONLY|O_CLOEXEC,0666) = 25 (0x19)
fstat(25,{ mode=-rw-r----- ,inode=1570,size=1704,blksize=32768 }) = 0 (0x0)
lseek(25,0x0,SEEK_CUR) = 0 (0x0)
read(25,"-----BEGIN PRIVATE KEY-----\nMII"...,1705) = 1704 (0x6a8)
read(25,0xc37766ad4a8,1) = 0 (0x0)
close(25) = 0 (0x0)
open("/usr/local/etc/cups/ssl/office-support-1.crt",O_RDONLY|O_CLOEXEC,0666) = 25 (0x19)
fstat(25,{ mode=-rw-r--r-- ,inode=1571,size=4174,blksize=32768 }) = 0 (0x0)
lseek(25,0x0,SEEK_CUR) = 0 (0x0)
fstat(25,{ mode=-rw-r--r-- ,inode=1571,size=4174,blksize=32768 }) = 0 (0x0)
read(25,"subject=C = NL, ST = Zuid-Hollan"...,32768) = 4174 (0x104e)
read(25,0xc377b038f80,32768) = 0 (0x0)
close(25) = 0 (0x0)
madvise(0xc3779a97000,4096,MADV_FREE) = 0 (0x0)
getpid() = 41054 (0xa05e)
setsockopt(23,SOL_SOCKET,SO_RCVTIMEO,0xc376c9bffc0,16) = 0 (0x0)
setsockopt(23,SOL_SOCKET,SO_SNDTIMEO,0xc376c9bffc0,16) = 0 (0x0)
----------------------------8<--------------------------------------------------------------------------------------------------------
CUPS freezes for 10 seconds
----------------------------8<--------------------------------------------------------------------------------------------------------
poll({ 23/POLLIN },1,10000) = 0 (0x0)
__sysctl("kern.hostname",2,0xc376c9c0cc0,0xc376c9bcca0,0x0,0) = 0 (0x0)
getpid() = 41054 (0xa05e)
sendto(4,"<51>1 2024-12-27T14:26:55.617251"...,164,0,NULL,0) = 164 (0xa4)
kevent(3,{ 23,EVFILT_READ,EV_DELETE,0,0,0xc37767481a0 },1,0x0,0,{ 0.000000000 }) = 0 (0x0)
close(23) = 0 (0x0)
kevent(3,0x0,0,{ 5,EVFILT_READ,0x0,0,0x1,0xc3776748d40 12,EVFILT_READ,0x0,0,0x1f2,0xc37766beb20 13,EVFILT_READ,0x0,0,0x1f3,0xc3776748600 14,EVFILT_READ,0x0,0,0x1f7,0xc37766bea80 },32768,{ 1.000000000 }) = 4 (0x4)
accept(5,{ AF_INET 172.16.1.58:51284 },0xc376c9c1444) = 23 (0x17)
setsockopt(23,SOL_SOCKET,SO_NOSIGPIPE,0xc376c9c1440,4) = 0 (0x0)
setsockopt(23,IPPROTO_TCP,TCP_NODELAY,0xc376c9c1440,4) = 0 (0x0)
fcntl(23,F_SETFD,FD_CLOEXEC) = 0 (0x0)
getsockname(23,{ AF_INET 10.201.11.11:631 },0xc376c9c159c) = 0 (0x0)
getsockname(23,{ AF_INET 10.201.11.11:631 },0xc376c9c159c) = 0 (0x0)
kevent(3,{ 23,EVFILT_READ,EV_ADD,0,0,0xc37767481a0 },1,0x0,0,{ 0.000000000 }) = 0 (0x0)
getsockname(23,{ AF_INET 10.201.11.11:631 },0xc376c9c0030) = 0 (0x0)
fstatat(AT_FDCWD,"/etc/resolv.conf",{ mode=-rw-r--r-- ,inode=110,size=51,blksize=32768 },0x0) = 0 (0x0)
open("/etc/hosts",O_RDONLY|O_CLOEXEC,0666) = 25 (0x19)
fstat(25,{ mode=-rw-r--r-- ,inode=156,size=1188,blksize=32768 }) = 0 (0x0)
read(25,"# $FreeBSD: src/etc/hosts,v 1.16"...,32768) = 1188 (0x4a4)
close(25) = 0 (0x0)
access("/usr/local/etc/cups/ssl/office-support-1.crt",R_OK) = 0 (0x0)
access("/usr/local/etc/cups/ssl/office-support-1.key",R_OK) = 0 (0x0)
access("/usr/local/etc/cups/ssl/office-support-1.crt",R_OK) = 0 (0x0)
access("/usr/local/etc/cups/ssl/office-support-1.key",R_OK) = 0 (0x0)
open("/usr/local/etc/cups/ssl/office-support-1.key",O_RDONLY|O_CLOEXEC,0666) = 25 (0x19)
fstat(25,{ mode=-rw-r----- ,inode=1570,size=1704,blksize=32768 }) = 0 (0x0)
lseek(25,0x0,SEEK_CUR) = 0 (0x0)
read(25,"-----BEGIN PRIVATE KEY-----\nMII"...,1705) = 1704 (0x6a8)
read(25,0xc37766ad4a8,1) = 0 (0x0)
close(25) = 0 (0x0)
open("/usr/local/etc/cups/ssl/office-support-1.crt",O_RDONLY|O_CLOEXEC,0666) = 25 (0x19)
fstat(25,{ mode=-rw-r--r-- ,inode=1571,size=4174,blksize=32768 }) = 0 (0x0)
lseek(25,0x0,SEEK_CUR) = 0 (0x0)
fstat(25,{ mode=-rw-r--r-- ,inode=1571,size=4174,blksize=32768 }) = 0 (0x0)
read(25,"subject=C = NL, ST = Zuid-Hollan"...,32768) = 4174 (0x104e)
read(25,0xc377b038f80,32768) = 0 (0x0)
The freeze of the cupsd process is also reflected in the debug logs:
D [27/Dec/2024:14:40:57 +0100] [Client 207] Closing connection.
D [27/Dec/2024:14:40:57 +0100] cupsdSetBusyState: newbusy="Not busy", busy="Not busy"
D [27/Dec/2024:14:40:57 +0100] cupsdSetBusyState: newbusy="Active clients", busy="Not busy"
D [27/Dec/2024:14:40:57 +0100] [Client 208] Server address is "10.201.11.11".
D [27/Dec/2024:14:40:57 +0100] [Client 208] Accepted from 172.16.1.58:45648 (IPv4)
D [27/Dec/2024:14:40:57 +0100] [Client 208] Waiting for request.
----------------------------8<--------------------------------------------------------------------------------------------------------
CUPS freezes for 10 seconds
----------------------------8<--------------------------------------------------------------------------------------------------------
E [27/Dec/2024:14:41:07 +0100] [Client 208] Unable to encrypt connection: Error in the pull function.
D [27/Dec/2024:14:41:07 +0100] [Client 208] Closing connection.
This is what Wireshark looks like on the client where the 10 second wait can be seen after initiating TLS:
At first I thought this seemed like a DNS issue. But from what I can tell all the DNS resolving seems to be fine. I'm not really sure what is happening here. By default FreeBSD uses GnuTLS with CUPS. I think it should also be possible to build CUPS 1.4 with OpenSSL nowadays?
This is what cupsd.conf on the server looks like (which is not terribly exiting):
#
# Configuration file for the CUPS scheduler. See "man cupsd.conf" for a
# complete description of this file.
#
# Log general information in error_log - change "warn" to "debug"
# for troubleshooting...
LogLevel warn
PageLogFormat
# Specifies the maximum size of the log files before they are rotated. The value "0" disables log rotation.
MaxLogSize 1m
# Retry the job after waiting for N seconds; the JobRetryInterval directive controls the value of N.
ErrorPolicy retry-job
JobRetryInterval 30
SSLListen 0.0.0.0:631
SSLOptions MinTLS1.2
Listen /var/run/cups/cups.sock
ServerName print.nl2.serviceplanet.nl
ServerTokens ProductOnly
ServerAdmin <REDACTED>
# Show shared printers on the local network.
Browsing On
# Beware that setting this to 'none' causes the web ui to list printers as "not shared". What that means is that
# they aren't shared via protocols such as Bonjour. They are still "shared" via IPP however.
BrowseLocalProtocols none
# Do not lookup DNS names of connecting clients.
HostNameLookups Off
# We don't want to use persistent storage because ideally even with all drives failed the print
# server should still work. Job history might also contain sensitive document names.
PreserveJobFiles No
AutoPurgeJobs Yes
# Default authentication type, when authentication is required...
DefaultAuthType Basic
# Web interface setting...
WebInterface Yes
# Restrict access to the server...
<Location />
Order allow,deny
Allow all
</Location>
# Restrict access to the admin pages...
<Location /admin>
Order allow,deny
Allow all
</Location>
# Restrict access to configuration files...
<Location /admin/conf>
AuthType Default
Require user @SYSTEM
Order allow,deny
Allow all
</Location>
# Restrict access to log files...
<Location /admin/log>
AuthType Default
Require user @SYSTEM
Order allow,deny
Allow all
</Location>
# Set the default printer/job policies...
<Policy default>
# Job/subscription privacy...
JobPrivateAccess default
JobPrivateValues default
SubscriptionPrivateAccess default
SubscriptionPrivateValues default
# Job-related operations must be done by the owner or an administrator...
<Limit Create-Job Print-Job Print-URI Validate-Job>
Order deny,allow
</Limit>
<Limit Send-Document Send-URI Hold-Job Release-Job Restart-Job Purge-Jobs Set-Job-Attributes Create-Job-Subscription Renew-Subscription Cancel-Subscription Get-Notifications Reprocess-Job Cancel-Current-Job Suspend-Current-Job Resume-Job Cancel-My-Jobs Close-Job CUPS-Move-Job CUPS-Get-Document>
Require user @OWNER @SYSTEM
Order deny,allow
</Limit>
# All administration operations require an administrator to authenticate...
<Limit CUPS-Add-Modify-Printer CUPS-Delete-Printer CUPS-Add-Modify-Class CUPS-Delete-Class CUPS-Set-Default CUPS-Get-Devices>
AuthType Default
Require user @SYSTEM
Order deny,allow
</Limit>
# All printer operations require a printer operator to authenticate...
<Limit Pause-Printer Resume-Printer Enable-Printer Disable-Printer Pause-Printer-After-Current-Job Hold-New-Jobs Release-Held-New-Jobs Deactivate-Printer Activate-Printer Restart-Printer Shutdown-Printer Startup-Printer Promote-Job Schedule-Job-After Cancel-Jobs CUPS-Accept-Jobs CUPS-Reject-Jobs>
AuthType Default
Require user @SYSTEM
Order deny,allow
</Limit>
# Only the owner or an administrator can cancel or authenticate a job...
<Limit Cancel-Job CUPS-Authenticate-Job>
Require user @OWNER @SYSTEM
Order deny,allow
</Limit>
<Limit All>
Order deny,allow
</Limit>
</Policy>
Is this right after the server boots up or has it been running for a while?
Both GNU TLS and OpenSSL can block waiting for system entropy which is used to generate random numbers. This normally only happens after a system first boots up... The other thing is that the 10 seconds is also the timeout for intermediate data (how long cupsd will wait for the remainder of a packet or message fragment) but it certainly looks like the client has sent the complete TLS client hello message so that shouldn't be an issue here.
This is happening both when the server has been running for a while and when the server has just booted (which was the case when I tested it to see if the problem went away after updating to FreeBSD 14.2).
Both GNU TLS and OpenSSL can block waiting for system entropy which is used to generate random numbers.
That sounds like a good culprit. However I would expect to see to see a system call in the truss output of cupsd if something (like GnuTLS) was trying to get such (true) randomness from the system. I don't think that would be an 100% userland operation? There doesn't seem to be such a system call:
setsockopt(23,SOL_SOCKET,SO_SNDTIMEO,0xc376c9bffc0,16) = 0 (0x0)
----------------------------8<--------------------------------------------------------------------------------------------------------
CUPS freezes for 10 seconds
----------------------------8<--------------------------------------------------------------------------------------------------------
poll({ 23/POLLIN },1,10000) = 0 (0x0)
If I recall correctly this problem didn't exist with Fedora 40 as a client. Tomorrow I will make a Fedora 40 VM and see if I recalled this correctly and try to determine if I can see a difference between Fedora 40 and Fedora 41. Maybe that gives some more insights.
I did a test with both a Fedora 40 and Fedora 41 VM and it seems the problem is present in both from what I can see. So that doesn't really add up with the experience I recall but it seems I'm remembering stuff wrong (or didn't notice in Fedora 40?). Either way that undermines my hypothesis so I'll do some digging to rule out lack of randomness like you suggested.
Hi @siepkes,
lately I made a change in Fedora 40+ for CUPS to follow system crypto policies by default (so any client using libcups would follow the system settings, in your case IPP backend which communicates with your server), but I did not see such behavior when I was testing it. Last week I have prepared "NoSystem" option (F41 already has it, F40 is in testing https://bodhi.fedoraproject.org/updates/FEDORA-2025-3b7602f05c ) for SSLOptions directive, which gets you the previous behavior. So you can add it into /etc/cups/client.conf and see if it makes difference.
Either way I hope you mean there is "2.4.11" on FreeBSD, not "1.4.*", because it is way too old and really can cause issues (I did not test it against such old CUPS versions to be frank). And I'm not sure what crypto library FreeBSD uses for building CUPS either...
Just to be sure, I've created a testing env with Fedora 42 client-server print system and I don't see any error message in cupsd logs or delay in cupsd by strace:
2186<cupsd> 08:06:20.184508 setsockopt(14<TCP:[192.168.124.52:631->192.168.124.164:36000]>, SOL_SOCKET, SO_RCVTIMEO_OLD, "\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0 <0.000051>
2186<cupsd> 08:06:20.184634 setsockopt(14<TCP:[192.168.124.52:631->192.168.124.164:36000]>, SOL_SOCKET, SO_SNDTIMEO_OLD, "\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0 <0.000030>
2186<cupsd> 08:06:20.184730 poll([{fd=14<TCP:[192.168.124.52:631->192.168.124.164:36000]>, events=POLLIN}], 1, 10000) = 1 ([{fd=14, revents=POLLIN}]) <0.000023>
2186<cupsd> 08:06:20.184816 recvfrom(14<TCP:[192.168.124.52:631->192.168.124.164:36000]>, "\26\3\1\1\203", 5, 0, NULL, NULL) = 5 <0.000021>
2186<cupsd> 08:06:20.184905 poll([{fd=14<TCP:[192.168.124.52:631->192.168.124.164:36000]>, events=POLLIN}], 1, 10000) = 1 ([{fd=14, revents=POLLIN}]) <0.000033>
I would suspect the printing would stop as whole if certs were regenerated due system policy (I saw one bug regarding that - updating certs fixed it), but not a delay.
Can you get someone from FreeBSD to investigate the issue from their part?
In the meantime you can try removing certs (if they were generated by cups) or updating them to use better algorithms (IIRC signing alg was the problem - previously certs were signed by RSA, after applying policy it is EDCSA). Additionally you can try "SSLOptions MinTLS1.2 NoSystem" if it is already available for you.
@zdohnal Thanks for reaching out!
Either way I hope you mean there is "2.4.11" on FreeBSD, not "1.4.*",
Yes, I meant 2.4.11, sorry for the confusion. I'll update the original post.
Last week I have prepared "NoSystem" option (F41 already has it
I'll give that a shot on Fedora 41 to see if it makes a difference.
Can you get someone from FreeBSD to investigate the issue from their part?
I'll try and see if I can reproduce this fully on the FreeBSD side (i.e. with Fedora out of the equation). If that's the case it should make troubleshooting easier with the FreeBSD people.
In the meantime you can try removing certs (if they were generated by cups)
The certificates are RSA and were generated by Hashicorp Vault (and soon with OpenBao, since with moved to the fork). I'll see if I can do a test with an ECDSA cert and see if it makes a difference.
@siepkes IMO you won't be able to reproduce it easily on FreeBSD - you would have to enforce priority string from Fedora GNUTLS into FreeBSD cups client (you can see the string if you debug the client by GDB or if you enable GNUTLS library debug) if FreeBSD actually uses GNUTLS for cups. If FreeBSD uses openssl in CUPS, I'm no help here - I don't know much of openssl. But the error message you showed I saw in CUPS in the past with Gnutls. So unless the error message is the same in both crypto libs, then hopefully FreeBSD uses Gnutls.
EDIT: Mea culpa - now I see text in your comment - " By default FreeBSD uses GnuTLS with CUPS.". So you can change the priority string in FreeBSD client.
Quick in between update:
Additionally you can try "SSLOptions MinTLS1.2 NoSystem" if it is already available for you.
I did a quick test with a fully updated Fedora 41 and the NoSystem did not make a difference. For completeness-sake I also removed MinTLS1.2 to see if that did anything but that didn't help.
I'll focus on getting a standalone reproducer in FreeBSD.
I also see this problem. In our customer case there is a huge number of clients and these TLS errors may correlate with cups server becoming unresponsive indefinitely. It is not clear the root cause is the same as this bug, but here is a potential issue:
We see:
E [09/Jan/2025:19:04:31 +0100] [Client 631415] Unable to encrypt connection: The TLS connection was non-properly terminated.
E [09/Jan/2025:19:21:38 +0100] [Client 631416] Unable to encrypt connection: Error in the pull function.
that is a delay of 17 minutes, not 10 seconds.
-
The TLS connection was non-properly terminated.meansGNUTLS_E_PREMATURE_TERMINATION.This error code in turn is set forECONNRESETin gnutlslibs/buffer.cerrno_to_gerr()function. -
Why is ECONNRESET ignored in
cups/http.c? It is never set e.g. in the unix case (http_write()incups/http.c):
else if (errno != http->error && errno != ECONNRESET)
{
http->error = errno;
continue;
}
- Knowning that the connection has been reset (i.e. http->error correctly set to ECONNRESET)
could help. For example in
scheduler/client.cwe have this:
if (httpIsEncrypted(con->http))
partial = 1;
if (partial)
{
/*
* Only do a partial close so that the encrypted client gets everything.
*/
httpShutdown(con->http);
cupsdAddSelect(httpGetFd(con->http), (cupsd_selfunc_t)cupsdReadClient,
NULL, con);
cupsdLogClient(con, CUPSD_LOG_DEBUG, "Waiting for socket close.");
}
Here , if the connection has been reset, I don't think it makes sense to wait for a socket close.
So, as a colleague pointed out, the check to do this select() should be:
if ((partial) && (con->http->error != ECONNRESET))
or perhaps more generally:
if ((partial) && !con->http->error)
However, based on the previous code snippet in cups/http.c, we never properly set http->error to ECONNRESET, so we
miss this ECONNRESET scenario.
@vliaskov WRT ECONNRESET, we normally see EPIPE on write errors due to a closed socket. Doing a little digging, the "set errno unless ECONNRESET" change was introduced in the CUPS 1.1 days (April 20, 2004 or almost 21 years ago). The log message says simply:
Fixed a segfault problem with some of the client programs (STR #668)
STR #668 was copied into the Apple CUPS repository back when we moved to Github, and it isn't clear how ignoring ECONNRESET prevents the segfault... Anyways, I will look into that as well as what might be causing the 10 second delay.
Hi everyone, just adding here what I found about the '10 second' timeout while researching our freezes (https://github.com/OpenPrinting/cups/discussions/1259).
The 10 seconds refers to the TLS Handshake setup timeout. For whatever reason, whether network error or some other issue, CUPS was blocking waiting for data from the client to complete the TLS Handshake.
The 10 second timeout is set around here: https://github.com/OpenPrinting/cups/blob/3a80247dda50f94e2e10717437186a76d1b153f1/cups/tls-gnutls.c#L1578
The actual 'wait' occurs in: https://github.com/OpenPrinting/cups/blob/3a80247dda50f94e2e10717437186a76d1b153f1/cups/http.c#L3059
You can see that the poll() call is the same as in the truss output at the end of the 10 seconds - the poll() call returns after a 10 second timeout with return value 0 = "A return value of zero indicates that the system call timed out before any file descriptors became ready."
poll({ 23/POLLIN },1,10000) = 0 (0x0)
We have some pending changes that may help here; assigning to 2.4.x for now...