Network backends: Investigate better timeout/failure handling and error reporting
In enterprise environments, there seems to be a common theme of networking/network printer issues that often get blamed on CUPS, where the solution is sometimes the hammer of restarting CUPS or rebooting the system to resolve hung print jobs.
We should revisit the network printing backends to ensure that:
- All network communications time out after a reasonable (and configurable) amount of time,
- No print job can hang indefinitely,
- Debug messages are logged showing communication progress,
- Errors are consistently displayed,
- The print queue's error policy is honored, and
- Additional help messages are displayed as appropriate ("is the printer unplugged", etc.)
In addition, the network printing help document should be updated and made available more prominently.
For an example of what this issue is about, see https://github.com/apple/cups/issues/5559
Excerpts from that issue, as far as I can see what belongs to this issue:
The symptom and root cause of that issue belong to the points above:
- All network communications time out after a reasonable (and configurable) amount of time,
- No print job can hang indefinitely
Matching excerpts from that issue:
issue with a printer device that gets its print jobs via IPP
(the CUPS ipp backend submits jobs to the printer)
where sometimes a print job "hangs up"
...
The printer never sent back an IPP response
to the IPP request in frame ...
The printer must send an IPP response,
even if we submit wrong/invalid data,
so the root cause is inside the printer.
...
It seems CUPS waits endlessly for an IPP response
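As a rough illustration of the two points above (this is not CUPS code), here is a minimal Python sketch of a wait for a response that cannot hang forever. The function name `read_with_deadline`, its parameters, and the message text are made up for illustration; the real ipp backend is written in C and uses its own HTTP/IPP routines.

```python
import select
import socket
import sys
import time


def read_with_deadline(sock, timeout_s, poll_s=1.0,
                       log=lambda msg: print(msg, file=sys.stderr)):
    """Wait for data on sock, but give up after timeout_s seconds overall.

    Hypothetical helper: timeout_s stands in for a configurable limit,
    poll_s is how often a progress message is logged while waiting.
    """
    start = time.monotonic()
    deadline = start + timeout_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # The peer never answered: fail instead of waiting endlessly.
            raise TimeoutError(f"no response within {timeout_s} seconds")
        # Block for at most one poll interval, never past the deadline.
        readable, _, _ = select.select([sock], [], [], min(poll_s, remaining))
        if readable:
            # May return b"" if the peer closed the connection.
            return sock.recv(65536)
        elapsed = time.monotonic() - start
        log(f"DEBUG: No response after {elapsed:.0f} seconds, still waiting")
```

With something like this the caller always gets either data or a `TimeoutError` within `timeout_s` seconds, and can then apply whatever policy is configured for the queue.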
Because the CUPS debug log messages don't indicate when something is not proceeding as expected, or what it is, the end-user admin could not see the cause of the issue and had to submit a "CUPS doesn't work" bug report to get help. Such issues are hard to debug without the particular printer device: the issue cannot be reproduced locally, so a TCP dump of the hanging job submission had to be laboriously dissected. This belongs to the point above:
- Debug messages are logged showing communication progress
Matching excerpts from that issue:
This is how the CUPS log file looks up to the point where it hangs
D ... Sending file using HTTP/1.1 chunking...
D ... Read 16384 bytes...
D ... Read 16384 bytes...
D ... Read 16384 bytes...
D ... Read 16384 bytes...
D ... Read 12100 bytes...
For comparison how the CUPS log file looks when things work...
D ... Sending file using HTTP/1.1 chunking...
D ... Read 16384 bytes...
D ... Read 16384 bytes...
D ... Read 16384 bytes...
D ... Read 16384 bytes...
D ... Read 12100 bytes...
D ... Print-Job: successful-ok-ignored-or-substituted-attributes ...
D ... Print job accepted - job ID NNN.
There were no end-user-visible (error) messages indicating when and what was not proceeding as expected. This belongs to the point above:
- Errors are consistently displayed
Matching excerpts from that issue:
... all what would be missing is to make ... waiting ... more verbose
so that the user is continuously informed ... what is going on e.g.
123th time waiting 300 seconds for ...
The user can then decide what to do with his stuck print job.
I would like to add a separate point between the existing
- Debug messages are logged showing communication progress
- Errors are consistently displayed
namely
- User notifications about when and what is not proceeding as expected
The difference is that DEBUG: messages can show communication progress
but when things are going on as expected nobody is interested in such messages
i.e. "no news is good news".
In contrast when things are not going on as expected DEBUG: messages
only help an admin who has set LogLevel debug but users won't see them.
When things are not going on as expected user notifications are additionally needed
e.g. via things like INFO: or NOTICE: or STATE:
but not ERROR: (so it does not belong to "Errors are consistently displayed")
because an error is a final faulty state
while in contrast a longer waiting delay could be a perfectly valid state
e.g. a "printer does not respond" state because a user switched it offline
to do some printer maintenance (like replacing toner or whatever).
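To make the distinction concrete, here is a small Python sketch of what a backend could emit for each waiting period. It assumes the backend reports through prefixed lines on stderr (DEBUG:, INFO:, ERROR:) as described above; the function name `waiting_messages`, the `notify_after` threshold, and the exact message wording are hypothetical.

```python
def waiting_messages(attempt, wait_s, what, notify_after=1):
    """Return the message lines for one waiting period.

    attempt      -- how many waiting periods have been started so far
    wait_s       -- length of one waiting period in seconds
    what         -- what is being waited for, e.g. "printer response"
    notify_after -- hypothetical threshold: from this attempt on, the
                    user is notified in addition to the debug log
    """
    # Always log progress for an admin who has set LogLevel debug.
    lines = [f"DEBUG: Waiting {wait_s} seconds for {what} (attempt {attempt})"]
    if attempt >= notify_after:
        # A long wait can be a valid state (printer switched offline for
        # maintenance), so notify the user with INFO:, not ERROR:.
        lines.append(
            f"INFO: Still waiting for {what} "
            f"(attempt {attempt}, {wait_s} seconds each)"
        )
    return lines
```

For example, `waiting_messages(123, 300, "printer response")` yields one DEBUG: line and one INFO: line, and never an ERROR: line, so the job is not reported as failed while the user decides what to do with it.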
Something to mention here, though I'm not sure where the "fault" lies: I have an HP CP1510 series color laser printer, connected directly to the local network, running in toner "Override" mode. In this state, the printer always reports "toner empty", but of course it prints just fine. The problem is that, on Arch Linux, when printing from applications like web browsers and document viewers, the CUPS print job hangs indefinitely as "pending" instead of being sent to the printer. Saving the document and then printing with "lp" seems to be an effective workaround. Still, the print job should not hang because of a meaningless printer message.
We haven't heard much on this since the latest round of fixes a few years back. I'm going to close this out but we can always refer back to it as needed and reopen as necessary...