cups icon indicating copy to clipboard operation
cups copied to clipboard

Thread race condition when writing into log file

Open zdohnal opened this issue 1 month ago • 1 comments

Describe the bug We use 2 threads if needed in cupsd, and in rare case and if logging is set into a file, the output gets combined, which might corrupt the file for other tools, like grep.

We can sometimes see it in current stable release 2.4.16 (in upstream project CI), but atm I have logs from 2.4.10 from Centos Stream 10 - the log file corruption causes failure of CUPS test suite at point where we look for expected number of warnings. If the file is corrupted (by NULL byte - I see it Vim as ^@), grep does not scan the whole file and prints out only matches before the NULL byte, which causes mismatch in number of warnings.

I see it in the logs (here is the log which contains the NULL byte):

 9039 D [05/Dec/2025:11:50:50.263433 +0000] Test3: Generating PPD file from "ipp://localhost:8631/printers/Test2"...
 9040 D [05/Dec/2025:11:50:50.263494 +0000] [Notifier] event-notification-attributes-tag: 
 9041 D [05/Dec/2025:11:50:50.263501 +0000] [Notifier] 
 9042 D [05/Dec/2025:11:50:50.263506 +0000] [Notifier] notify-charset (charset) utf-8
 9043 D [05/Dec/2025:11:50:50.263510 +0000] [Notifier] notify-natural-language (naturalLanguage) en-us
 9044 D [05/Dec/2025:11:50:50.263515 +0000] [Notifier] notify-subscription-id (integer) 4
 9045 D [05/Dec/2025:11:50:50.263519 +0000] [Notifier] notify-sequence-number (integer) 14
 9046 D [05/Dec/2025:11:50:50.263524 +0000] [Notifier] notify-subscribed-event (keyword) printer-state-changed
 9047 D [05/Dec/2025:11:50:50.263528 +0000] [Notifier] printer-up-time (integer) 1764935450
 9048 D [05/Dec/2025:11:50:50.263532 +0000] [Notifier] notify-text (textWithoutLanguage) Now accepting jobs.
 9049 D [05/Dec/2025:11:50:50.263536 +0000] [Notifier] notify-printer-uri (uri) ipp://491a4fd8-fbb9-4631-972a-dd0a5c29331b/printers/Test3
 9050 D [05/Dec/2025:11:50:50.263544 +0000] [Notifier] printer-name (nameWithoutLanguage) Test3
 9051 D [05/Dec/2025:11:50:50.263548 +0000] [Notifier] printer-state (enum) stopped
 9052 D [05/Dec/2025:11:50:50.263552 +0000] [Notifier] printer-state-reasons (keyword) paused
 9053 D [05/Dec/2025:11:50:50.263556 +0000] [Notifier] printer-is-accepting-jobs (boolean) true
 9054 D [05/Dec/2025:11:50:50.263560 +0000] Test3: Connected to localhost:8631, sending Get-Printer-Attributes request...^@/printers/Test3
 9055 D [05/Dec/2025:11:50:50.263568 +0000] [Notifier] 
 9056 D [05/Dec/2025:11:50:50.263577 +0000] [Notifier] notify-charset (charset) utf-8
 9057 D [05/Dec/2025:11:50:50.263581 +0000] [Notifier] notify-natural-language (naturalLanguage) en-us
 9058 D [05/Dec/2025:11:50:50.263586 +0000] [Notifier] notify-subscription-id (integer) 4
 9059 D [05/Dec/2025:11:50:50.263594 +0000] [Notifier] notify-sequence-number (integer) 15
 9060 D [05/Dec/2025:11:50:50.263598 +0000] [Notifier] notify-subscribed-event (keyword) printer-state-changed

and a case when the mixing of logs happens as well, but there is no NULL byte to corrupt the log:

D [05/Dec/2025:11:19:34.144323 +0000] Test3: Generating PPD file from "ipp://localhost:8631/printers/Test2"...
D [05/Dec/2025:11:19:34.144386 +0000] [Notifier] event-notification-attributes-tag:
D [05/Dec/2025:11:19:34.144392 +0000] [Notifier] 
D [05/Dec/2025:11:19:34.144397 +0000] [Notifier] notify-charset (charset) utf-8
D [05/Dec/2025:11:19:34.144402 +0000] [Notifier] notify-natural-language (naturalLanguage) en-us
D [05/Dec/2025:11:19:34.144406 +0000] [Notifier] notify-subscription-id (integer) 4
D [05/Dec/2025:11:19:34.144409 +0000] [Notifier] notify-sequence-number (integer) 14
D [05/Dec/2025:11:19:34.144413 +0000] [Notifier] notify-subscribed-event (keyword) printer-state-changed
D [05/Dec/2025:11:19:34.144417 +0000] [Notifier] printer-up-time (integer) 1764933574
D [05/Dec/2025:11:19:34.144421 +0000] [Notifier] notify-text (textWithoutLanguage) Now accepting jobs.
D [05/Dec/2025:11:19:34.144425 +0000] [Notifier] notify-printer-uri (uri) ipp://c4b38e94-e69d-4938-8190-dae9089c5e0c/printers/Test3
D [05/Dec/2025:11:19:34.144429 +0000] [Notifier] printer-name (nameWithoutLanguage) Test3
D [05/Dec/2025:11:19:34.144434 +0000] Test3: Connected to tate (enum) stopped
D [05/Dec/2025:11:19:34.144438 +0000] [Notifier] printer-state-reasons (keyword) paused
D [05/Dec/2025:11:19:34.144455 +0000] [Notifier] printer-state-reasons (keyword) paused
D [05/Dec/2025:11:19:34.144461 +0000] [Notifier] printer-is-accepting-jobs (boolean) true

It looks like it always happens during running the thread for creating PPD and notifier, so I guess it is threading problem.

To Reproduce Steps to reproduce the behavior:

  1. run CUPS test suite by 'make test'
  2. sometimes the test suite fails due incorrect number of expected warnings because grep is not able to read whole log file

Test suite log looks like this:

PASS: 33 error messages.
grep: /tmp/cups-cupstester/log/error_log: binary file matches
FAIL: 4 warning messages, expected 14.
W [05/Dec/2025:11:50:40.404924 +0000] No JobPrivateAccess defined in policy default - using defaults.
W [05/Dec/2025:11:50:40.404948 +0000] No JobPrivateValues defined in policy default - using defaults.
W [05/Dec/2025:11:50:40.404952 +0000] No SubscriptionPrivateAccess defined in policy default - using defaults.
W [05/Dec/2025:11:50:40.404954 +0000] No SubscriptionPrivateValues defined in policy default - using defaults.
grep: /tmp/cups-cupstester/log/error_log: binary file matches

Expected behavior Passing test suite, not mixed logs

System Information:

  • logs are from Centos Stream 10 with 2.4.10, but seen in upstream project as well sometimes

zdohnal avatar Dec 08 '25 17:12 zdohnal

Strange, since we have a mutex controlling writes to the error_log file...

michaelrsweet avatar Dec 08 '25 21:12 michaelrsweet

Hah, I knew it was not my imagination and I saw this symptom even in Github CI - so f.e. CI results from my last PR - https://github.com/OpenPrinting/cups/actions/runs/20136013588/job/57789439542?pr=1454 :

grep: /tmp/cups-runner/log/error_log: binary file matches
FAIL: 4 warning messages, expected 14.
grep: /tmp/cups-runner/log/error_log: binary file matches
W [11/Dec/2025:14:16:02.638950 +0000] No JobPrivateAccess defined in policy default - using defaults.
W [11/Dec/2025:14:16:02.638953 +0000] No JobPrivateValues defined in policy default - using defaults.
W [11/Dec/2025:14:16:02.638956 +0000] No SubscriptionPrivateAccess defined in policy default - using defaults.
W [11/Dec/2025:14:16:02.638958 +0000] No SubscriptionPrivateValues defined in policy default - using defaults.
grep: /tmp/cups-runner/log/error_log: binary file matches
PASS: 0 notice messages.
grep: /tmp/cups-runner/log/error_log: binary file matches
PASS: 127 info messages.
grep: /tmp/cups-runner/log/error_log: binary file matches
PASS: 1961 debug messages.
grep: /tmp/cups-runner/log/error_log: binary file matches
PASS: 6063 debug2 messages.
grep: /tmp/cups-runner/log/error_log: binary file matches

I'll print out the file in the CI runner (I don't know a way how to access the files from CI run...) and see if file contains a NULL character somewhere.

Like we could work it around in tests by using grep -a so the binary files would be taken as a text and the test would have passed, but it really seems there is some threading issue.

zdohnal avatar Dec 12 '25 06:12 zdohnal

Neat! You already set the Github CI to upload test artifacts - so the NULL byte is here in the file:

 9135 d [11/Dec/2025:14:16:13.224470 +0000] cupsdStartProcess: Setting file actions.
 9136 D [11/Dec/2025:14:16:13.224472 +0000] cupsdStartProcess: Setting file actions.
 9137 d [11/Dec/2025:14:16:13.224474 +0000] [Notifier] printer-is-accepting-jobs (boolean) true^@otifier/testnotify"
 9138 D [11/Dec/2025:14:16:13.224483 +0000] [Notifier] printer-is-accepting-jobs (boolean) true
 9139 d [11/Dec/2025:14:16:13.224485 +0000] cupsdDeleteTemporaryPrinters: Removing unused temporary printers

again, related with Notifier messaging.

zdohnal avatar Dec 12 '25 10:12 zdohnal