Logging: reopen logs after reload
Summary: when Caddy is reloaded, it should re-open its log files, creating them if they do not exist.

Use case: I would like logrotate to rotate the log files. This is the convention with logrotate-managed services (copied from https://github.com/caddyserver/caddy/issues/329#issuecomment-223819543). When it is time to rotate the logs:

1. Move the log file, e.g. `mv access.log access.log.1`.
2. Reload the process: `systemctl reload caddy`.
3. Let the process close the log file. There is no race condition, because the process kept writing to the file under its new name until now.
4. Let the process re-open the log file under the old (original) name. This creates a new file.
5. Do something with the renamed log file (e.g. compress it).

However, step (4) does not work: Caddy does not re-open the file.
Steps to reproduce

Caddyfile:

```
{
	http_port 8088
	https_port 8084
}
http://localhost:8088 {
	file_server
	log {
		output file access.log
	}
}
```
1. Run the server: `caddy run --config Caddyfile`
2. Issue a request: `curl http://localhost:8088`
3. Observe there is a log line in `access.log`.
4. `mv access.log access.log.1`
5. `caddy reload --config Caddyfile --force`
6. Issue a 2nd request: `curl http://localhost:8088`
7. Observe that the logs of the second request are in `access.log.1`, and `access.log` was not created.

What did I expect instead?

At step (6) a new file `access.log` would be created, containing the mentioned log line.
Original issue: https://github.com/caddyserver/caddy/issues/329#issuecomment-1385299678
Reloading the config when there are no actual changes to the config is a no-op, unless you use the `--force` flag. Might that be what's tripping you up?
FYI, Caddy has its own log rolling, see https://caddyserver.com/docs/caddyfile/directives/log#file (but maybe that's what you meant you're using).
> Reloading the config when there are no actual changes to the config is a no-op, unless you use the `--force` flag. Might that be what's tripping you up?

I observed this in production. Updating the description.
> FYI, Caddy has its own log rolling, see https://caddyserver.com/docs/caddyfile/directives/log#file (but maybe that's what you meant you're using).

No, I am by choice using logrotate.
> No, I am by choice using logrotate.

Okay, you might want to set `roll_disabled` then, as per the docs.
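For reference, in the Caddyfile from the issue that would look roughly like the following (a sketch based on the linked docs; adjust to your setup):

```
log {
	output file access.log {
		roll_disabled
	}
}
```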
As far as I can tell from the code, not closing/reopening file handles on reload is intentional. The reason is that reloads can be very frequent, and constantly reopening handles can cause some performance issues. So we use a `UsagePool` which persists writers across config reloads as long as the same filename is used.

On reload, `LoadOrNew()` fetches a reference to the existing file handle and +1s the usage count.
https://github.com/caddyserver/caddy/blob/e8ad9b32c9730ddb162b6fb1443fc0b36fcef7dc/logging.go#L226

Then the old config unloads and calls `Delete()`, which -1s the count.
https://github.com/caddyserver/caddy/blob/e8ad9b32c9730ddb162b6fb1443fc0b36fcef7dc/logging.go#L192

Since the count doesn't reach 0 during reloads, the `Delete()` procedure doesn't call `Destruct()` on the writer to close the file handle.
https://github.com/caddyserver/caddy/blob/e8ad9b32c9730ddb162b6fb1443fc0b36fcef7dc/usagepool.go#L184
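To illustrate the idea, here is a simplified sketch of that reference-counting behavior (illustrative only; this is not Caddy's actual `UsagePool` code, and all names are made up):

```go
// Simplified sketch of reference counting across reloads. Not Caddy code.
package main

import (
	"os"
	"sync"
)

type filePool struct {
	mu    sync.Mutex
	refs  map[string]int
	files map[string]*os.File
}

// loadOrNew returns the already-open file for path, or opens it,
// and increments the usage count either way.
func (p *filePool) loadOrNew(path string) (*os.File, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if f, ok := p.files[path]; ok {
		p.refs[path]++
		return f, nil
	}
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return nil, err
	}
	p.files[path] = f
	p.refs[path] = 1
	return f, nil
}

// release decrements the count and only closes the file when the count
// reaches zero. During a reload the new config grabs a reference before the
// old config lets go, so the count never hits zero and the old fd stays
// open -- which is why the rotated file keeps receiving writes.
func (p *filePool) release(path string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.refs[path]--
	if p.refs[path] > 0 {
		return nil
	}
	err := p.files[path].Close()
	delete(p.files, path)
	delete(p.refs, path)
	return err
}

func main() {
	p := &filePool{refs: map[string]int{}, files: map[string]*os.File{}}
	f, _ := p.loadOrNew("access.log") // initial config opens the file
	_, _ = p.loadOrNew("access.log")  // reload: new config, count goes to 2
	_ = p.release("access.log")       // old config unloads: count back to 1, file stays open
	_, _ = f.WriteString("still writing to the original inode\n")
	_ = p.release("access.log") // count hits 0, file finally closed
}
```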
So I'm not sure what we should do here, because this is obviously "working as intended". But your use case is valid; having a way to make Caddy re-open the log file makes sense when using external log rotation.
> I am by choice using logrotate.

You can use the `copytruncate` option with logrotate. This may not be very efficient on large files (as Caddy logs can be).

Perhaps a new `caddy rotate logs` command could be created, which would either force a log rotation (Caddy's internal one) or re-open the log files if `roll_disabled` is set.
Does the logger even return an error on `Write()` if the file is removed out from under it? IIRC, no error is returned -- I may be wrong though. If I'm right, however, then I'm really not sure how we can do it, because I think a failed write is the only way we'd know that the file is missing.
> Does the logger even return an error on `Write()` if the file is removed out from under it?

On Unix systems, no error until the fd is closed.

I am not sure how one would detect this condition -- perhaps `stat` the path and check whether the inodes are the same.
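For illustration, a rough Unix-only sketch of that inode check could look like this (a hypothetical helper, not anything that exists in Caddy):

```go
// Hypothetical sketch: detect that a log file was rotated out from under an
// open *os.File by comparing inode/device numbers (Unix only).
package main

import (
	"fmt"
	"os"
	"syscall"
)

// rotated reports whether the file at path no longer refers to the same
// inode as the open handle f (e.g. because logrotate moved it aside).
func rotated(f *os.File, path string) (bool, error) {
	fdInfo, err := f.Stat()
	if err != nil {
		return false, err
	}
	pathInfo, err := os.Stat(path)
	if os.IsNotExist(err) {
		// The path is gone entirely; that also counts as rotated.
		return true, nil
	}
	if err != nil {
		return false, err
	}
	fdStat, ok1 := fdInfo.Sys().(*syscall.Stat_t)
	pathStat, ok2 := pathInfo.Sys().(*syscall.Stat_t)
	if !ok1 || !ok2 {
		return false, fmt.Errorf("stat info not available on this platform")
	}
	return fdStat.Ino != pathStat.Ino || fdStat.Dev != pathStat.Dev, nil
}

func main() {
	f, err := os.OpenFile("access.log", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		panic(err)
	}
	defer f.Close()
	moved, err := rotated(f, "access.log")
	fmt.Println(moved, err)
}
```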
Anyhow, the convention is SIGHUP. If that one is taken, maybe SIGUSR1 or SIGUSR2, or a custom API call?
> Anyhow, the convention is SIGHUP.

Is this reliable, though? And what does it tell us exactly? There's sometimes more than one log output -- I guess we just reset all of them? Does the SIGHUP happen after the file is replaced out from under the feet of the writer, or before? If after, how many log lines can get lost before we can receive and process the signal?

I have so many questions!

Stat'ing on each write is too expensive; I don't think we can afford that.
> Does the SIGHUP happen after the file is replaced out from under the feet of the writer?

The normal case, AFAIK, is that when the original log file is renamed (mv), writes continue under the new name because the fd and inode are unchanged. A new file is created in its place, and when SIGHUP is sent, the process opens the new file and continues writing there. So no lost writes should happen.
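As a rough illustration of that convention, a minimal standalone sketch (not Caddy code; file name and structure are assumptions) might look like this. The mutex makes writers block briefly while the fd is swapped, addressing the lost-writes concern above:

```go
// Minimal sketch of the conventional SIGHUP "re-open logs" behavior in a
// standalone daemon. Illustrative only.
package main

import (
	"os"
	"os/signal"
	"sync"
	"syscall"
	"time"
)

// reopenWriter wraps a file and can re-open it by name; writers block
// briefly during the swap, so no write hits a closed fd.
type reopenWriter struct {
	mu   sync.Mutex
	path string
	f    *os.File
}

func (w *reopenWriter) Write(p []byte) (int, error) {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.f.Write(p)
}

func (w *reopenWriter) Reopen() error {
	w.mu.Lock()
	defer w.mu.Unlock()
	f, err := os.OpenFile(w.path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return err
	}
	if w.f != nil {
		w.f.Close() // old fd still points at the renamed file (access.log.1)
	}
	w.f = f // new fd points at the freshly created access.log
	return nil
}

func main() {
	w := &reopenWriter{path: "access.log"}
	if err := w.Reopen(); err != nil { // initial open
		panic(err)
	}

	hup := make(chan os.Signal, 1)
	signal.Notify(hup, syscall.SIGHUP)
	go func() {
		for range hup {
			_ = w.Reopen() // logrotate has already mv'd the file by now
		}
	}()

	// Simulate a request every second so there is something to log.
	for {
		_, _ = w.Write([]byte("request logged\n"))
		time.Sleep(time.Second)
	}
}
```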
As mentioned earlier, mv-ing a file does not change its inode or fd, so Caddy will keep writing to the same inode, which in the OP's example is `access.log.1`. It will not and cannot switch to the newly created empty file, i.e. `access.log`, unless Caddy closes the old fd and opens the new one. As @francislavoie said, it may be wasteful to close/reopen the log file on every reload, and I perceive it could open the system to abuse.
> > I am by choice using logrotate.
>
> You can use the `copytruncate` option with logrotate. This may not be very efficient on large files (as Caddy logs can be).

Without knowing much about your environment, and based on the limited information given, this appears to be the best option for your use case. Will it not work for you?
The other option may be to use the github.com/fsnotify/fsnotify package to detect file changes (keeping in mind the complexity of robust handling of the events: fsnotify/fsnotify#372). However, as with closing/reopening the file, this introduces an opportunity for abuse/misuse: Caddy could be spammed with a flood of (bogus) events, causing it to constantly close/open the file and potentially blocking log writes while the logger awaits a proper and usable fd.
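For illustration, the watching side of that approach might look roughly like this (a sketch only; the log path is assumed and the re-open logic is elided):

```go
// Rough sketch of detecting log rotation with fsnotify. Illustrative only.
// Watching the directory rather than the file itself is more robust, because
// the watch survives the rename of the file.
package main

import (
	"log"
	"path/filepath"

	"github.com/fsnotify/fsnotify"
)

func main() {
	const logPath = "access.log" // assumed path for illustration

	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		log.Fatal(err)
	}
	defer watcher.Close()

	if err := watcher.Add(filepath.Dir(logPath)); err != nil {
		log.Fatal(err)
	}

	for {
		select {
		case ev, ok := <-watcher.Events:
			if !ok {
				return
			}
			// A rename or removal of the log file means logrotate moved it
			// aside; this is where the writer would close and re-open the fd.
			if filepath.Clean(ev.Name) == filepath.Clean(logPath) &&
				ev.Op&(fsnotify.Rename|fsnotify.Remove) != 0 {
				log.Printf("log file rotated: %s", ev)
				// re-open logic elided
			}
		case err, ok := <-watcher.Errors:
			if !ok {
				return
			}
			log.Printf("watch error: %v", err)
		}
	}
}
```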
> As mentioned earlier, mv-ing a file does not change its inode or fd, so Caddy will keep writing to the same inode, which in the OP's example is `access.log.1`. It will not and cannot switch to the newly created empty file, i.e. `access.log`, unless Caddy closes the old fd and opens the new one. As @francislavoie said, it may be wasteful to close/reopen the log file on every reload, and I perceive it could open the system to abuse.
My machine can open and close a file about 200k times per second, and that's a mediocre laptop on an encrypted btrfs filesystem (i.e. with overhead). I just tried this on a server and got almost 500k/s. We would have to be reloading the configuration hundreds of thousands of times per second for this to matter. I guess we aren't?
main.go:

```go
package main

import (
	"fmt"
	"os"
	"time"
)

const nfiles = 1_000_000

func main() {
	start := time.Now()
	for i := 0; i < nfiles; i++ {
		f, err := os.OpenFile("tempfile.txt", os.O_CREATE, 0644)
		if err != nil {
			fmt.Printf("error opening file: %v\n", err)
			os.Exit(1)
		}
		f.Close()
	}
	duration := time.Now().Sub(start)
	if err := os.Remove("tempfile.txt"); err != nil {
		fmt.Printf("error deleting file: %v\n", err)
		os.Exit(1)
	}
	fmt.Printf("It took %.2f seconds to create %d files.\n", duration.Seconds(), nfiles)
	fmt.Printf("That's %.f per second.\n", nfiles/duration.Seconds())
}
```
My laptop:

```
$ go run main.go
It took 4.55 seconds to create 1000000 files.
That's 219605 per second.
```

Some server in GCP I have access to:

```
It took 2.07 seconds to create 1000000 files.
That's 483532 per second.
```
> > I am by choice using logrotate.
>
> You can use the `copytruncate` option with logrotate. This may not be very efficient on large files (as Caddy logs can be).

As mentioned in the documentation of copytruncate, it has a race condition which may cause it to lose data. For me it is more acceptable to restart Caddy and lose a bit of traffic. That's what I am going with for now, but can we do better?
> Without knowing much about your environment, and based on the limited information given, this appears to be the best option for your use case. Will it not work for you?

I have an "ecosystem" of log files and know how to deal with them. :) Here is a logrotate configuration for a web server that I wrote (which I am working to replace with Caddy):
```
/var/log/private/jjweb/access.log {
    rotate 60
    daily
    dateext
    dateyesterday
    compress
    compresscmd /usr/bin/zstd
    compressext .zst
    compressoptions --long -19
    uncompresscmd /usr/bin/unzstd
    postrotate
        /usr/bin/systemctl reload jjweb
    endscript
}
```
Caddy has a log rotation mechanism, but it's obviously not as rich as logrotate (and does not have to be). The most important for me are `compresscmd`, `dateext` and `dateyesterday`. But everyone's environment is different.
> The other option may be to use the github.com/fsnotify/fsnotify package to detect file changes (keeping in mind the complexity of robust handling of the events: fsnotify/fsnotify#372). However, as with closing/reopening the file, this introduces an opportunity for abuse/misuse: Caddy could be spammed with a flood of (bogus) events, causing it to constantly close/open the file and potentially blocking log writes while the logger awaits a proper and usable fd.

I think a simple file re-open mechanism, as seen in many Unix daemons, is quite well understood and a cheap solution to this problem.
We generally try to ignore signals since they take on varied and ambiguous purposes here and there, but if it's very important we can try implementing SIGHUP to close and re-open logs. It may result in some lost writes however, unless we sync all writers to block while we re-open the file.
I'm a bit busy right now, but could prioritize this for a sufficiently-tiered sponsor; or anyone else is welcome to contribute it!
> We generally try to ignore signals since they take on varied and ambiguous purposes here and there, but if it's very important we can try implementing SIGHUP to close and re-open logs. It may result in some lost writes however, unless we sync all writers to block while we re-open the file.

If one does `(*os.File).Write(...)` on Linux, it is already atomic, thanks to two locks: one in Go and one in the kernel. I was surprised when I found this out (see this) and even more surprised by the resolution.

After reading a bit more about it, I found that the problem with locks is contention, not having locks per se. If we add another lock that waits for the server to re-open the file, contention will barely change, and thus neither will throughput.

> I'm a bit busy right now, but could prioritize this for a sufficiently-tiered sponsor; or anyone else is welcome to contribute it!

Thanks for your honest answer! It is not a priority for me right now, but it didn't feel right to leave this without registering an issue first. I am happy we have a consensus. If those lost requests ever become a problem for me (currently I restart Caddy after log rotation), I will happily implement it myself.
SELinux configs don't really play nicely with Caddy and its built-in log rotation currently. I'd prefer to use logrotate anyway, for consistency.

It'd be great if a solution were implemented here to allow this.