libsir
Log file write errors
If we get an error while writing to a log file, some action must be taken; right now, nothing happens except a `_sir_selflog`.
Here are some options (of course all of these would hinge on the error code):
- Outright removing the file from the file cache
- Closing and re-opening it, then removing it if another error occurs
- Allowing n write errors, then removing it
- Giving the file a 'time-out': basically sending it to its room for a while to see if the errors are resolved later on. An example scenario is the filesystem is full, so we can't write. However, some other program or the user frees up space and we can write again. It would be a shame to lose that log data by prematurely removing the file from the cache.
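Since all of these options hinge on the error code, here is a rough sketch of how an `errno` value might be mapped to one of them. The enum, the function name, and the particular grouping of error codes are all hypothetical; none of this is existing libsir code, and the right grouping probably differs per platform.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical policy actions; these names are not part of libsir. */
typedef enum {
    WRITE_ERR_RETRY,   /* transient: try the same handle again */
    WRITE_ERR_REOPEN,  /* handle may be stale: close and re-open it */
    WRITE_ERR_TIMEOUT, /* resource exhaustion: park the file for a while */
    WRITE_ERR_REMOVE   /* assumed unrecoverable: drop the file from the cache */
} write_err_action;

/* Maps an errno value from a failed write to a suggested action.
 * The grouping below is an assumption made for illustration only. */
write_err_action classify_write_error(int err) {
    assert(err != 0); /* 0 means "no error"; there is nothing to classify */

    switch (err) {
        case EINTR:
        case EAGAIN:
            return WRITE_ERR_RETRY;
        case ENOSPC: /* filesystem full; space may be freed later */
#ifdef EDQUOT
        case EDQUOT: /* quota exceeded; not defined on every platform */
#endif
            return WRITE_ERR_TIMEOUT;
        case EIO:
#ifdef ESTALE
        case ESTALE: /* stale network file handle */
#endif
            return WRITE_ERR_REOPEN;
        case EBADF:
        case EROFS:
        case EACCES:
        default:
            return WRITE_ERR_REMOVE;
    }
}
```

A caller would pass the `errno` captured right after the failed write, e.g. `classify_write_error(ENOSPC)` yields `WRITE_ERR_TIMEOUT`, which corresponds to the 'time-out' option.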
There is no good reason to keep around a file that we can't write to: it'll harm performance, and the thought of it just generally annoys me.
@johnsonjh what's your take on this?
Gotta think about it!
I can also imagine scenarios like stalled network filesystems (which can present oddly depending on the NFS or SMB version). The file might not be writable until closed and reopened, etc. I also wonder how very slow I/O might affect things.
@johnsonjh Did you think about this? I'm thinking close the handle, but leave the file in the cache. Wait about 5 seconds, re-open and try writing again (if open fails, remove it), then do the same thing as the anti-spam mechanism: have an exponential backoff, and then a ceiling. If it errors like 100 times, then remove it and forget all about it.
Or... just leave it alone and let the writes fail, and maybe eventually it'll work itself out, or maybe it won't.
Hrrm. Still thinking :)