libsir Log file write errors

If we get an error while writing to a log file, some action must be taken; right now, nothing happens except a call to _sir_selflog.

Here are some options (of course all of these would hinge on the error code):

Outright removing the file from the file cache
Closing and re-opening it, then removing it if another error occurs
Allowing n write errors, then removing it
Giving the file a 'time-out': basically sending it to its room for a while to see if the errors are resolved later on. For example, the filesystem is full, so we can't write, but then some other program or the user frees up space and we can write again. It would be a shame to lose that log data by prematurely removing the file from the cache.

There is no good reason to keep around a file that we can't write to; it'll harm performance, and the thought of it just generally annoys me.
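To make "hinge on the error code" concrete, here is a rough sketch of how an errno value from a failed write might be mapped onto the options listed above. Everything in it is hypothetical: the enum, the function name, and the mapping itself are assumptions for illustration, not libsir code, and it assumes a POSIX-style errno (ENOSPC, EIO, and so on).

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical recovery actions mirroring the options above;
 * none of these names exist in libsir. */
typedef enum {
    WRITE_ERR_RETRY,   /* count the error against an n-error budget */
    WRITE_ERR_REOPEN,  /* close and re-open the handle, then retry */
    WRITE_ERR_TIMEOUT, /* skip the file for a while, then try again */
    WRITE_ERR_REMOVE   /* give up and drop the file from the cache */
} write_err_action;

/* Maps the errno from a failed write to a recovery action.
 * The mapping is an assumption, not a tested policy. */
write_err_action classify_write_error(int err) {
    assert(err != 0); /* only call this after a write has actually failed */

    switch (err) {
        case ENOSPC: /* filesystem full: space may be freed later */
#ifdef EDQUOT
        case EDQUOT: /* quota exceeded: same reasoning */
#endif
            return WRITE_ERR_TIMEOUT;
        case EIO:    /* I/O error: a fresh handle might help */
#ifdef ESTALE
        case ESTALE: /* stale network filesystem handle */
#endif
            return WRITE_ERR_REOPEN;
        case EBADF:  /* the handle itself is invalid */
        case EROFS:  /* read-only filesystem */
            return WRITE_ERR_REMOVE;
        default:     /* unknown or transient: just count it */
            return WRITE_ERR_RETRY;
    }
}
```

The caller would check errno right after the failed write and branch on the result; which codes belong in which bucket is exactly the part that needs discussion.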

Jul 09 '23 09:07 aremmell

@johnsonjh what's your take on this?

Jul 11 '23 03:07 aremmell

@johnsonjh what's your take on this?

Gotta think about it!

I can also imagine scenarios like stalled network filesystems (which can present oddly depending on the NFS or SMB version). The file might not be writable until closed and reopened, etc. I also wonder how very slow I/O might affect things.

Jul 11 '23 04:07 johnsonjh

@johnsonjh Did you think about this? I'm thinking close the handle, but leave the file in the cache. Wait about 5 seconds, re-open and try writing again (if open fails, remove it), then do the same thing as the anti-spam mechanism: have an exponential backoff, and then a ceiling: if it errors like 100 times, then remove it and forget all about it.

Or... just leave it alone and let the writes fail, and maybe eventually it'll work itself out, or maybe it won't.
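A minimal sketch of the bookkeeping the close/wait/re-open idea above would need, assuming a small per-file state struct. All names here are hypothetical, not libsir's. The 5-second start and the 100-error ceiling are the numbers floated above; the 300-second cap on the delay is an extra assumption. It also assumes time_t counts seconds, as on POSIX. Closing and re-opening the handle is left to the caller, including removing the file right away if the re-open itself fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define RETRY_INITIAL_DELAY 5   /* seconds before the first re-open attempt */
#define RETRY_MAX_DELAY     300 /* assumed cap on the backoff interval */
#define RETRY_MAX_ERRORS    100 /* ceiling: remove the file at this count */

/* Hypothetical per-file retry state; not a libsir type. */
typedef struct {
    unsigned errors;   /* write errors seen since the last success */
    unsigned delay;    /* current backoff interval, in seconds */
    time_t   retry_at; /* earliest time a re-open should be attempted */
} retry_state;

/* Records one failed write at time `now`. Returns false once the ceiling
 * is reached, meaning the caller should remove the file from the cache.
 * Otherwise the caller closes the handle and waits until st->retry_at. */
bool retry_on_error(retry_state *st, time_t now) {
    assert(st != NULL);

    if (++st->errors >= RETRY_MAX_ERRORS)
        return false;

    if (st->delay == 0) {
        st->delay = RETRY_INITIAL_DELAY; /* first error: initial wait */
    } else if (st->delay < RETRY_MAX_DELAY) {
        st->delay *= 2;                  /* later errors: double the wait */
        if (st->delay > RETRY_MAX_DELAY)
            st->delay = RETRY_MAX_DELAY;
    }

    st->retry_at = now + (time_t)st->delay;
    return true;
}

/* True once the backoff interval has elapsed and a re-open may be tried. */
bool retry_due(const retry_state *st, time_t now) {
    return now >= st->retry_at;
}

/* Clears the error history; call after a successful write. */
void retry_reset(retry_state *st) {
    st->errors = 0;
    st->delay = 0;
    st->retry_at = 0;
}
```

While retry_due() is false, writes to that file would simply be skipped, so a dead file costs one comparison per log call instead of a failing write.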

Jul 13 '23 06:07 aremmell

Hrrm. Still thinking :)

Jul 13 '23 06:07 johnsonjh
