Log file timestamp not updated on some file systems. Support periodic sync to flush metadata to FS
Hello, we are experiencing an issue on Azure file shares where log file metadata (particularly the last-updated timestamp) is not updated until the log file is closed. We confirmed that it is not a log4j issue but a problem with any FileOutputStream on this file system.
Would it be possible to add functionality to the file manager implementations whereby the FileOutputStream's file descriptor would be fsynced at a frequency configured on the Appender, e.g.
FileOutputStream.getFD().sync();
Or perhaps a more generalized solution?
@roytmana, I am reluctant to push a fix for an OpenJDK bug that will probably be addressed by either the OpenJDK or Azure in the future. Would it be possible for you to repeat the following in a background thread?
- Programmatically obtain the appender
- Get your hands on its FileOutputStream (via reflection?)
- Run FileOutputStream.getFD().sync() on it
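For what it's worth, a minimal sketch of that background-thread hack could look like the following, assuming the appender is named "RollingFile" and the stream lives in the appender's OutputStreamManager. The protected getOutputStream() member is reached via reflection; its exact name may differ between Log4j versions, and none of this is a supported API.

```java
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.lang.reflect.Method;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.core.LoggerContext;
import org.apache.logging.log4j.core.appender.RollingFileAppender;

public final class PeriodicFsync {

    public static void main(String[] args) {
        // "RollingFile" is an example; use the appender name from your own configuration.
        // In a real application this would be started once at startup and kept running.
        LoggerContext ctx = (LoggerContext) LogManager.getContext(false);
        RollingFileAppender appender = ctx.getConfiguration().getAppender("RollingFile");

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "log-fsync");
            t.setDaemon(true);
            return t;
        });

        // Heartbeat: every 60 seconds, locate the underlying FileOutputStream and fsync it.
        scheduler.scheduleAtFixedRate(() -> {
            try {
                // getOutputStream() is protected on OutputStreamManager, hence the reflection;
                // the member name is an assumption and may differ between Log4j versions.
                Method m = findMethod(appender.getManager().getClass(), "getOutputStream");
                m.setAccessible(true);
                OutputStream os = (OutputStream) m.invoke(appender.getManager());
                if (os instanceof FileOutputStream) {
                    // Ask the OS to flush data and metadata (including timestamps) to the device.
                    ((FileOutputStream) os).getFD().sync();
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, 60, 60, TimeUnit.SECONDS);
    }

    // Walk up the class hierarchy to find the (possibly inherited, non-public) no-arg method.
    private static Method findMethod(Class<?> type, String name) throws NoSuchMethodException {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            try {
                return c.getDeclaredMethod(name);
            } catch (NoSuchMethodException ignored) {
                // keep looking in the superclass
            }
        }
        throw new NoSuchMethodException(name);
    }
}
```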
Hello @vy, sorry for the delay. I had that hack working before I submitted the ticket, but to be absolutely sure I do not mislead you, I wanted to get Ops to deploy my test to Azure in exactly the same way as our prod apps.
I just got a confirmation from them that it did solve the issue.
I did not do it in a background thread; instead, I copied the rolling file appender and overrode its append() method, using reflection to get the output stream and sync it on every write.
So if it is feasible to add something like fileSystemSyncFrequencySec=60 to the file-based appenders (or at least to RollingFileAppender), indicating that FileOutputStream.getFD().sync() should be called when the option is present, that would be great.
We have a support ticket open with Microsoft on the issue, but it does not seem to be getting anywhere. I can see if I can get a link to post here, but I am not sure it is publicly available.
Thanks again for your prompt response! It would be hard to add this sync from outside of the log4j project source code: all the appenders are final and the builders are statically associated with them, so it takes a lot of copying of your code just to get past protected members and final classes :-)
@vy, from my perspective it would probably be sufficient to do it inside the append() method, checking the elapsed time since the previous sync (see the sketch below). The disadvantage, of course, is that if a log does not get much traffic it will not be synced for some time and the dates will not be flushed. A thread-based sync would be much nicer, as it would work as a heartbeat regardless of a particular appender's activity, though you would need to keep a list of all appenders configured for this heartbeat and do it for all of them. In both cases you may need to synchronize normal appender write operations with the call to sync(); I do not know whether calling them concurrently may pose any issues.
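If it helps to make the in-append() variant concrete, here is a minimal sketch of that elapsed-time check. The helper class is hypothetical (not part of Log4j), and in practice the FileOutputStream would be the one obtained from the copied appender, e.g. via the reflection shown above.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: throttles getFD().sync() so a copied/custom appender can call it
// from its append() path without paying the fsync cost on every single write.
public final class ThrottledFsync {

    private final long intervalNanos;
    private final AtomicLong lastSyncNanos = new AtomicLong(System.nanoTime());

    public ThrottledFsync(long intervalSeconds) {
        this.intervalNanos = TimeUnit.SECONDS.toNanos(intervalSeconds);
    }

    // Call after writing a log event; syncs at most once per configured interval.
    // Coordinating this with the appender's own write lock is left to the caller.
    public void maybeSync(FileOutputStream fos) throws IOException {
        long now = System.nanoTime();
        long last = lastSyncNanos.get();
        if (now - last >= intervalNanos && lastSyncNanos.compareAndSet(last, now)) {
            fos.getFD().sync(); // flush OS buffers and file metadata to the storage device
        }
    }
}
```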
@vy
Potentially the file output stream could be opened using NIO with the java.nio.file.StandardOpenOption.SYNC option. It probably will do the trick (need to test), but I am wondering whether there will be any noticeable performance penalty, since it will happen on every write.
I can test it as well (not with log4j but with a plain text file).
```java
/**
 * Requires that every update to the file's content or metadata be written
 * synchronously to the underlying storage device.
 *
 * @see <a href="package-summary.html#integrity">Synchronized I/O file integrity</a>
 */
SYNC,

/**
 * Requires that every update to the file's content be written
 * synchronously to the underlying storage device.
 *
 * @see <a href="package-summary.html#integrity">Synchronized I/O file integrity</a>
 */
DSYNC;
```
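For reference, a minimal version of the plain-text-file test mentioned above could look like this; the file name and interval are made up, so point the path at a file on the share under test and watch whether its last-modified timestamp advances between writes.

```java
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import static java.nio.file.StandardOpenOption.APPEND;
import static java.nio.file.StandardOpenOption.CREATE;
import static java.nio.file.StandardOpenOption.SYNC;

public final class SyncOptionTest {
    public static void main(String[] args) throws Exception {
        // SYNC alone is not enough to open the file for writing; CREATE and APPEND are also needed here.
        Path path = Paths.get("sync-option-test.log");
        try (OutputStream out = Files.newOutputStream(path, CREATE, APPEND, SYNC)) {
            for (int i = 0; i < 10; i++) {
                out.write(("line " + i + System.lineSeparator()).getBytes(StandardCharsets.UTF_8));
                // Pause so the file's metadata can be inspected between writes.
                Thread.sleep(30_000);
            }
        }
    }
}
```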
@roytmana, double checking: Does immediateFlush not help?
Potentially file output stream could be opened using NIO with the java.nio.file.StandardOpenOption.SYNC option. It probably (need to test) will do the trick but I am wondering if there will be any noticeable performance penalty since it'll happen on every write
I disagree; if the performance penalty were negligible, that option would be on by default.
Microsoft will probably solve the problem sooner or later, so I see no reason to modify the behavior for all Log4j Core users; however, we can:
- implement #2117, or at least part of it. The usage of NIO has the advantage of using the same StandardOpenOptions on UNIX and Windows.
- add a configuration option to all file appenders and rolling file appenders that accepts a comma-separated list of StandardOpenOptions (a parsing sketch follows below).
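To make the second option concrete, the comma-separated attribute value could be mapped onto OpenOption values roughly as follows; the helper class and the "CREATE,APPEND,SYNC" example value are hypothetical, not an existing Log4j API.

```java
import java.nio.file.OpenOption;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper: turns a comma-separated attribute value such as "CREATE,APPEND,SYNC"
// into the OpenOption[] that could be passed to Files.newOutputStream().
public final class OpenOptionsParser {

    public static OpenOption[] parse(String value) {
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(s -> StandardOpenOption.valueOf(s.toUpperCase(Locale.ROOT)))
                .toArray(OpenOption[]::new);
    }

    public static void main(String[] args) {
        // Prints [CREATE, APPEND, SYNC]
        System.out.println(Arrays.toString(parse("CREATE, APPEND, SYNC")));
    }
}
```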
Does immediateFlush not help?
That would be really strange: immediateFlush should only flush the Java buffers (mostly our buffers, but some implementations of OutputStream also have buffers), while the problem here seems to be related to OS buffers that are not synced with the underlying device.
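As a minimal illustration of that distinction (plain JDK, nothing Log4j-specific): flush() only drains Java-side buffers into the OS, while getFD().sync() asks the OS to persist its own buffers, including file metadata, to the device. The file name is just an example.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public final class FlushVsSync {
    public static void main(String[] args) throws IOException {
        try (FileOutputStream fos = new FileOutputStream("app.log", true);
             BufferedOutputStream out = new BufferedOutputStream(fos)) {
            out.write("hello\n".getBytes(StandardCharsets.UTF_8));
            out.flush();        // roughly what immediateFlush does: empty the Java buffers into the OS
            fos.getFD().sync(); // what is being asked for here: force OS buffers and metadata to disk
        }
    }
}
```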
I just confirmed that immediateFlush does not work. I also tried the random access rolling file appender just in case; it did not help either.
I tried SYNC and DSYNC on a text file (it requires a bunch of other options to be supplied, like CREATE, APPEND, etc.). It does not appear to fix the issue, but I may need to test it a bit more (new file vs. appending to an existing one, etc.) if we want to give that a try.
@rm5248, @FreeAndNil,
Do you have similar issues in Log4cxx and Log4net on Azure?
we are a java shop :-)
we are a java shop :-)
Trying to figure out if this is Java specific.
We are Java and we run on Linux (Red Hat 8) containers in Azure Kubernetes. Unfortunately, I have no way to test. If you could give me an executable, I can see whether we can run it from a Windows machine against an Azure file share, but it will likely be apples to oranges, as it is the OS/networking stack, or maybe the SMB file shares, that are at fault.
I very much appreciate your sticking with it, because it is really not a log4j issue. I was just hoping that an enhancement for a periodic fsync would be helpful enough in general, and non-intrusive enough, to make it into the codebase :-)
Do you have similar issues in Log4cxx and Log4net on Azure?
I feel like I might have seen something similar a long time ago, but I can't recall what it was. I definitely haven't heard of anything like this happening recently, though. It sounds like an OS or JVM issue, as metadata such as the last-updated timestamp should be handled by the filesystem.
Interestingly it seems that NTFS may not update the last access timestamp immediately:
The NTFS file system delays updates to the last access time for a file by up to 1 hour after the last access.
That's probably not relevant here, unless the filesystem where logs are being written to is somehow NTFS.
@rm5248, @ppkarwasz on NTFS I often see that the file size is not updated for log files. It looks like the log file is empty, but when I look into it, there is content that is older than the point at which I checked the file size. When I look at the file size again, it is now "correct". I assume this is an optimization by Microsoft.
Our few Azure systems use Azure analytics instead of log4net.
In our case the timestamp is not updated at all, and after 60 days, when an archival process (external to the app) looks for old logs, it hits on live, open log files and deletes them completely, breaking logging: log4j does not recover and continues to "write" to the deleted file, writing nothing and not re-creating it.
I do not know what actual file system is behind those Azure file shares, but they are SMB shares mounted in Red Hat ubi8-based pods on Azure AKS.