
Log file timestamp not updated on some file systems. Support periodic sync to flush metadata to FS

Open roytmana opened this issue 8 months ago • 14 comments

Hello, we are experiencing an issue on Azure file shares where log file metadata (particularly the last-updated timestamp) is not updated until the log file is closed. We confirmed that it is not a Log4j issue but a problem with any FileOutputStream on this file system.

Would it be possible to add functionality to the file manager implementations whereby the FileOutputStream's file descriptor is fsynced at a frequency configured on the appender?

FileOutputStream.getFD().sync();

Or perhaps a more generalized solution.

roytmana avatar Apr 08 '25 16:04 roytmana

@roytmana, I am reluctant to push a fix for an OpenJDK bug that will probably be addressed by either the OpenJDK or Azure in the future. Would it be possible for you to repeat the following in a background thread?

  1. Programmatically obtain the appender
  2. Get your hands on its FileOutputStream (via reflection?)
  3. Run FileOutputStream.getFD().sync() on it
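For illustration, a minimal sketch of such a background thread, under a few assumptions: the appender name and period are placeholders, the appender extends AbstractOutputStreamAppender, and the protected OutputStreamManager.getOutputStream() method is reached via reflection (verify the method name against your Log4j version, and add --add-opens if you run on the module path):

import java.io.FileOutputStream;
import java.io.OutputStream;
import java.lang.reflect.Method;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.core.LoggerContext;
import org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender;
import org.apache.logging.log4j.core.appender.OutputStreamManager;

public final class PeriodicFileSync {

    /** Starts a daemon thread that fsyncs the named appender's file every periodSeconds. */
    public static void start(final String appenderName, final long periodSeconds) {
        final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            final Thread t = new Thread(r, "log-fsync-" + appenderName);
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleAtFixedRate(() -> {
            try {
                final LoggerContext ctx = (LoggerContext) LogManager.getContext(false);
                final AbstractOutputStreamAppender<?> appender =
                        ctx.getConfiguration().getAppender(appenderName);
                final OutputStreamManager manager = appender.getManager();
                // getOutputStream() is protected, hence the reflection.
                final Method getStream = OutputStreamManager.class.getDeclaredMethod("getOutputStream");
                getStream.setAccessible(true);
                final OutputStream os = (OutputStream) getStream.invoke(manager);
                if (os instanceof FileOutputStream) {
                    // Flushes file content *and* metadata to the file system.
                    ((FileOutputStream) os).getFD().sync();
                }
            } catch (final Exception e) {
                // Best effort: skip this tick and retry on the next one.
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }
}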

vy avatar Apr 13 '25 19:04 vy

Hello @vy, sorry for the delay. I had that hack working before I submitted the ticket, but to be absolutely sure I do not mislead you, I wanted to have Ops deploy my test to Azure in exactly the same way as our prod apps.

I just got a confirmation from them that it did solve the issue.

I did not do it in a background thread. Instead, I copied the rolling file appender and overrode its append() method, where I used reflection to get the output stream and sync it on every write.

So if it is feasible to add something like fileSystemSyncFrequencySec=60 to the file-based appenders (or at least to RollingFileAppender), indicating that FileOutputStream.getFD().sync() should be called at that interval when present, it would be great.

We have a support ticket open with Microsoft on the issue, but it does not seem to be getting anywhere. I can see if I can get a link to post here, but I am not sure it is publicly available.

Thanks again for your prompt response! It would be hard to add this sync from outside the Log4j project source code: all appenders are final, and the builders are statically associated with them, so there is a lot of copying of your code just to get past protected members and final classes :-)

roytmana avatar Apr 14 '25 16:04 roytmana

@vy, from my perspective it would probably be sufficient to do it inside the append() method, checking the elapsed time since the previous sync. The disadvantage, of course, is that if a log does not get much traffic, it will not be synced for some time and its dates not flushed. A thread-based sync would be much nicer, as it would work as a heartbeat regardless of a particular appender's activity; you would need to keep a list of all appenders configured for this heartbeat and sync each of them, though. In both cases you may need to synchronize normal appender write operations with the call to sync(); I do not know whether calling them concurrently poses any issues.
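As a minimal sketch of that elapsed-time check (a hypothetical helper; the appender would have to construct it with its underlying FileOutputStream, obtained as above):

import java.io.FileOutputStream;
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper that append() could call after each write. */
final class ThrottledSync {
    private final FileOutputStream out;  // the appender's underlying stream
    private final long intervalNanos;
    private final AtomicLong lastSync = new AtomicLong(System.nanoTime());

    ThrottledSync(final FileOutputStream out, final long intervalSeconds) {
        this.out = out;
        this.intervalNanos = TimeUnit.SECONDS.toNanos(intervalSeconds);
    }

    /** Syncs at most once per interval; the CAS makes concurrent callers safe. */
    void maybeSync() {
        final long now = System.nanoTime();
        final long last = lastSync.get();
        if (now - last >= intervalNanos && lastSync.compareAndSet(last, now)) {
            try {
                out.getFD().sync(); // flush content and metadata to the device
            } catch (final IOException ignored) {
                // Best effort: the next interval will retry.
            }
        }
    }
}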

roytmana avatar Apr 14 '25 17:04 roytmana

@vy Potentially the file output stream could be opened using NIO with the java.nio.file.StandardOpenOption.SYNC option. It will probably do the trick (I need to test), but I am wondering whether there will be any noticeable performance penalty, since the sync would then happen on every write.

I can test it as well (not with Log4j, but with a plain text file).

/**
 * Requires that every update to the file's content or metadata be written
 * synchronously to the underlying storage device.
 *
 * @see <a href="package-summary.html#integrity">Synchronized I/O file integrity</a>
 */
SYNC,

/**
 * Requires that every update to the file's content be written
 * synchronously to the underlying storage device.
 *
 * @see <a href="package-summary.html#integrity">Synchronized I/O file integrity</a>
 */
DSYNC;
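A standalone test along these lines could be run against the share (a sketch; the file name is a placeholder, and note that SYNC alone does not suffice to open the file):

import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class SyncOptionTest {
    public static void main(final String[] args) throws IOException, InterruptedException {
        final Path path = Paths.get("sync-test.log"); // point this at the mounted share
        // SYNC alone is not enough: CREATE and APPEND (or WRITE) are also needed.
        try (OutputStream out = Files.newOutputStream(path,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND, StandardOpenOption.SYNC)) {
            for (int i = 0; i < 60; i++) {
                out.write(("line " + i + System.lineSeparator()).getBytes(StandardCharsets.UTF_8));
                Thread.sleep(1_000L); // one line per second; watch the file's mtime meanwhile
            }
        }
    }
}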

roytmana avatar Apr 14 '25 17:04 roytmana

@roytmana, double checking: Does immediateFlush not help?

vy avatar Apr 16 '25 12:04 vy

Potentially file output stream could be opened using NIO with java.nio.file.StandardOpenOption.SYNC option. It probably (need to test) will do the trick but I am wondering if there will be any noticeable performance penalty since it'll happen on every write

I disagree: if the performance penalty were negligible, that option would be enabled by default.

Microsoft will probably solve the problem sooner or later, so I see no reason to modify the behavior for all Log4j Core users. However, we can:

  • implement #2117, or at least part of it. Using NIO has the advantage of supporting the same StandardOpenOptions on both UNIX and Windows.
  • add a configuration option to all file appenders and rolling file appenders that accepts a comma-separated list of StandardOpenOptions.

Does immediateFlush not help?

That would be really strange: immediateFlush should only flush the Java buffers (mostly our buffers, though some OutputStream implementations also have their own), while the problem here seems to be related to OS buffers that are not synced with the underlying device.
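To illustrate the distinction with plain Java (a minimal sketch; the file name is a placeholder):

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class FlushVsSync {
    public static void main(final String[] args) throws IOException {
        try (FileOutputStream fos = new FileOutputStream("app.log", true);
             BufferedOutputStream bos = new BufferedOutputStream(fos)) {
            bos.write("hello\n".getBytes(StandardCharsets.UTF_8));
            bos.flush();        // Java-level buffers -> OS page cache (what a flush does)
            fos.getFD().sync(); // OS page cache -> storage device, including metadata
        }
    }
}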

ppkarwasz avatar Apr 17 '25 16:04 ppkarwasz

I just confirmed that immediateFlush does not work. Just in case, I also tried the random-access rolling file appender; it did not help either.

I tried SYNC and DSYNC on a text file (they require a bunch of other options to be supplied as well, like CREATE, APPEND, etc.). It does not appear to fix the issue, but I may need to test a bit more (new file vs. appending to an existing one, etc.) if we want to give that a try.

roytmana avatar Apr 17 '25 16:04 roytmana

@rm5248, @FreeAndNil,

Do you have similar issues in Log4cxx and Log4net on Azure?

ppkarwasz avatar Apr 17 '25 17:04 ppkarwasz

We are a Java shop :-)

roytmana avatar Apr 17 '25 21:04 roytmana

We are a Java shop :-)

I am trying to figure out whether this is Java-specific.

ppkarwasz avatar Apr 17 '25 22:04 ppkarwasz

We are Java, and we run on Linux (Red Hat 8) containers in Azure Kubernetes. Unfortunately, I have no way to test. If you could give me an executable, I can see if we can run it from a Windows machine against an Azure file share, but it will likely be apples to oranges, as the OS/networking stack, or maybe the SMB file shares themselves, may be what is at fault.

I very much appreciate your sticking with this, because it is really not a Log4j issue. I was just hoping that an enhancement for periodic fsync would be helpful enough in general, and non-intrusive enough, to make it into the codebase :-)

roytmana avatar Apr 17 '25 22:04 roytmana

@rm5248, @FreeAndNil,

Do you have similar issues in Log4cxx and Log4net on Azure?

I feel like I might have seen something similar a long time ago, but I can't recall what it was. I definitely haven't heard of anything like this happening recently, though. It sounds like an OS or JVM issue, as metadata such as the last-updated timestamp should be handled by the filesystem.

Interestingly, it seems that NTFS may not update the last-access timestamp immediately:

The NTFS file system delays updates to the last access time for a file by up to 1 hour after the last access.

That's probably not relevant here, unless the filesystem the logs are being written to is somehow NTFS.

rm5248 avatar Apr 17 '25 23:04 rm5248

@rm5248, @ppkarwasz, on NTFS I often see that the file size is not updated for log files. The log file looks empty, but when I open it, there is content older than the moment I checked the file size. When I check the file size again, it is now "correct". I assume this is an optimization by Microsoft.

Our few Azure systems use Azure analytics instead of Log4net.

FreeAndNil avatar Apr 18 '25 05:04 FreeAndNil

In our case, the timestamp is not updated at all, so after 60 days, when an archival process (external to the app) looks for old logs, it hits live, open log files and deletes them, completely breaking logging: Log4j does not recover and continues to "write" to the deleted file, writing nothing and never re-creating it.

I do not know what actual file system is behind those Azure file shares, but they are SMB shares mounted in Red Hat ubi8-based pods on Azure AKS.

roytmana avatar Apr 18 '25 11:04 roytmana