logging-log4cxx
logging-log4cxx copied to clipboard
GZCompressAction: leave file uncompressed if fork of gzip fails
Prior to this patch, if apr_proc_create fails (e.g. because the parent process is very large and the system does not have enough free memory to create the child), logging effectively stops. With this patch, logging will continue, leaving the rotated file in an uncompressed form.
In practice, when a process gets this big, it's usually due to a memory leak, and having all the log output from this time period is well worth the extra disk space consumed due to lack of compression.
If we do this for the GZCompressAction
, a similar thing should also happen for the ZipCompressAction
.
@ams-tschoening Any thoughts as to the change? On one hand, I feel as though if there's an error, throwing an exception is good because it would hopefully be fail-fast at that point(if fork fails, what else can you do?). On the other hand, unexpected behavior in the logging framework(for example, by throwing an exception) is probably bad as well, since if the logging fails what can you do to debug?
In my opinion, exceptions from the log framework are fine and need to be handled like in any other cases. One needs to look at the root cause and fix/deal/live/... with it. The root cause in this example was a memory look and is pretty interesting, because in this special case one can't guarantee that any subsequent log statements or whatever will be processed successfully anyway. Things can fail for any reason in this special situation, so why even try to be tolerant?
OTOH, I had similar problems with simply an unavailable GZIP-binary in the past, because I tested things on Windows, which doesn't provide one by default. In this use case everything else works most likely and one could be tolerant in theory as well. Though, I configured to use compressed files, but my setup was broken to act accordingly and needed to be fixed. So th exception made me aware.
Besides that, why only be tolerant when GZIP could not be executed? Why not ignore errors about apr_file_close
as well? If we ignore a potential memory leak, one for handles/file descriptors could be ignored easily as well. What about the deletion of files? Especially in this case when deleting the target file which couldn't be created properly because GZIP didn't execute at all. On Windows deleting that file might fail for any reason, like the search indexer or some tools in Windows Explorer like TortoiseSVN/GIT locking that file.
To make a long story short: From my point of view, the question is not about exceptions or not in general, but about how tolerant the concrete compressing actions should be. And isn't that something the user should decide? So how about instead of ignoring one exception always and the other never etc., introduce some kind of config to deal with this? In the easiest case it's a boolean flag only telling if to be "tolerant" or not, whatever that means. In the context of this PR it means in case of false
, keep throwing the exception otherwise use the implementation of this PR. This can then be used in other problematic places and, in a far, far away future :-), even to some fully fledged policy-stuff, where one might configure to be tolerant to some special error codes when exec of GZIP fails and not to others.
In the end, we wouldn't need to discuss if to use exceptions or not, but focus on which operations can be made more "tolerant" under which circumstances and possibly document why, example use cases etc. It's a slightly different approach.
The root cause in this example was a memory look and is pretty interesting, because in this special case one can't guarantee that any subsequent log statements or whatever will be processed successfully anyway. Things can fail for any reason in this special situation, so why even try to be tolerant?
This brings up the next interesting point: fork()
may have failed because you have hit a resource limit, not that you have actually run out of memory.
For example, you can run out of processes that you could execute:
testuser@debian:~$ ulimit -u 2
testuser@debian:~$ sleep 5 &
[1] 56361
testuser@debian:~$ sleep 5
bash: fork: retry: Resource temporarily unavailable
bash: fork: retry: Resource temporarily unavailable
bash: fork: retry: Resource temporarily unavailable
bash: fork: Interrupted system call
[1]+ Done sleep 5
In this particular case, you could probably continue on for a while, at least until you hit some other limit, which may or may not ever happen.
To make a long story short: From my point of view, the question is not about exceptions or not in general, but about how tolerant the concrete compressing actions should be. And isn't that something the user should decide? So how about instead of ignoring one exception always and the other never etc., introduce some kind of config to deal with this?
To tolerate errors like this, we can't configure the GZCompressAction
directly: it gets created by the FixedWindowRollingPolicy
and TimeBasedRollingPolicy
when the filename extension ends in .gz
. Currently if it fails, all that happens is that log4cxx prints out log4cxx: Exception during rollover
.
The best long-term approach would probably be to implement the gzip/zip algorithms in log4cxx to avoid problems with forking a new process, as well as eliminating the problem of needing to have zip/gzip installed on your system. This is a non-trivial amount of work though.
In the meantime, it seems like we have several options:
- Accept this patch as-is. It sounds like that's not what we want to do, since deciding on how tolerant the rollover is should be up to the user.
- Introduce some new kind of tolerance mechanism to allow for failures when executing gzip/zip
- Utilize the already-existing
ErrorHandler
mechanism to call an error handler when an exception happens on rollover. Currently, the only classes that do anything with theErrorHandler
interface areSyslogAppender
,FileAppender
,SMTPAppender
,ODBCAppender
, andWriterAppender
.
Since this error can happen in RollingFileAppender
, that leads me to believe that the correct way to fix this is to actually just make the RollingFileAppender
, DailyRollingFileAppender
smarter to call the error handler if the rollover fails. That should then allow you to define an error handler, which would then let you continue logging messages to an appender without worrying about the zip/gzip, which seems to be what @nturner is trying to do.
Side note: we seem to have two classes both name RollingFileAppender
in different namespaces.... I have no idea why.
When researching about how error handlers can be set+used etc., I found the following statement and associated question. In this case the author of the question expected some kind of exception as well:
So I agree with @rm5248, it makes sense to use existing error handler infrastructure in the affected appenders and let those decide how to deal with some.
https://stackoverflow.com/questions/34356804/log4cxx-try-catch-wont-work-on-domconfiguratorconfigure
It just logs and ignores errors. Just bad implementation.
https://stackoverflow.com/questions/34356804/log4cxx-try-catch-wont-work-on-domconfiguratorconfigure#comment56462193_34356804
So I agree with @rm5248, it makes sense to use existing error handler infrastructure in the affected appenders and let those decide how to proceed. If I'm understanding correctly, error handlers can simply be set as part of the appender
-configuration using errorHandler
-elements and even seem to support individual options. This should allow flexibility users needs, in the worst case by implementing custom error handlers.
https://github.com/apache/logging-log4cxx/blob/79352bd0160866d06b61414f2dc83d1f71b44335/src/main/cpp/domconfigurator.cpp#L244 https://github.com/apache/logging-log4cxx/blob/79352bd0160866d06b61414f2dc83d1f71b44335/src/main/cpp/domconfigurator.cpp#L316
This seems like a good chance to make implementations like RollingFileAppenderSkeleton::rolloverInternal
a bit ore readable by using early aborts etc. as well.
I've created a JIRA issue to track this: https://issues.apache.org/jira/browse/LOGCXX-523
I should be able to take a look this weekend. I've also created another issue to track other errors that are effectively silenced if things go wrong.
@nturner How did you configure using compression, programmatically or using XML configs? In the latter case, please provide your example here or in linked JIRA. Thanks!
https://lists.apache.org/thread.html/r3452a23f48a2c2e2bd568c946a5d241c7c57607be620cb1602a9ea0c%40%3Cdev.logging.apache.org%3E
Here's the minimal test config I used when validating this change (by hand):
$ cat src/examples/cpp/trace-rolling.xml
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
On Sat, Apr 17, 2021 at 4:20 AM Thorsten Schöning @.***> wrote:
@nturner https://github.com/nturner How did you configure using compression, programmatically or using XML configs? In the latter case, please provide your example here or in linke JIRA https://issues.apache.org/jira/browse/LOGCXX-523. Thanks!
https://lists.apache.org/thread.html/r3452a23f48a2c2e2bd568c946a5d241c7c57607be620cb1602a9ea0c%40%3Cdev.logging.apache.org%3E
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/apache/logging-log4cxx/pull/62#issuecomment-821788124, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABTDBNP6F7KFG7QPEMOWCLTJFAEXANCNFSM4235C4MA .
Here's the patch I made to the the "trivial" example test program to facilitate testing:
diff --git a/src/examples/cpp/trivial.cpp b/src/examples/cpp/trivial.cpp index 165c7ceb..c2f86681 100644 --- a/src/examples/cpp/trivial.cpp +++ b/src/examples/cpp/trivial.cpp @@ -17,21 +17,31 @@ #include <log4cxx/logstring.h> #include <stdlib.h> #include <log4cxx/logger.h> +#include <log4cxx/xml/domconfigurator.h> #include <log4cxx/basicconfigurator.h> #include <log4cxx/helpers/exception.h> #include <log4cxx/ndc.h> #include <locale.h>
using namespace log4cxx; +using namespace log4cxx::xml; using namespace log4cxx::helpers;
-int main() +int main(int argc, char **argv) { setlocale(LC_ALL, ""); int result = EXIT_SUCCESS; try {
-
BasicConfigurator::configure();
-
if (argc > 1)
-
{
-
// BasicConfigurator replaced with
PropertyConfigurator.
-
DOMConfigurator::configure(argv[1]);
-
}
-
else
-
{
-
BasicConfigurator::configure();
-
} LoggerPtr rootLogger = Logger::getRootLogger(); NDC::push("trivial context");
On Sat, Apr 17, 2021 at 4:43 PM Nathaniel W. Turner < @.***> wrote:
Here's the minimal test config I used when validating this change (by hand):
$ cat src/examples/cpp/trace-rolling.xml
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
<rollingPolicy class="org.apache.log4j.rolling.TimeBasedRollingPolicy"> </rollingPolicy> <triggeringPolicy class="log4cxx.rolling.SizeBasedTriggeringPolicy"> </triggeringPolicy> On Sat, Apr 17, 2021 at 4:20 AM Thorsten Schöning < @.***> wrote:
@nturner https://github.com/nturner How did you configure using compression, programmatically or using XML configs? In the latter case, please provide your example here or in linke JIRA https://issues.apache.org/jira/browse/LOGCXX-523. Thanks!
https://lists.apache.org/thread.html/r3452a23f48a2c2e2bd568c946a5d241c7c57607be620cb1602a9ea0c%40%3Cdev.logging.apache.org%3E
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/apache/logging-log4cxx/pull/62#issuecomment-821788124, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABTDBNP6F7KFG7QPEMOWCLTJFAEXANCNFSM4235C4MA .
Sorry, that was an older patch. Here's the same commit on my local 0.11.0 branch:
diff --git a/src/examples/cpp/trivial.cpp b/src/examples/cpp/trivial.cpp index 6e8726e2..f3b31e65 100644 --- a/src/examples/cpp/trivial.cpp +++ b/src/examples/cpp/trivial.cpp @@ -17,21 +17,30 @@ #include <log4cxx/logstring.h> #include <stdlib.h> #include <log4cxx/logger.h> +#include <log4cxx/xml/domconfigurator.h> #include <log4cxx/basicconfigurator.h> #include <log4cxx/helpers/exception.h> #include <log4cxx/ndc.h> #include <locale.h>
using namespace log4cxx; +using namespace log4cxx::xml; using namespace log4cxx::helpers;
-int main() +int main(int argc, char **argv) { setlocale(LC_ALL, ""); int result = EXIT_SUCCESS; try {
-
BasicConfigurator::configure();
-
if (argc > 1)
-
{
-
DOMConfigurator::configure(argv[1]);
-
}
-
else
-
{
-
BasicConfigurator::configure();
-
} LoggerPtr rootLogger = Logger::getRootLogger(); NDC::push("trivial context");
On Sat, Apr 17, 2021 at 4:44 PM Nathaniel W. Turner < @.***> wrote:
Here's the patch I made to the the "trivial" example test program to facilitate testing:
diff --git a/src/examples/cpp/trivial.cpp b/src/examples/cpp/trivial.cpp index 165c7ceb..c2f86681 100644 --- a/src/examples/cpp/trivial.cpp +++ b/src/examples/cpp/trivial.cpp @@ -17,21 +17,31 @@ #include <log4cxx/logstring.h> #include <stdlib.h> #include <log4cxx/logger.h> +#include <log4cxx/xml/domconfigurator.h> #include <log4cxx/basicconfigurator.h> #include <log4cxx/helpers/exception.h> #include <log4cxx/ndc.h> #include <locale.h>
using namespace log4cxx; +using namespace log4cxx::xml; using namespace log4cxx::helpers;
-int main() +int main(int argc, char **argv) { setlocale(LC_ALL, ""); int result = EXIT_SUCCESS; try {
BasicConfigurator::configure();
if (argc > 1)
{
// BasicConfigurator replaced with
PropertyConfigurator.
DOMConfigurator::configure(argv[1]);
}
else
{
BasicConfigurator::configure();
} LoggerPtr rootLogger = Logger::getRootLogger(); NDC::push("trivial context");
On Sat, Apr 17, 2021 at 4:43 PM Nathaniel W. Turner < @.***> wrote:
Here's the minimal test config I used when validating this change (by hand):
$ cat src/examples/cpp/trace-rolling.xml
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
<rollingPolicy class="org.apache.log4j.rolling.TimeBasedRollingPolicy"> </rollingPolicy> <triggeringPolicy class="log4cxx.rolling.SizeBasedTriggeringPolicy"> </triggeringPolicy> On Sat, Apr 17, 2021 at 4:20 AM Thorsten Schöning < @.***> wrote:
@nturner https://github.com/nturner How did you configure using compression, programmatically or using XML configs? In the latter case, please provide your example here or in linke JIRA https://issues.apache.org/jira/browse/LOGCXX-523. Thanks!
https://lists.apache.org/thread.html/r3452a23f48a2c2e2bd568c946a5d241c7c57607be620cb1602a9ea0c%40%3Cdev.logging.apache.org%3E
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/apache/logging-log4cxx/pull/62#issuecomment-821788124, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABTDBNP6F7KFG7QPEMOWCLTJFAEXANCNFSM4235C4MA .
@nturner Can you take a look at #64 and see if that works for you? I've tested it on my end.
@rm5248 sorry for the delayed reply. I just tried my test case with that change (I see that it's in master now). No, it does not work for me.
Below are the results from a re-test against the current master branch. In both cases, I used a logging config that wants to rotate after basically every message (so every second with a time based rolling policy) and let the "delayedloop" example test program run for about 10 seconds before stopping it with Ctrl-C. I also hacked gzcompressaction.cpp
to skip the call to apr_proc_create
and just do stat = ENOMEM
instead to simulate a failure to fork the gzip
process. (Btw, I agree that a nice longer-term solution here would be to do the gzip compression in-process with a library instead of forking.)
Test with just the patch from https://github.com/apache/logging-log4cxx/pull/64
See https://github.com/nturner/logging-log4cxx/commits/nturner/master-graceful-gzip-failure-pr-64 for code used in this test.
Behavior: The failed .gz file is left around, and logging stops working.
14:08:41 nturner@millhouse4(cpp (nturner/master-graceful-gzip-failure))
$ ./delayedloop trace-rolling.xml
log4cxx: Exception during rollover
log4cxx: IO Exception : status code = 12
log4cxx: IO Exception : status code = 12
^C
14:08:51 nturner@millhouse4(cpp (nturner/master-graceful-gzip-failure))
$ ll
total 452
drwxrwxr-x 7 nturner nturner 4096 Apr 28 13:39 CMakeFiles
-rw-rw-r-- 1 nturner nturner 1312 Apr 28 13:36 CMakeLists.txt
-rw-rw-r-- 1 nturner nturner 337 Apr 28 13:39 CTestTestfile.cmake
-rw-rw-r-- 1 nturner nturner 14734 Apr 28 13:39 Makefile
-rw-rw-r-- 1 nturner nturner 20079 Dec 29 11:25 Makefile.in
-rw-rw-r-- 1 nturner nturner 1152 Apr 28 13:39 cmake_install.cmake
-rwxrwxr-x 1 nturner nturner 52744 Apr 28 14:08 console
-rw-rw-r-- 1 nturner nturner 4045 Apr 28 12:05 console.cpp
-rwxrwxr-x 1 nturner nturner 61528 Apr 28 14:08 custom-appender
-rw-rw-r-- 1 nturner nturner 2370 Apr 28 13:36 custom-appender.cpp
-rw-rw-r-- 1 nturner nturner 740 Apr 28 13:36 custom-appender.xml
-rwxrwxr-x 1 nturner nturner 36496 Apr 28 14:08 delayedloop
-rw-rw-r-- 1 nturner nturner 3442 Apr 28 12:05 delayedloop.cpp
-rw-rw-r-- 1 nturner nturner 2029 Apr 28 13:36 format-string.cpp
-rw-rw-r-- 1 nturner nturner 67 Apr 28 14:08 rolling-file.2021-04-28-14-08-44.log
-rw-rw-r-- 1 nturner nturner 0 Apr 28 14:08 rolling-file.2021-04-28-14-08-44.log.gz
-rwxrwxr-x 1 nturner nturner 31184 Apr 28 14:08 stream
-rw-rw-r-- 1 nturner nturner 1951 Sep 9 2020 stream.cpp
-rw-rw-r-- 1 nturner nturner 567 Apr 28 14:05 trace-console.xml
-rw-rw-r-- 1 nturner nturner 896 Apr 28 14:08 trace-rolling.xml
-rwxrwxr-x 1 nturner nturner 35456 Apr 28 14:08 trivial
-rw-rw-r-- 1 nturner nturner 2042 Apr 28 14:05 trivial.cpp
-rw-rw-r-- 1 nturner nturner 141208 Dec 29 11:26 trivial.o
Test with the patch from https://github.com/apache/logging-log4cxx/pull/62 (this PR)
See https://github.com/nturner/logging-log4cxx/commits/nturner/master-graceful-gzip-failure-pr-62 for code used in this test.
Behavior: The failed .gz file is removed cleanly (leaving just the rotated, but uncompressed log), and logging continues working.
14:17:14 nturner@millhouse4(cpp (nturner/master-graceful-gzip-failure-pr-62))
$ ./delayedloop trace-rolling.xml
log4cxx: Failed to fork gzip during log rotation; leaving log file uncompressed
log4cxx: Failed to fork gzip during log rotation; leaving log file uncompressed
log4cxx: Failed to fork gzip during log rotation; leaving log file uncompressed
log4cxx: Failed to fork gzip during log rotation; leaving log file uncompressed
log4cxx: Failed to fork gzip during log rotation; leaving log file uncompressed
log4cxx: Failed to fork gzip during log rotation; leaving log file uncompressed
log4cxx: Failed to fork gzip during log rotation; leaving log file uncompressed
^C
14:17:23 nturner@millhouse4(cpp (nturner/master-graceful-gzip-failure-pr-62))
$ ll
total 480
drwxrwxr-x 7 nturner nturner 4096 Apr 28 13:39 CMakeFiles
-rw-rw-r-- 1 nturner nturner 1312 Apr 28 13:36 CMakeLists.txt
-rw-rw-r-- 1 nturner nturner 337 Apr 28 13:39 CTestTestfile.cmake
-rw-rw-r-- 1 nturner nturner 14734 Apr 28 13:39 Makefile
-rw-rw-r-- 1 nturner nturner 20079 Dec 29 11:25 Makefile.in
-rw-rw-r-- 1 nturner nturner 1152 Apr 28 13:39 cmake_install.cmake
-rwxrwxr-x 1 nturner nturner 52744 Apr 28 14:16 console
-rw-rw-r-- 1 nturner nturner 4045 Apr 28 12:05 console.cpp
-rwxrwxr-x 1 nturner nturner 61528 Apr 28 14:16 custom-appender
-rw-rw-r-- 1 nturner nturner 2370 Apr 28 13:36 custom-appender.cpp
-rw-rw-r-- 1 nturner nturner 740 Apr 28 13:36 custom-appender.xml
-rwxrwxr-x 1 nturner nturner 36496 Apr 28 14:16 delayedloop
-rw-rw-r-- 1 nturner nturner 3442 Apr 28 12:05 delayedloop.cpp
-rw-rw-r-- 1 nturner nturner 2029 Apr 28 13:36 format-string.cpp
-rw-rw-r-- 1 nturner nturner 67 Apr 28 14:17 rolling-file.log
-rw-rw-r-- 1 nturner nturner 67 Apr 28 14:17 rolling-file.2021-04-28-14-17-16.log
-rw-rw-r-- 1 nturner nturner 67 Apr 28 14:17 rolling-file.2021-04-28-14-17-17.log
-rw-rw-r-- 1 nturner nturner 67 Apr 28 14:17 rolling-file.2021-04-28-14-17-18.log
-rw-rw-r-- 1 nturner nturner 67 Apr 28 14:17 rolling-file.2021-04-28-14-17-19.log
-rw-rw-r-- 1 nturner nturner 67 Apr 28 14:17 rolling-file.2021-04-28-14-17-20.log
-rw-rw-r-- 1 nturner nturner 67 Apr 28 14:17 rolling-file.2021-04-28-14-17-21.log
-rw-rw-r-- 1 nturner nturner 67 Apr 28 14:17 rolling-file.2021-04-28-14-17-22.log
-rwxrwxr-x 1 nturner nturner 31184 Apr 28 14:16 stream
-rw-rw-r-- 1 nturner nturner 1951 Sep 9 2020 stream.cpp
-rw-rw-r-- 1 nturner nturner 567 Apr 28 14:05 trace-console.xml
-rw-rw-r-- 1 nturner nturner 896 Apr 28 14:08 trace-rolling.xml
-rwxrwxr-x 1 nturner nturner 35456 Apr 28 14:16 trivial
-rw-rw-r-- 1 nturner nturner 2042 Apr 28 14:05 trivial.cpp
-rw-rw-r-- 1 nturner nturner 141208 Dec 29 11:26 trivial.o
Ah, it looks like you haven't updated the config. If you use a config file similar to this one on JIRA,, that should let the error handler run.
Basically what the config does is create a RollingFileAppender
that will rollover and gzip the contents of the log file. If it fails, it will call the specified FallbackErrorHandler
, which is configured exactly the same, except it won't gzip the files. The rollover is a little funny still(oddly sized files, the gzip file still exists), but it does continue logging in my testing.
Sorry; been busy. Yes, @rm5248, if I use a configuration like the example in LOGCXX-523, logging appears to continue. That approach results in a more complex configuration (2x the number of appenders) and the failure behavior isn't as graceful.
Since the specific conditions where fork() is likely to fail are pretty well understood and the appropriate behavior in the context of GZCompressAction is pretty straightforward, I think I'll stick with the approach in this PR in my own local builds of log4cxx. Perhaps it's not good enough for upstream -- that's fine. :)
Sorry; been busy. Yes, @rm5248, if I use a configuration like the example in LOGCXX-523, logging appears to continue. That approach results in a more complex configuration (2x the number of appenders) and the failure behavior isn't as graceful.
Since the specific conditions where fork() is likely to fail are pretty well understood and the appropriate behavior in the context of GZCompressAction is pretty straightforward, I think I'll stick with the approach in this PR in my own local builds of log4cxx. Perhaps it's not good enough for upstream -- that's fine. :)
Thanks for the input. Unfortunately, the code as it exists today is somewhat resistant to changing the current behavior, and really needs better error handling in general(but that's a major change).
I think what you have provided is good in certain circumstances, so I haven't closed this for that reason. I've been thinking about adding in what you have as a configuration option, but I really want to get better error handling working first before changing the default behavior.
For reference, this has spawned 3 issues: https://issues.apache.org/jira/browse/LOGCXX-523 https://issues.apache.org/jira/browse/LOGCXX-524 https://issues.apache.org/jira/browse/LOGCXX-525
This has now been added as part of #164 as an option that can be set via the config file.
Personally I would recommend using the FallbackErrorHandler
to change the behavior when required, or perhaps implementing your own error handler that could perform some sort of action when the error occurs.