logchar is defined as char in Windows environments

Open WorldRobertProject opened this issue 8 months ago • 5 comments

Although char is not UTF-8 in Windows environments, building with vcpkg sets LOG4CXX_CHAR:STRING=utf-8 and defines logchar as char. In Japanese Windows environments, char strings are encoded in Shift_JIS. Despite the UTF-8 setting, log messages are output as-is without any conversion, so Japanese messages are logged correctly in Shift_JIS, but characters that exist only in Unicode cannot be logged.
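To make the mismatch concrete, here is a minimal sketch in standard C++ (not part of Log4cxx; `is_valid_utf8` is a hypothetical helper written for this illustration). It checks whether a byte string is well-formed UTF-8, which shows that Shift_JIS bytes passed through unchanged do not form valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `bytes` is well-formed UTF-8.
// It checks lead/continuation byte structure and rejects overlong forms,
// surrogate code points, and code points above U+10FFFF.
bool is_valid_utf8(const std::string& bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n)
    {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;     // continuation bytes expected after the lead
        unsigned long cp = 0;      // decoded code point
        unsigned long min_cp = 0;  // smallest code point valid for this length
        if (lead < 0x80)                { extra = 0; cp = lead;        min_cp = 0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min_cp = 0x10000; }
        else return false;         // stray continuation byte or invalid lead byte
        if (n - i - 1 < extra) return false;  // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += extra + 1;
    }
    return true;
}
```

For example, the Shift_JIS bytes for "日本" (`"\x93\xFA\x96\x7B"`) fail this check, while the UTF-8 bytes for the same text (`"\xE6\x97\xA5\xE6\x9C\xAC"`) pass.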

We think LOG4CXX_CHAR:STRING should be set to wchar_t, and logchar should be defined as wchar_t, in Windows environments.

version: 1.4.0 vcpkg tag: 2025.04.09

WorldRobertProject • Apr 16 '25 08:04

It looks like this changed here: https://github.com/apache/logging-log4cxx/commit/45077480eb6af9bb222c554cad7e2c02d1951fc6

You should be able to work around this by setting LOG4CXX_CHAR=wchar_t when configuring, although I'm not sure how to configure that with vcpkg at the moment.
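As a rough illustration of what such a configure-time switch does, the sketch below selects a character type and a string-literal macro at compile time. The names `DEMO_LOGCHAR_IS_WCHAR`, `DEMO_STR`, and the `demo` namespace are hypothetical stand-ins invented for this example, not the actual Log4cxx definitions.

```cpp
#include <string>

// Hypothetical stand-in for a build-system-provided switch. A real build
// would pass the value on the compiler command line instead of defaulting it.
#ifndef DEMO_LOGCHAR_IS_WCHAR
#define DEMO_LOGCHAR_IS_WCHAR 0
#endif

namespace demo
{
#if DEMO_LOGCHAR_IS_WCHAR
typedef wchar_t logchar;     // wide build: literals need the L prefix
#define DEMO_STR(s) L##s
#else
typedef char logchar;        // narrow build: literals are used as written
#define DEMO_STR(s) s
#endif

// String type that follows whichever character type was selected.
typedef std::basic_string<logchar> LogString;

inline LogString greeting() { return DEMO_STR("hello"); }
}
```

Compiling the same source with `-DDEMO_LOGCHAR_IS_WCHAR=1` switches every `demo::LogString` to a wide string without changes to the calling code.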

rm5248 • Apr 17 '25 15:04

It seems that the way to fix this is for you to use an overlay port in vcpkg, as we should not use features to implement alternatives. There are directions on installing locally modified dependencies that show where you can set the LOG4CXX_CHAR option in the vcpkg port file.

@swebb2066 I don't think there's much for us to do here; this seems rather specific to Japanese environments. From the Wikipedia page it sounds like Shift_JIS is neither UTF-8 nor UTF-16, but a different 16-bit encoding entirely. I think the best thing for us to do here would be to provide an example overlay file for vcpkg that lets users set their encoding, instead of trying to figure it out on our own.

rm5248 • Apr 18 '25 01:04

> but Unicode-specific characters cannot be logged

@WorldRobertProject please provide a code snippet that shows how you would like to log Unicode-specific characters.

Are you using Qt?

Can you advise whether I should apply this patch to Log4cxx:

diff --git a/src/main/include/log4cxx-qt/transcoder.h b/src/main/include/log4cxx-qt/transcoder.h
index 0b1aaab4..43ad6447 100644
--- a/src/main/include/log4cxx-qt/transcoder.h
+++ b/src/main/include/log4cxx-qt/transcoder.h
@@ -31,7 +31,7 @@
        @param src The QString variable.
 */
 #define LOG4CXX_DECODE_QSTRING(var, src) \
-       LOG4CXX_NS::LogString var = (src).toStdString()
+       LOG4CXX_NS::LogString var = (src).toUtf8().constData()

 /** Create a QString equivalent of \c src.

@@ -43,7 +43,7 @@
        @param src The log4cxx::LogString variable.
 */
 #define LOG4CXX_ENCODE_QSTRING(var, src) \
-       QString var = QString::fromStdString(src)
+       QString var = QString::fromUtf8(src.c_str())
 #endif // LOG4CXX_LOGCHAR_IS_UTF8

 #if LOG4CXX_LOGCHAR_IS_WCHAR

swebb2066 • Apr 18 '25 07:04

@rm5248 For now, this issue has been resolved by the overlay port. Thank you so much.

I realized that there is still an issue caused by character encoding. The character type for file names in LocationInfo is char, so file names that contain non-ASCII characters are not logged correctly. We think the character type for file names should be logchar or another typedef type. That said, file names with non-ASCII characters are basically never used, so it will not be an issue as long as only short file names are logged. (Directory names may contain non-ASCII characters.)
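The following sketch shows why a non-ASCII directory name does not reach the log when only the last path component is kept. `short_file_name` is a hypothetical helper written for this example; it is not how Log4cxx derives short file names.

```cpp
#include <string>

// Hypothetical helper: returns the part of `path` after the last '/' or '\\'.
// If the path has no separator, the whole path is returned.
std::string short_file_name(const std::string& path)
{
    const std::string::size_type pos = path.find_last_of("/\\");
    return pos == std::string::npos ? path : path.substr(pos + 1);
}
```

With this, `short_file_name("C:\\src\\app\\main.cpp")` yields `main.cpp`, so the directory part is dropped entirely. One caveat of such a byte-wise search: in Shift_JIS the second byte of some double-byte characters is 0x5C, the same value as a backslash, so the search could split inside a character if the file name itself contained one.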

The character type for function names being char may not be an issue, but the current C++ standard allows function names to include non-ASCII characters (although this is not common). It might be better to change it to logchar or another typedef type as well, but the severity is probably low.

@swebb2066 Sorry, we don't use Qt.

Because conversion from Shift_JIS to UTF-8 typically goes through UTF-16, logging in UTF-16 is more efficient for us than logging in UTF-8.

We need to integrate with C# in our custom appender, and C#'s char is UTF-16. This is another reason why we want to use wchar_t.
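To illustrate the extra hop, here is a minimal sketch of the second step, UTF-16 to UTF-8, in standard C++. `utf16_to_utf8` is a hypothetical helper, not a Log4cxx function. It assumes the first step (Shift_JIS to UTF-16) has already been done by the platform, for example with `MultiByteToWideChar` and code page 932 on Windows. A wchar_t build can skip this second step.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encodes a UTF-16 string as UTF-8.
// Unpaired surrogates are replaced with U+FFFD, which is a policy chosen
// for this sketch; other error-handling policies are possible.
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF)  // high surrogate: needs a low one
        {
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;                       // consume the low surrogate as well
            }
            else
                cp = 0xFFFD;
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)  // low surrogate with no high one
            cp = 0xFFFD;

        if (cp < 0x80)
            out += static_cast<char>(cp);
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```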

WorldRobertProject • Apr 18 '25 07:04

Here's how we use an overlay port:

  1. Copy vcpkg\ports\log4cxx to another place
  2. Add an option -DLOG4CXX_CHAR=wchar_t to vcpkg_cmake_configure in the copied portfile.cmake
  3. Build with vcpkg install log4cxx --overlay-ports=<copied folder path>

WorldRobertProject • Apr 18 '25 08:04