rabbitmq-server
rabbitmq-server copied to clipboard
Ensure that strings from external sources eventually end up as utf-8 encoded binaries
This is necessary since cmd.exe uses ISO-8859-1 encoding and directories can have latin1 characters, like RabbitMQ Sérvér
Follow-up to #5486
Discovered by @dumbbell
AFAIK, Windows enforces filename encoding to UTF-16 (LE?) on disk. Our use of cmd.exe is the source of the problem because I think it uses legacy encodings like ISO-8859-1(5) on my french Windows to maintain backward compatibility. Therefore I suspect it performs string conversions.
Linux and other similar Unix don't enforce anything. Distributions all default to a *.UTF-8 locale and most applications expect to deal with UTF-8-encoded filenames. But non-UTF-8-encoded filenames are still possible (while still using the default locale) and applications will display a "square character" or question mark when they fail to decode that non-UTF-8 string at best, or stop working.
Erlang strings are supposed to be lists of Unicode codepoints. I think RabbitMQ should convert input strings to Unicode the earliest possible in key places, like when it reads the output of rabbitmq-env-conf.bat and expect to work with Unicode strings and UTF-8-encoded binaries everywhere internally.
Therefore to solve the problem with the JSON encoding crash, I would prefer to convert whatever we try to JSON-encode at the time we get/read that string instead. In the end, there should be nothing to convert on Linux and a few strings we get because of our use of cmd.exe on Windows.
@michaelklishin I'll check that out, thanks.
@dumbbell I still got the encoding error with the fix for the output of rabbitmq-env-conf.bat. I'll see if I can figure out how that latin1 string made it to the JSON encoder.
@michaelklishin that bug is fixed. [] was incorrectly being passed as options to thoas.
Thanks @dumbbell. I will continue reviewing our use of list_to_binary/1 to see if there are other places that should be updated. I will also test using the data given in https://github.com/rabbitmq/rabbitmq-server/discussions/5805
cc @michaelklishin
Currently testing with RABBITMQ_BASE set to C:\Users\bakkenl\Евгений
Aha!
2022-09-21T09:58:43.174000-07:00 info: FORMATTER CRASH: {"Stopping message store for directory '~s'",[[99,58,47,85,115,101,114,115,47,98,97,107,107,101,110,108,47,1045,1074,1075,1077,1085,1080,1081,47,100,98,47,114,97,98,98,105,116,64,98,97,107,107,101,110,108,45,122,48,49,45,109,110,101,115,105,97,47,109,115,103,95,115,116,111,114,101,115,47,118,104,111,115,116,115,47,57,69,82,72,66,81,72,57,52,72,52,83,54,89,55,68,90,66,49,77,88,73,78,67,48,47,109,115,103,95,115,116,111,114,101,95,112,101,114,115,105,115,116,101,110,116]]}
Running into cuttlefish errors as well when setting configuration paths using unicode:
OK to keep the scope reasonable for this PR, I think I will deal with the format crash above, and leave the other unicode issues to the future, since I think fixes in cuttlefish will be required.
@dumbbell @michaelklishin - I fixed all the format strings dealing with vhost names, since I tested with a vhost named RabbitMQ Sérvér/Евгений (with tags Евгений, foo bar, Sérvér).
My guess is that all ~s and ~p format strings in the codebase can be changed to ~ts and ~tp.
In order to see the unicode correctly in the werl.exe window, I had to set the following:
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS=+pc unicode
I of course set the following as a system env var to test everything out:
RABBITMQ_BASE=C:\ProgramData\RabbitMQ Sérvér\Евгений
Config files from that location: Евгений.zip
PS I used this branch to fix some of ra's output - https://github.com/rabbitmq/ra/pull/305
I'm going to do a similar set of tests on Linux
Here is how I tested on Linux:
- Create a generic unix package from this branch
- Create a directory like
/tmp/RabbitMQ Sérvér - Евгений - Copy the configuration files from here into that directory. You'll have to edit
rabbitmq-env.confto point the base dir to whatever you created above. - Note that unicode is still not allowed in
rabbitmq.conf, due to a bug in cuttlefish. I'm going to work on that next. - Start the generic unix package this way:
RABBITMQ_CONF_ENV_FILE="/path/to/rabbitmq-env.conf" ./sbin/rabbitmq-plugins enable rabbitmq_management rabbitmq_top LOG=debug RABBITMQ_ALLOW_INPUT=true RABBITMQ_CONF_ENV_FILE="/path/to/rabbitmq-env.conf" ./sbin/rabbitmq-server - Investigate the output and log files to ensure there aren't any lists of integers or mis-encoded unicode strings
- Click around the management UI to ensure no 500s are returned, and that strings look correct
- Create a vhost named using strings with non-ASCII, non latin characters, like
RabbitMQ Sérvér/Евгений. I also usedЕвгенийfor the description and a tag value likeЕвгений, foo bar baz , Sérvér, which correctly ends up as three tags:$ ./sbin/rabbitmqctl list_vhosts name tags Listing vhosts ... name tags / [] RabbitMQ Sérvér/Евгений [Евгений, foo bar baz, Sérvér]
I run into the following exception when following the QA instructions. First, here is my /tmp/Евгений/rabbitmq-env.conf which I simplified since the exception seems to happen when the node attempts to load and parse it:
ADVANCED_CONFIG_FILE="/tmp/Евгений/advanced.config"
CONFIG_FILE="/tmp/Евгений/rabbitmq.conf"
ENABLED_PLUGINS_FILE="/tmp/Евгений/enabled_plugins"
LOG_BASE="/tmp/Евгений/log"
MNESIA_BASE="/tmp/Евгений/db"
SERVER_ADDITIONAL_ERL_ARGS='+pc unicode'
Thanks @michaelklishin I'll check into that!
@michaelklishin I'm not exactly sure how, but commit https://github.com/rabbitmq/rabbitmq-server/pull/5551/commits/3c1c7ada5898b8512642af2faf203a0da6e6b976 appears to have fixed the issue. I made those changes to get some additional output as to what the problem was in unicode:characters_to_list/1 but the issue seems to be resolved 🤔
Let me know if it works in your env, thanks!
2022-09-24 12:25:36.792985-07:00 [debug] <0.140.0> Sourcing $RABBITMQ_CONF_ENV_FILE: /tmp/Евгений/rabbitmq-env.conf
2022-09-24 12:25:36.796221-07:00 [debug] <0.140.0> $RABBITMQ_CONF_ENV_FILE exit status: 0
2022-09-24 12:25:36.796635-07:00 [debug] <0.140.0> $RABBITMQ_CONF_ENV_FILE output:
2022-09-24 12:25:36.796635-07:00 [debug] <0.140.0> + . /tmp/Евгений/rabbitmq-env.conf
2022-09-24 12:25:36.796635-07:00 [debug] <0.140.0> + ADVANCED_CONFIG_FILE=/tmp/Евгений/advanced.config
2022-09-24 12:25:36.796635-07:00 [debug] <0.140.0> + CONFIG_FILE=/tmp/Евгений/rabbitmq.conf
2022-09-24 12:25:36.796635-07:00 [debug] <0.140.0> + ENABLED_PLUGINS_FILE=/tmp/Евгений/enabled_plugins
2022-09-24 12:25:36.796635-07:00 [debug] <0.140.0> + LOG_BASE=/tmp/Евгений/log
2022-09-24 12:25:36.796635-07:00 [debug] <0.140.0> + MNESIA_BASE=/tmp/Евгений/db
2022-09-24 12:25:36.796635-07:00 [debug] <0.140.0> + SERVER_ADDITIONAL_ERL_ARGS=+pc unicode
2022-09-24 12:25:36.796635-07:00 [debug] <0.140.0> + echo -----VARS-PID-2296543-----
2022-09-24 12:25:36.796635-07:00 [debug] <0.140.0> -----VARS-PID-2296543-----
2022-09-24 12:25:36.796635-07:00 [debug] <0.140.0> + set
2022-09-24 12:25:36.796635-07:00 [debug] <0.140.0> ADVANCED_CONFIG_FILE='/tmp/Евгений/advanced.config'
2022-09-24 12:25:36.796635-07:00 [debug] <0.140.0> ASDF_DIR='/home/lbakken/.asdf'
2022-09-24 12:25:36.796635-07:00 [debug] <0.140.0> BINDIR='/home/lbakken/.asdf/installs/erlang/25.1/erts-13.1/bin'
2022-09-24 12:25:36.796635-07:00 [debug] <0.140.0> CLOUDSDK_PYTHON='/usr/bin/python'
2022-09-24 12:25:36.796635-07:00 [debug] <0.140.0> CLOUDSDK_PYTHON_ARGS='-S'
2022-09-24 12:25:36.796635-07:00 [debug] <0.140.0> CLOUDSDK_ROOT_DIR='/opt/google-cloud-sdk'
2022-09-24 12:25:36.796635-07:00 [debug] <0.140.0> CONFIG_FILE='/tmp/Евгений/rabbitmq.conf'
2022-09-24 12:25:36.796635-07:00 [debug] <0.140.0> CONF_ENV_FILE_PHASE='rabbtimq-prelaunch'
2022-09-24 12:25:36.796635-07:00 [debug] <0.140.0> DBUS_SESSION_BUS_ADDRESS='unix:path=/run/user/1000/bus'
2022-09-24 12:25:36.796635-07:00 [debug] <0.140.0> DOCKER_BUILDKIT='1'
2022-09-24 12:25:36.796635-07:00 [debug] <0.140.0> DOTNET_BUNDLE_EXTRACT_BASE_DIR='/home/lbakken/.cache/dotnet_bundle_extract'
2022-09-24 12:25:36.796635-07:00 [debug] <0.140.0> DOTNET_ROOT='/usr/share/dotnet'
Yup, it works as expected now 👍
Finally backported for 3.11.5.