librdb

Wrong encoding of non-ASCII characters in JSON

Open jscissr opened this issue 3 months ago • 3 comments

Non-ASCII characters are encoded incorrectly by rdb-cli dump.rdb json.

Example: Add a key to Redis with SET demo "Müller", then run rdb-cli dump.rdb json. The result is:

[{
    "demo":"M\u00c3\u00bcller"
}]

After unescaping, we get "MÃ¼ller": the two UTF-8 bytes of "ü" (0xC3 0xBC) are each escaped as if they were a separate code point.
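The per-byte behaviour can be reproduced in isolation. The following is only a minimal sketch, assuming the default "C" locale: `escape_bytewise` is a hypothetical helper, not a librdb function, which applies the same rule as the `default:` branch shown in the diff further down but writes into a caller-supplied buffer instead of `ctx->outfile`.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of librdb: applies the same per-byte rule as
 * the default branch of outputPlainEscaping(), but writes into a buffer.
 * The separate cases for quotes, backslashes, \n, \t, ... are omitted. */
static void escape_bytewise(const char *in, size_t len, char *out, size_t outsz) {
    size_t used = 0;
    if (outsz == 0) return;
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        /* Cast so that isprint() never receives a negative value. */
        unsigned char c = (unsigned char)in[i];
        int n;
        if (isprint(c))
            n = snprintf(out + used, outsz - used, "%c", c);
        else
            n = snprintf(out + used, outsz - used, "\\u%04x", (unsigned)c);
        if (n < 0 || (size_t)n >= outsz - used) {
            out[used] = '\0'; /* drop a partially written escape and stop */
            break;
        }
        used += (size_t)n;
    }
}
```

Calling `escape_bytewise("M\xc3\xbcller", 7, buf, sizeof buf)` leaves `M\u00c3\u00bcller` in `buf`, because neither 0xC3 nor 0xBC is printable in the "C" locale and each byte is formatted on its own.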

For comparison, rdbtools (which does not work with newer Redis versions) outputs:

[{
"demo":"M\u00fcller"}]
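Output of this shape could be produced by decoding each UTF-8 sequence into a code point before escaping it. The sketch below is only an illustration under stated assumptions: `decode_utf8` and `escape_utf8` are hypothetical helpers, not librdb or rdbtools code, and the fallback for bytes that are not valid UTF-8 (escaping the single byte as `\u00XX`) is an arbitrary choice made here, since Redis values are binary-safe and need not be text.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: decodes one UTF-8 sequence from s (at most len bytes).
 * Returns the number of bytes consumed and stores the code point in *cp,
 * or returns 0 if the bytes do not form a valid sequence. */
static size_t decode_utf8(const unsigned char *s, size_t len, unsigned long *cp) {
    size_t need;
    unsigned long min;
    if (len == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0)      { need = 2; min = 0x80;    *cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; min = 0x800;   *cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 4; min = 0x10000; *cp = s[0] & 0x07; }
    else return 0;
    if (len < need) return 0;
    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0; /* not a continuation byte */
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates and values beyond U+10FFFF. */
    if (*cp < min || *cp > 0x10FFFF || (*cp >= 0xD800 && *cp <= 0xDFFF)) return 0;
    return need;
}

/* Hypothetical helper: writes an ASCII-only JSON string body into out. */
static void escape_utf8(const char *in, size_t len, char *out, size_t outsz) {
    const unsigned char *s = (const unsigned char *)in;
    size_t used = 0, i = 0;
    if (outsz == 0) return;
    out[0] = '\0';
    while (i < len) {
        unsigned long cp = 0;
        size_t n = decode_utf8(s + i, len - i, &cp);
        char *dst = out + used;
        size_t room = outsz - used;
        int w;
        if (n == 0) {                      /* assumption: escape the raw byte */
            w = snprintf(dst, room, "\\u%04x", (unsigned)s[i]);
            n = 1;
        } else if (cp >= 0x10000) {        /* outside the BMP: surrogate pair */
            unsigned long v = cp - 0x10000;
            w = snprintf(dst, room, "\\u%04lx\\u%04lx",
                         0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF));
        } else if (cp == '"' || cp == '\\') {
            w = snprintf(dst, room, "\\%c", (int)cp);
        } else if (cp >= 0x20 && cp < 0x7F) {
            w = snprintf(dst, room, "%c", (int)cp);
        } else {
            w = snprintf(dst, room, "\\u%04lx", cp);
        }
        if (w < 0 || (size_t)w >= room) {
            out[used] = '\0';              /* drop a partial escape and stop */
            break;
        }
        used += (size_t)w;
        i += n;
    }
}
```

With the same input bytes for "Müller", `escape_utf8` yields `M\u00fcller`. The trade-off is that every string is validated and decoded, which costs more than passing the bytes through unchanged.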

The simplest way to fix this is to avoid escaping non-ASCII characters entirely, and output them as is:

diff --git a/src/ext/handlersToJson.c b/src/ext/handlersToJson.c
index c5addf7..7a7e299 100644
--- a/src/ext/handlersToJson.c
+++ b/src/ext/handlersToJson.c
@@ -65,7 +65,7 @@ static void outputPlainEscaping(RdbxToJson *ctx, char *p, size_t len) {
             case '\t': fprintf(ctx->outfile, "\\t"); break;
             case '\b': fprintf(ctx->outfile, "\\b"); break;
             default:
-                fprintf(ctx->outfile, (isprint(*p)) ? "%c" : "\\u%04x", (unsigned char)*p);
+                fprintf(ctx->outfile, ((unsigned char)*p > 127 || isprint(*p)) ? "%c" : "\\u%04x", (unsigned char)*p);
         }
         p++;
     }

With this change, the result is:

[{
    "demo":"Müller"
}]

jscissr, Mar 12 '24 14:03