duc icon indicating copy to clipboard operation
duc copied to clipboard

json output not properly quoted

Open baikal opened this issue 10 months ago • 7 comments

duc json somepath

emits json where at least " and are not properly quoted by in the json, which makes json invalid ...

baikal avatar Apr 04 '25 11:04 baikal

"baikal" == baikal @.***> writes:

duc json somepath emits json where at least " and are not properly quoted by in the json, which makes json invalid ...

Can you please give me more details and I'll see if I can come up with a patch against v1.5.0-rc1, especially since I'm not up on all the JSON requirements directly.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you are subscribed to this thread.*Message ID: <zevv/duc/issues/340 @github.com>

baikalbaikal created an issue (zevv/duc#340)

duc json somepath

emits json where at least " and are not properly quoted by in the json, which makes json invalid ...

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you are subscribed to this thread.*Message ID: <zevv/duc/issues/340 @github.com>

l8gravely avatar Apr 04 '25 17:04 l8gravely

The following patch solves it for " (double quote), TAB, CR and LF characters. Although there might be more (all < ascii 32 ?).

diff --git a/src/duc/cmd-json.c b/src/duc/cmd-json.c
index be26851..d257ed4 100644
--- a/src/duc/cmd-json.c
+++ b/src/duc/cmd-json.c
@@ -33,10 +33,10 @@ static void print_escaped(const char *s)
 {
        while(*s) {
                switch(*s) {
-                       case '"': printf("\""); break;
-                       case '\t': putchar('\t'); break;
-                       case '\n': putchar('\n'); break;
-                       case '\r': putchar('\r'); break;
+                        case '"': printf("\\\""); break;
+                        case '\t': printf("\\t"); break;
+                        case '\n': printf("\\n"); break;
+                        case '\r': printf("\\r"); break;
                        default: putchar(*s); break;
                }
                s++;

baikal avatar Apr 04 '25 17:04 baikal

wait, make that the following, according to here:

diff --git a/src/duc/cmd-json.c b/src/duc/cmd-json.c
index be26851..d8c4fa1 100644
--- a/src/duc/cmd-json.c
+++ b/src/duc/cmd-json.c
@@ -33,10 +33,13 @@ static void print_escaped(const char *s)
 {
        while(*s) {
                switch(*s) {
-                       case '"': printf("\""); break;
-                       case '\t': putchar('\t'); break;
-                       case '\n': putchar('\n'); break;
-                       case '\r': putchar('\r'); break;
+                        case '"': printf("\\\""); break;
+                        case '\\': printf("\\\\"); break;
+                        case '\t': printf("\\t"); break;
+                        case '\n': printf("\\n"); break;
+                        case '\r': printf("\\r"); break;
+                        case '\b': printf("\\b"); break;
+                        case '\f': printf("\\f"); break;
                        default: putchar(*s); break;
                }
                s++;

baikal avatar Apr 04 '25 17:04 baikal

testfiles.zip

baikal avatar Apr 04 '25 17:04 baikal

haha yes the original escaping was just utter nonsense, i wonder how we let that creep in without noticing. thanks for reporting!

zevv avatar Apr 04 '25 18:04 zevv

"Ico" == Ico Doornekamp @.***> writes:

haha yes the original escaping was just utter nonsense, i wonder how we let that creep in without noticing. thanks for reporting!

I wonder how... anyway, thanks for the patches and the test cases! I wonder if we should try using 'jsonlint' or some other lint-like tool for doing some more validations here?

As you can guess, I haven't had a chance to look at this, but I will work on applying the patch this weekend and getting out a new release if I can find the time.

1.5.0-rc2 I suspect. But with some big holes still.

l8gravely avatar Apr 05 '25 00:04 l8gravely

Quoting John (2025-04-05 02:38:30)

l8gravely left a comment (zevv/duc#340)

"Ico" == Ico Doornekamp @.***> writes:

haha yes the original escaping was just utter nonsense, i wonder how we let that creep in without noticing. thanks for reporting!

I wonder how...

My apologies if that sounded a lot like I was blaming someone else for not noticing this. The thing is, it seems that I reviewed and merged this myself on july 23d, 2022, so that perfectly explains the "how" :)

https://www.json.org/json-en.html is pretty clear about the proper way to handle this, so I propose this so we also handle the control characters:

static void print_escaped(const char *s)
{
    const char *p = s;

    while(*p) {
             if(*p == '"' ) printf("\\\"");
        else if(*p == '\\') printf("\\\\");
        else if(*p == '\b') printf("\\b");
        else if(*p == '\f') printf("\\f");
        else if(*p == '\n') printf("\\n");
        else if(*p == '\r') printf("\\r");
        else if(*p == '\t') printf("\\t");
        else if(*p < 0x20) printf("\\u%04x", *p);
        else putchar(*p);
        p++;
    }
}

We might still be lacking on the UTF-8 side, but we can always pick that up if we run into it later.

anyway, thanks for the patches and the test cases! I wonder if we should try using 'jsonlint' or some other lint-like tool for doing some more validations here?

That makes sense, good plan.

zevv avatar Apr 05 '25 05:04 zevv