jsmn icon indicating copy to clipboard operation
jsmn copied to clipboard

Iterating over even more complex json objects

Open dieggsy opened this issue 5 years ago • 5 comments

simple.c shows how to iterate over jsmn objects that contain JSMN_ARRAY - how does one iterate over jsmn objects where values are more JSMN_OBJECTs?

The .size trick shown with the arrays doesn't work for JSMN_OBJECT, because .size for objects is the number of key/value pairs in the current level object, with no knowledge of the size of the inner objects.

dieggsy avatar Aug 10 '19 15:08 dieggsy

Thinking about it more, I feel like this is almost the definition of the sort of thing you need recursion for... I might get there eventually.

dieggsy avatar Aug 10 '19 16:08 dieggsy

I've whipped something up to this effect:

#define JSMN_STRICT
#include <stdio.h>
#include <string.h>
#include <stdarg.h>
#include "jsmn.h"

static int jsoneq(const char *json, jsmntok_t *tok, const char *s) {
  if (tok->type == JSMN_STRING && (int)strlen(s) == tok->end - tok->start &&
      strncmp(json + tok->start, s, tok->end - tok->start) == 0) {
    return 0;
  }
  return -1;
}

static void json_print(const char* jstr, jsmntok_t *json) {
    if (json == NULL) {
        printf("Null!\n");
    } else {
        printf("%.*s\n", json[0].end - json[0].start, jstr+json[0].start);
    }
}

int skip_object (jsmntok_t *json) {
    if (json->type == JSMN_PRIMITIVE
        || json->type == JSMN_STRING) {
        return 1;
    }
    else {
        int i = 1, skip = 0;
        int size = json->type == JSMN_ARRAY ? json->size : json->size * 2;
        while (i <= size ) {
            if (json[i].type == JSMN_OBJECT
                || json[i].type == JSMN_ARRAY) {
                int local_skip = skip_object(&json[i]);
                skip += local_skip;
                i += local_skip;
                size += local_skip;
            } else {
                skip++;
                i++;
            }
        }
        return skip;
    }
}
static jsmntok_t *vjson_oref(const char* jstr, jsmntok_t* json, va_list keys) {
    char* key = va_arg(keys,char*);
    if (key == NULL) {
        return json;
    }
    if (json[0].type != JSMN_OBJECT) {
        return NULL;
    }
    int size = json->size * 2;
    int i = 1;
    while (i <= size) {
        if (json[i].type == JSMN_OBJECT) {
            int local_skip = skip_object(&json[i]);
            i+= local_skip + 1;
            size += local_skip;
            continue;
        }
        if (jsoneq(jstr, &json[i], key) == 0) {
            return vjson_oref(jstr, &json[i+1], keys);
        }
        ++i;
    }
    return NULL;
}

static jsmntok_t *json_oref(const char* jstr, jsmntok_t* json, ...) {
    va_list args;
    va_start(args, json);
    jsmntok_t* result = vjson_oref(jstr, json, args);
    va_end(args);
    return result;
}

int main() {
    int r;
    jsmn_parser p;
    jsmntok_t json[200];
    char* json_string = "{\"foo\": {\"bar\": {\"x\":1,\"y\":2}, \"baz\": 5}, \"qux\": 2}";

    jsmn_init(&p);
    r = jsmn_parse(&p, json_string, strlen(json_string), json, sizeof(json)/sizeof(json[0]));
    if (r == JSMN_ERROR_NOMEM) {
        printf("jsmntok_t too small!");
    } else if (r == JSMN_ERROR_INVAL) {
        printf("Invalid json!");
    } else if (r == JSMN_ERROR_PART) {
        printf("Too little json data!");
    }

    if (r < 1 || json[0].type != JSMN_OBJECT) {
        printf("Object expected\n");
        return 1;
    }

    jsmntok_t* obj = json_oref(json_string,json,"foo",NULL);
    printf("Level 1:      json[foo]: ");
    json_print(json_string, obj);

    obj = json_oref(json_string,json,"foo","bar",NULL);
    printf("Level 2: json[foo][bar]: ");
    json_print(json_string, obj);

    obj = json_oref(json_string,json,"nonexistent",NULL);
    printf("None: json[nonexistent]: ");
    json_print(json_string, obj);

    return 0;
}

Which prints:

Level 1:      json[foo]: {"bar": {"x":1,"y":2}, "baz": 5}
Level 2: json[foo][bar]: {"x":1,"y":2}
None: json[nonexistent]: Null!

It's possible skip_object and json_oref could be combined into a single function, but this is how I sort of logically broke it down in my head. I'll leave this open in case someone has input on this or a better solution, but feel free to close.

EDIT: Handle arrays properly, taking into account that they can themselves contain more objects/arrays.

dieggsy avatar Aug 10 '19 18:08 dieggsy

Honestly, myself I see room for improvement in the parser itself to make this task easier. Possibly add another field to token data with position of a sibling?

pt300 avatar Aug 20 '19 18:08 pt300

I come up with some nice C++ usage pattern, maybe someone will be interested:

  • I parse stuff almost normally (don't forget to handle errors):

    https://github.com/AgainPsychoX/YellowToyCar/blob/e7e3fecadf44a8ed51d273db1d9512945c502f77/src/http.cpp#L418-L422

  • I reserve one token for "guard", that have end position as high as possible:

    https://github.com/AgainPsychoX/YellowToyCar/blob/e7e3fecadf44a8ed51d273db1d9512945c502f77/src/http.cpp#L433

  • Then I use function per object type (including root level):

    https://github.com/AgainPsychoX/YellowToyCar/blob/e7e3fecadf44a8ed51d273db1d9512945c502f77/src/http.cpp#L156-L168

  • The signature allows for both input and output data from parsing process. In my code I use input for modifications to current configuration, and output to response with the configuration. In my case I use only one part at the time to reuse buffer (either input or output), but it could work at the same time.

  • Inside the functions, I go over the tokens, pairs of key and value each time:

    https://github.com/AgainPsychoX/YellowToyCar/blob/e7e3fecadf44a8ed51d273db1d9512945c502f77/src/http.cpp#L169-L187

    Note that std::hash could be used as well for key comparisons. You could also do full string comparisons or some tree instead of the hashing and switch.

  • To skip over object (which is handled by other function, or omitted) I compare ends of the token. Here the guard from the beginning comes useful, as it assures we can't get out of bounds.

    https://github.com/AgainPsychoX/YellowToyCar/blob/e7e3fecadf44a8ed51d273db1d9512945c502f77/src/http.cpp#L207-L217

  • To skip over primitives, move by 2 and check the ends:

    https://github.com/AgainPsychoX/YellowToyCar/blob/e7e3fecadf44a8ed51d273db1d9512945c502f77/src/http.cpp#L236-L239

  • You could also come up with code to skip over arrays easily too.

AgainPsychoX avatar Jan 20 '23 22:01 AgainPsychoX

Recursive functions to work around tok.size being non-hierarchical overcomplicates the issue. You can just:

  • Take a token of any type, including a nested object or array. Store its tok.end, say, as obj.end
  • Iterate on tokens until tok.start > obj.end

Now your iterator is past the object and all of its nested tokens, pointing at the first non-nested token.

jsmn tokens are ordered in source document order. Assume obj is the token of the nested object/array:

  1. tokens that appear in the document before the nested object/array will have a tok.end < obj.start.
  2. tokens that appear in the document after the nested object/array will have a tok.start > obj.end.
  3. tokens that are nested inside the nested object/array will have tok.start > obj.start and tok.end < obj.end.

Once you understand your data, a flat iterator over tokens works here.

// `*i` will be the index of the next non-nested token, or the last token.
// works for any type of token
static void
jsmn_nested_skip(const jsmntok_t* tok, int num_tokens, int* i)
{
    for (int char_end = tok[*i].end; *i < num_tokens && tok[*i].start < char_end; (*i)++)
        ; // pass
}

mlabbe avatar Jan 23 '24 20:01 mlabbe