json icon indicating copy to clipboard operation
json copied to clipboard

msgpack parser failed to parse null as Map key

Open kotori2 opened this issue 3 years ago • 9 comments

What is the issue you have?

I'm trying to parse msgpack data like:

{
    null: 1
}

which serializes as 81 C0 01. But nlohmann::json::from_msgpack failed to parse it with

[json.exception.parse_error.113] parse error at byte 2: syntax error while parsing MessagePack string: expected length specification (0xA0-0xBF, 0xD9-0xDB); last byte: 0xC0

Please describe the steps to reproduce the issue.

Can you provide a small but working code example?

#include <nlohmann/json.hpp>
int main() {
    const uint8_t src[] = {0x81, 0xC0, 0x01};
    nlohmann::json j = nlohmann::json::from_msgpack(src);
}

What is the expected behavior?

From the msgpack specs it doesn't restricts which value could be used on Map keys

And what is the actual behavior instead?

The specs above says

applications may remove Binary type, restrict keys of map objects to be String type, and put some restrictions to make the semantics compatible with JSON

So maybe it could be serialized to

{
    "null": 1
}

like other implementation does

Which compiler and operating system are you using?

  • Compiler: MSVC 19.30.30705
  • Operating system: Windows 11 22000.527 Also reproduced on gcc11

Which version of the library did you use?

  • [ ] latest release version 3.10.5
  • [x] other release - please state the version: 3.10.4
  • [ ] the develop branch

If you experience a compilation error: can you compile and run the unit tests?

  • [ ] yes
  • [ ] no - please copy/paste the error message below
  • [x] N/A

kotori2 avatar Mar 08 '22 10:03 kotori2

This is indeed not supported by the library as JSON requires object keys to be strings. I am hesitant to implement a workaround like treating null as a string.

In any case, we should overwork the documentation to be explicit about the restrictions.

nlohmann avatar Mar 08 '22 10:03 nlohmann

In Python APIs there is a param bool strict_map_key=True, which does the thing.

kotori2 avatar Mar 08 '22 10:03 kotori2

This is indeed not supported by the library as JSON requires object keys to be strings. I am hesitant to implement a workaround like treating null as a string.

This would only need to apply to the parser, though? The user can be responsible for translating lookups to j["null"] where necessary.

falbrechtskirchinger avatar Mar 09 '22 13:03 falbrechtskirchinger

Can it be at least enabled on the sax parser? (all integer, unsigned integer, float ecc )

The virtual function (example json_sax::key(json_sax::number_integer_t)) can be defaulted to fail parsing while leaving the possibility to the user implementation of the parser to override it

Is this possible?

Thanks, Luca

imwhocodes avatar Apr 12 '22 17:04 imwhocodes

No. The parser expects a string for keys. This happens before the SAX interface is involved.

    bool get_msgpack_object(const std::size_t len)
    {
        if (JSON_HEDLEY_UNLIKELY(!sax->start_object(len)))
        {
            return false;
        }

        string_t key;
        for (std::size_t i = 0; i < len; ++i)
        {
            get();
            if (JSON_HEDLEY_UNLIKELY(!get_msgpack_string(key) || !sax->key(key))) // <-- string key is parsed/required here
            {
                return false;
            }

falbrechtskirchinger avatar Apr 12 '22 17:04 falbrechtskirchinger

Sorry but the parse except a string because it is the only thing that know how to parse, can something like this be implemented?

bool get_msgpack_object(const std::size_t len)
{
    if (JSON_HEDLEY_UNLIKELY(!sax->start_object(len)))
    {
        return false;
    }

    for (std::size_t i = 0; i < len; ++i)
    {
        switch (get())
        {
            // EOF
            case std::char_traits<char_type>::eof():
                return false; // Or whatever

            // positive fixint
            case 0x00:
            // Various bytes omitted...
            case 0x7F:
                if( ! sax->key_number_integer(static_cast<number_unsigned_t>(current) ) ) return false;
                break;

            // fixstr
            case 0xA0:
            // Various bytes omitted...
            case 0xDB: // str 32
            {
                // NORMAL STRING CASE
                string_t s;
                if( ! ( get_msgpack_string(s) && sax->key(s) ) ) return false;
                break;
            }

            case 0xC0: // nil
                if( ! sax->key_null() ) return false;
                 break;

            case 0xC2: // false
                if( ! sax->key_bool(false) ) return false;
                 break;

            case 0xC3: // true
                if( ! sax->key_bool(true) ) return false;
                 break;

            // And so on with other meaningful types

        }

        if(JSON_HEDLEY_UNLIKELY(!parse_msgpack_internal())){
            return false;
        }
    }

imwhocodes avatar Apr 12 '22 17:04 imwhocodes

What you did is a start but more special-casing in other places is required to keep things from breaking.

For starters, the main parser is implemented on top of the SAX interface and can't support this extended key API because, as you know, the object container's key type is always basic_json::string_t. There's probably more.

In any case, this library is a conforming implementation of the msgpack specification according to OP's quotes. So, is it worth the effort? I don't know.

@nlohmann will have to weigh in if a PR to add arbitrary key types to the SAX interface would be accepted. I'm just another user and have no strong feelings either way.

falbrechtskirchinger avatar Apr 12 '22 18:04 falbrechtskirchinger

Yeah I know the the default object container's key type is always basic_json::string_t and I don't want to change that, the default json_sax parser on its supposed virtual function json_sax::key_number_integer(json_sax::number_integer_t) should simply return false, this will keep same behaviour as present (fail to parse) but also let the user extend funtionality

With an extension like this @kotori2 could simply extend the sax parser with:

bool json_sax::key_null() override {
    return this->key("null");
}

Giving him his excepted result:

{
    "null": 1
}

given the input:

{
    null: 1
}

while at the same time keeping the current interface and behaviour

imwhocodes avatar Apr 12 '22 18:04 imwhocodes

Sounds good. Are you interested in pursing this in form of a PR?

falbrechtskirchinger avatar Apr 12 '22 19:04 falbrechtskirchinger