cereal icon indicating copy to clipboard operation
cereal copied to clipboard

Update to recent RapidJSON and integrate streaming (was JSON-parser errors are very unhelpfull)

Open Fiona-J-W opened this issue 11 years ago • 15 comments

The error-messages in the exceptions thrown by the JSON-parser aren't very helpfull:

Error: rapidjson internal assertion failure: IsObject()

It would be great if these could become more descriptive, maybe something like this:

Error: in object 'value0': in array 'value3': expected object, but found integer.

Fiona-J-W avatar Mar 24 '14 21:03 Fiona-J-W

Example:

Code:

#include <fstream>
#include <iostream>
#include <string>
#include <vector>


#include <cereal/cereal.hpp>
#include <cereal/types/string.hpp>
#include <cereal/types/vector.hpp>
#include <cereal/archives/json.hpp>

int main(int argc, char** argv) try {
    if(argc != 2) {
        return 1;
    }
    std::vector<std::vector<std::string>> vec;

    std::ifstream file{argv[1]};
    cereal::JSONInputArchive archive{file};
    archive(vec);

} catch(std::exception& e) {
    std::cerr << e.what() << '\n';
    return 2;
}

JSON:

{
    "value0": [
        [
            ["foo"]
        ],
        [
            "bar",
            "baz"
        ]
    ]
}

Error:

rapidjson internal assertion failure: IsString()

Fiona-J-W avatar Mar 24 '14 21:03 Fiona-J-W

It would be great if we could figure out how to add line numbers to errors as well. Ideally, the above error would read something like:

Cereal JSON serialization error. Expected type [string] on line 4, but found [list] instead.

randvoorhies avatar Mar 24 '14 22:03 randvoorhies

I think this would require modification of rapidjson as I can't find anything related to storing or retrieving that information in the source. Honestly moving to a different library wouldn't be such a bad idea in my opinion, rapidjson is nice but it does make some questionable decisions. For example using setjmp for performance reasons in a parser is dubious at best. Every call to setjmp makes a system call to store your signal mask along must spend time copying a bunch of registers you probably aren't using (see rethinkdb's informative blog post on the matter). Not to mention mixing setjmp with C++ can lead to stability problems.

It might be worth the effort to extract the JSON reader/writer routines from chromium's source. It doesn't look like it'd be too hard as it's not really that tied down to the rest of chromium (you'd just need to remove google test and replace the type variant it uses with a standalone one).

It already includes this information, is wicked fast and is battle tested. As a bonus the license is BSD-compatible which plays nice with cereal :).

Facebook's folly library also has a JSON reader/writer which supports these features with an equally lovely license. You would need to extract the dynamic, range and range classes and remove FbString but other than that it's not too tied down to anything else in the library if you sed s/boost::/std::/g :P

tarqd avatar Mar 27 '14 23:03 tarqd

According to the discussion on reddit rapidjson is no longer maintained, so a change might be beneficial in more than one way.

Fiona-J-W avatar Mar 27 '14 23:03 Fiona-J-W

It's worth noting that folly is probably the better choice than chromium's as it's easier to make standalone and is more in line with cereal's design philosophy of using modern C++ and has an API similar to rapidjson for retrieving/casting values. Google also doesn't allow many C++11 features in it's code base, most notably move semantics and prefers functions like GetString() vs get<string>()

tarqd avatar Mar 27 '14 23:03 tarqd

I like the idea of moving to one of these simpler looking parsers instead of using RapidJSON.

AzothAmmo avatar Mar 28 '14 00:03 AzothAmmo

I started toying around with extracting FB folly, the main problem is it uses double-conversion which, while providing extremely fast conversions for doubles from and to strings etc, adds a library dependency. If cereal wants to remain a header only library we'd need to either move all that code into headers or use something else to convert doubles

That and I can't get gtest to compile under clang on OSX :(. It uses std::tr1:: all over the place

tarqd avatar Mar 28 '14 00:03 tarqd

Another option: https://github.com/esnme/ujson4c/

tarqd avatar May 12 '14 15:05 tarqd

Just as a side note, RapidJSON has never been maintained. The developer created it and that was it. I asked him once (per eMail) when a new version will come out and he asked what was missing.... so maybe you could contact him too and ask for specifics?

patlecat avatar May 29 '14 23:05 patlecat

Please note that RapidJSON has moved to GitHub and there are active efforts solving previous issues. Feel free to jot down issues or advices there. I felt sorry that I failed to maintain rapidjson previously due to personal situations.

miloyip avatar Jun 30 '14 06:06 miloyip

Hi @miloyip , great move and thanks for reviving the lib again. I really like it and want to see it evolve and support modern C++ standards and compilers. :+1:

patlecat avatar Jul 01 '14 22:07 patlecat

Considering the fact that RapidJSON now supports streaming, we'll likely stick with it and work on improving error messages and usability after we upgrade.

AzothAmmo avatar Aug 13 '14 17:08 AzothAmmo

By the way, we have improved the error reporting mechanism. Now parse error is retrieved as an enum ParseErrorCode. So it can be interpreted by the program. And the code can be converted to an English message (or other locale provided by developer) optionally.

http://miloyip.github.io/rapidjson/md_doc_dom.html#ParseError

miloyip avatar Aug 13 '14 17:08 miloyip

randvoorhies commented on Mar 24, 2014

It would be great if we could figure out how to add line numbers to errors as well. Ideally, the above error would read something like:

Cereal JSON serialization error. Expected type [string] on line 4, but found [list] instead.

Jip, line numbers would be really good. Is there a way of doing this currently? (Perhaps getting some kind of marker of stream position or something? Then I could count newlines myself if I have a stream that allows me to go back to the beginning.)

Thanks.

Side note: Regarding JSON and C++11: * JSON Voorhees * JSON for Modern C++

(from https://isocpp.org/search/google?q=json )

ajneu avatar Jul 18 '15 06:07 ajneu

We're now using RapidJSON 1.0.2, I haven't done anything other than make it work as is, so no streaming or anything like that yet.

AzothAmmo avatar Apr 19 '16 17:04 AzothAmmo