Use Boost.JSON
Issue
#1107
Tasks
- [ ] test performance
- [ ] review
I'm really curious to evaluate this change for our use-case. In my experience the main bottleneck for us when it comes to JSON is parsing huge matrices, either from a file (in the case of a custom matrix) or from the network (e.g. osrm-routed output). In that situation, parsing matrices will dominate the problem loading time reported in `summary.computing_times.loading`.
@SebMilardo did you run any tests so far?
Not yet, but I'm planning to run some tests this weekend.
Great! You probably want to test various problem sizes, including instances with several thousands of points, to really notice differences. If you're only interested in the loading time and the solving time becomes a pain, you can make things faster by:
1. only using TSP instances (the dedicated code scales better);
2. using `-l 0` to stop the search prior to any local search.
Bad news: I've started testing the parse function in input_parser and rapidjson is always faster than boost::json. I'm using a hand-crafted input file with 100000 vehicles and 100000 jobs (a ~20MB .json file), and basically boost::json is faster at parsing the string but slower at accessing the parsed objects, which results in the parse function being ~25% slower on average. I'm playing around with options, allocators, error checks, etc. to make boost::json faster, but I also found this FAQ (https://www.boost.org/doc/libs/1_85_0/libs/json/doc/html/json/frequently_asked_questions.html) and this library (https://github.com/simdjson/simdjson). Simdjson seems to be way faster than both rapidjson and boost::json, at the cost of creating read-only objects. As VROOM uses the JSON objects to build its own objects, it might be worth a try.
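For reference, here is a minimal standalone sketch of the access patterns being compared. This is not the actual input_parser code; the `time_us` helper, the "jobs"/"id" fields and the checksum are only illustrative of how a parsed document gets walked once to build other objects.

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>
#include <string>

#include <boost/json.hpp>
#include "rapidjson/document.h"

// Tiny timing helper (illustrative only).
template <typename F> long long time_us(F&& f) {
  const auto start = std::chrono::steady_clock::now();
  f();
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::microseconds>(end - start)
    .count();
}

void bench(const std::string& input) {
  uint64_t checksum = 0;

  // rapidjson: parse, then read every job id from the DOM.
  const auto rapidjson_us = time_us([&] {
    rapidjson::Document doc;
    doc.Parse(input.c_str());
    const auto& jobs = doc["jobs"];
    for (rapidjson::SizeType i = 0; i < jobs.Size(); ++i) {
      checksum += jobs[i]["id"].GetUint64();
    }
  });

  // boost::json: same work, but each access goes through
  // as_object()/as_array()/to_number conversions.
  const auto boost_us = time_us([&] {
    const boost::json::value v = boost::json::parse(input);
    const auto& jobs = v.as_object().at("jobs").as_array();
    for (const auto& job : jobs) {
      checksum += job.as_object().at("id").to_number<uint64_t>();
    }
  });

  std::cout << "rapidjson: " << rapidjson_us
            << " us, boost::json: " << boost_us << " us (checksum "
            << checksum << ")\n";
}
```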
Thanks for testing and reporting. So we'd basically trade a better-maintained project with somewhat simpler user code against a ~25% slowdown on parsing. This may indeed be too high a price, especially since rapidjson (despite its dev state) has been "just working" the whole time. Happy to get other views on that.
> Simdjson seems to be way faster than both rapidjson and boost::json, at the cost of creating read-only objects.
Might be a good option. As you point out, we never modify parsed objects but just read parts of them to populate our own C++ objects. Also, development around simdjson seems to be quite active. My concern here would be the time spent, as you already invested quite some time adjusting the whole codebase for boost::json. Do you think it would be easier/faster/possible to start with a quick benchmark outside VROOM?
No problem! I think I can integrate simdjson in the parse function and compare the results. Now that I have a general understanding of where .json data is used in the codebase, it is just a matter of learning how this new library works.
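If it helps, here is a minimal standalone sketch (outside VROOM) of what read-only access with simdjson's DOM API could look like. The file name and field names are placeholders, and it assumes exceptions are enabled (simdjson's default); the final integration in input_parser would of course look different.

```cpp
#include <cstdint>
#include <iostream>

#include "simdjson.h"

int main() {
  simdjson::dom::parser parser;
  // load() parses the file into a read-only DOM owned by the parser.
  simdjson::dom::element doc = parser.load("input.json");

  uint64_t job_count = 0;
  uint64_t checksum = 0;
  // The parsed tree is immutable; we only read values from it, which
  // matches how parsed data is used to populate VROOM's own C++ objects.
  for (simdjson::dom::object job : doc["jobs"]) {
    uint64_t id = job["id"];
    checksum += id;
    ++job_count;
  }

  std::cout << job_count << " jobs, id checksum " << checksum << "\n";
  return 0;
}
```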