Use Boost.JSON
Issue
#1107
Tasks
- [ ] test performance
- [ ] review
I'm really curious to evaluate this change for our use-case. In my experience the main bottleneck for us when it comes to JSON is parsing huge matrices, either from a file (in the case of a custom matrix) or from the network (e.g. osrm-routed output). In that situation, parsing matrices will dominate the problem loading time reported in `summary.computing_times.loading`.
@SebMilardo did you run any tests so far?
Not yet, but I'm planning to run some tests this weekend.
Great! You probably want to test various problem sizes, including instances with several thousands of points, to really notice differences. If you're only interested in the loading time and the solving time becomes a pain, you can make things faster by:
1. only using TSP instances (the dedicated code scales better);
2. using `-l 0` to stop the search prior to any local search.
Bad news: I've started testing the parse function in input_parser and rapidjson is always faster than boost::json. I'm using a hand-crafted input file with 100000 vehicles and 100000 jobs (a ~20MB .json file), and basically boost::json is faster at parsing the string but slower at accessing the parsed objects, which results in the parse function being ~25% slower on average. I'm playing around with options, allocators, error checks, etc. to make boost::json faster, but I also found this FAQ (https://www.boost.org/doc/libs/1_85_0/libs/json/doc/html/json/frequently_asked_questions.html) and this library (https://github.com/simdjson/simdjson). Simdjson seems to be way faster than both rapidjson and boost::json, at the cost of creating read-only objects. As VROOM uses the JSON objects to build its own objects, it might be worth a try.
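For reference, here is a minimal standalone sketch of the access patterns being compared. This is not the actual input_parser code; the `time_us` helper, the "jobs"/"id" fields and the checksum are only illustrative of how a parsed document gets walked once to build other objects.

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>
#include <string>

#include <boost/json.hpp>
#include "rapidjson/document.h"

// Tiny timing helper (illustrative only).
template <typename F> long long time_us(F&& f) {
  const auto start = std::chrono::steady_clock::now();
  f();
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::microseconds>(end - start)
    .count();
}

void bench(const std::string& input) {
  uint64_t checksum = 0;

  // rapidjson: parse, then read every job id from the DOM.
  const auto rapidjson_us = time_us([&] {
    rapidjson::Document doc;
    doc.Parse(input.c_str());
    const auto& jobs = doc["jobs"];
    for (rapidjson::SizeType i = 0; i < jobs.Size(); ++i) {
      checksum += jobs[i]["id"].GetUint64();
    }
  });

  // boost::json: same work, but each access goes through
  // as_object()/as_array()/to_number conversions.
  const auto boost_us = time_us([&] {
    const boost::json::value v = boost::json::parse(input);
    const auto& jobs = v.as_object().at("jobs").as_array();
    for (const auto& job : jobs) {
      checksum += job.as_object().at("id").to_number<uint64_t>();
    }
  });

  std::cout << "rapidjson: " << rapidjson_us
            << " us, boost::json: " << boost_us << " us (checksum "
            << checksum << ")\n";
}
```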
Thanks for testing and reporting. So we'd basically trade a better-maintained project with somewhat simpler user code against a ~25% slowdown on parsing. This may indeed be too high a price, especially since rapidjson (despite its dev state) has been "just working" the whole time. Happy to get other views on that.
> Simdjson seems to be way faster than both rapidjson and boost::json, at the cost of creating read-only objects.
Might be a good option. As you point out, we never modify parsed objects but just read parts of them to populate our own C++ objects. Also, development around simdjson seems to be quite active. My concern here would be the time spent, as you already invested quite some time adjusting the whole codebase for boost::json. Do you think it would be easier/faster/possible to start with a quick benchmark outside VROOM?
No problem! I think I can integrate simdjson in the parse function and compare the results. Now that I have a general understanding of where .json data is used in the codebase, it is just a matter of learning how this new library works.
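If it helps, here is a minimal standalone sketch (outside VROOM) of what read-only access with simdjson's DOM API could look like. The file name and field names are placeholders, and it assumes exceptions are enabled (simdjson's default); the final integration in input_parser would of course look different.

```cpp
#include <cstdint>
#include <iostream>

#include "simdjson.h"

int main() {
  simdjson::dom::parser parser;
  // load() parses the file into a read-only DOM owned by the parser.
  simdjson::dom::element doc = parser.load("input.json");

  uint64_t job_count = 0;
  uint64_t checksum = 0;
  // The parsed tree is immutable; we only read values from it, which
  // matches how parsed data is used to populate VROOM's own C++ objects.
  for (simdjson::dom::object job : doc["jobs"]) {
    uint64_t id = job["id"];
    checksum += id;
    ++job_count;
  }

  std::cout << job_count << " jobs, id checksum " << checksum << "\n";
  return 0;
}
```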