[json] Re-enable compilation of JsonTest and add two perf tests
It seems that compiling of the main Json parser unit test was disabled due to a 'MSVC Preprocessor' bug.
It is safe to assume that g++/clang++ will be used on environments like Linux/etc while MSVC would be used on MS-Windows.
This change will re-enable the unit test compilation while also adding two new unit tests targeting stressing the performance of the Json parser.
@aaronabramov, @yfeldblum and @ispeters any feedback?
The idea is to re-enable compilation of the main JSON unit test and add two new unit tests that could be used for a perf profile targeting identifying hot spots to guide optimization work.
I'm also re-posting some questions and remarks I've added on other merge request (#1805 ), quote: "I started by having a look on the JSON parsing class and noticed that the main unit test was commented out (apparently related to a MSVC bug?).
To get it building again was a one liner fix, I proceeded next to use as an input file a JSON file generated by retrieving the comments from my own Facebook account (I named the file 'input.json').
I wonder if there is a corpus of sample JSON files that are relevant for Folly client code?.
The reason I ask about a corpus is that I observed diverging ordering of hot spots depending on the input JSON file (e.g. twitter.json vs FB comments sample file mentioned above).
A perf profile showed that one of the top 8 hot spots (alongside parseValue, parseString, F14Table, dynamic::operator=) while parsing JSON was relate to hashing.
That lead me to have a look on which hashes and where they were deployed in the code base.
There may be good potential for performance gains in JSON parsing by leveraging SIMD optimizations.
I've done some previous work on vectorizing zlib for Chromium and it is quite surprising the performance gains achievable by working with vectors instead of a scalar approach.
The good news is that there is previous work done on the area (e.g. simdjson).
Would Folly be open to adopt SIMD specific optimizations in its JSON parser or maybe just migrate to simdjson (if it has all the required features)?"
Enabling the json test for open source builds is good, but for the perf tests, we do have a benchmark framework (https://github.com/facebook/folly/blob/main/folly/Benchmark.h - With https://github.com/facebook/folly/blob/main/folly/test/AsciiCaseInsensitiveBenchmark.cpp as an example). The issue is that I don't think we have these setup to be buildable from cmake currently.
@Orvid has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
It's taken a bit for it land, but, as of https://github.com/facebook/folly/commit/c8a3dbc6a2ef0ac28fcf9b5552db6a6e82223044 benchmarks can now be built from CMake. It seems like we've broken trunk since then though.
Could you adjust these benchmarks to use that framework instead of adding them as tests?
Sure can do!
@Orvid first draft on turning the new tests into performance benchmarks.
@Orvid has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
It is a bit odd that my comments are listed as 'pending' instead of simply being submitted. Perhaps a github quirk?
Just in case: a) Here are the warnings that were silenced with the above change: https://gist.github.com/Adenilson/7437bada44035181cca1b675ff309279
b) The JSON file can be found at: https://github.com/simdjson/simdjson/tree/master/jsonexamples
Anything pending to have this landed?
Friendly ping on the subject.
Can we proceed and land this patch as it should be low risk?
Sorry for the delay on this. The current status is that enabling the json tests is fine, but for the benchmarks the default needs to have at least something checked in as the default files to use. If folks want to pull in a third-party json file to benchmark more completely that is fine, but we need something to be there by default.
@Orvid just added a sample file in the current patch and comments in the source code to where to find better/bigger JSON files.
Tested by building the new target (i.e. make json_benchmark) and next running the benchmark. Steps: a) mkdir out; cd out; b) cmake ../ -DCMAKE_BUILD_TYPE=Release -DBUILD_BENCHMARKS=ON c) make json_benchmark json_test -j8 d) Followed next by executing the newly enabled/added test (https://gist.github.com/Adenilson/d36aa75f561c72e7377fa554a9e71be6).
I guess it should be good to go. :-)
@Orvid has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
I see that the build bots seem to fail.
A quick look on the linux bot showed the following compilation error: "/home/runner/work/folly/folly/folly/test/JsonTest.cpp:361:64: sorry, unimplemented: non-trivial designated initializers not supported"
I'm building with g++ 11.2.0 (standard in Ubuntu 22 LTS IIRC) with ToT (Top of the Tree) and no special configuration flags (i.e. only used what I posted in my previous message).
A quick look on the same bot seems to show that apparently gcc 7.5 is used (quote "The CXX compiler identification is GNU 7.5.0" in https://github.com/facebook/folly/runs/7888073890?check_suite_focus=true).
Any particular reason to use such older version of gcc/g++ by the build bots? It seems that gcc 7.2 was released in Aug 2017 (i.e. 5 years ago).
It might be the case that it could be workarounded using an extra compiler flag as needed (e.g. -std=c++14).
Since I don't have the privileges to add a patch to the compiler bots for a dry run, I would like to coordinate a fix to avoid multiple tries to address the issue at hand.
@Orvid : What are your thoughts?
And gcc/g++ 7.5 apparently was released in November 2019 (https://gcc.gnu.org/releases.html), therefore almost 3 years old.
The mac compilation error is at "ProducerConsumerQueueBenchmark.cpp:49:7: error: unknown type name 'cpu_set_t'" which is unrelated to the current change.
As a matter of fact, it seems that JsonTest.cpp builds fine with the compiler used by the mac bot (i.e. AppleClang 13.0.0.13000029).
The Windows bot failure also seems unrelated with the current change.
The compilation error is "tests\Base64SpecialCasesTest.cpp(58): error C2131: expression did not evaluate to a constant".
@Orvid just pushed an update with a patch on top trying to address the build issue on Linux, roughly based on the error message (i.e. 'unimplemented: non-trivial designated initializers not supported').
I'm trying to build gcc/g++ 7.5 locally from sources to reproduce the issue but that is kinda of a pain.
@Orvid when you got some spare time, could you please add the current patchset to be run through the bots?
@Orvid thanks for adding the new updated patchset for a dry-run.
That is some progress, it seems that now the patch will pass in the linux bot.
The other failing bots seem to have issues unrelated with the current changes: a) Mac: "test/ProducerConsumerQueueBenchmark.cpp:49:7: error: unknown type name 'cpu_set_t'' b) Windows: "tests\Base64SpecialCasesTest.cpp(58): error C2131: expression did not evaluate to a constant"
What is the procedure for this situation? I'm afraid is something I cannot fix (i.e. flakey bots on unrelated code).
Ok, just done the requested change using CONTENT_DIR.
Turns out that building benchmarks on OSX is broken, these patches will fix it: https://github.com/facebook/folly/pull/1847
ping?
@Orvid has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.