folly [json] Re-enable compilation of JsonTest and add two perf tests

It seems that compiling of the main Json parser unit test was disabled due to a 'MSVC Preprocessor' bug.

It is safe to assume that g++/clang++ will be used on environments like Linux/etc while MSVC would be used on MS-Windows.

This change will re-enable the unit test compilation while also adding two new unit tests targeting stressing the performance of the Json parser.

Jul 06 '22 00:07 Adenilson

@aaronabramov, @yfeldblum and @ispeters any feedback?

The idea is to re-enable compilation of the main JSON unit test and add two new unit tests that could be used for a perf profile targeting identifying hot spots to guide optimization work.

I'm also re-posting some questions and remarks I've added on other merge request (#1805 ), quote: "I started by having a look on the JSON parsing class and noticed that the main unit test was commented out (apparently related to a MSVC bug?).

To get it building again was a one liner fix, I proceeded next to use as an input file a JSON file generated by retrieving the comments from my own Facebook account (I named the file 'input.json').

I wonder if there is a corpus of sample JSON files that are relevant for Folly client code?.

The reason I ask about a corpus is that I observed diverging ordering of hot spots depending on the input JSON file (e.g. twitter.json vs FB comments sample file mentioned above).

A perf profile showed that one of the top 8 hot spots (alongside parseValue, parseString, F14Table, dynamic::operator=) while parsing JSON was relate to hashing.

That lead me to have a look on which hashes and where they were deployed in the code base.

There may be good potential for performance gains in JSON parsing by leveraging SIMD optimizations.

I've done some previous work on vectorizing zlib for Chromium and it is quite surprising the performance gains achievable by working with vectors instead of a scalar approach.

The good news is that there is previous work done on the area (e.g. simdjson).

Would Folly be open to adopt SIMD specific optimizations in its JSON parser or maybe just migrate to simdjson (if it has all the required features)?"

Jul 07 '22 18:07 Adenilson

Enabling the json test for open source builds is good, but for the perf tests, we do have a benchmark framework (https://github.com/facebook/folly/blob/main/folly/Benchmark.h - With https://github.com/facebook/folly/blob/main/folly/test/AsciiCaseInsensitiveBenchmark.cpp as an example). The issue is that I don't think we have these setup to be buildable from cmake currently.

Jul 08 '22 21:07 Orvid

@Orvid has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Jul 08 '22 21:07 facebook-github-bot

It's taken a bit for it land, but, as of https://github.com/facebook/folly/commit/c8a3dbc6a2ef0ac28fcf9b5552db6a6e82223044 benchmarks can now be built from CMake. It seems like we've broken trunk since then though.

Could you adjust these benchmarks to use that framework instead of adding them as tests?

Jul 19 '22 22:07 Orvid

Sure can do!

Jul 19 '22 23:07 Adenilson

@Orvid first draft on turning the new tests into performance benchmarks.

Jul 20 '22 00:07 Adenilson

@Orvid has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Jul 20 '22 22:07 facebook-github-bot

It is a bit odd that my comments are listed as 'pending' instead of simply being submitted. Perhaps a github quirk?

Just in case: a) Here are the warnings that were silenced with the above change: https://gist.github.com/Adenilson/7437bada44035181cca1b675ff309279

b) The JSON file can be found at: https://github.com/simdjson/simdjson/tree/master/jsonexamples

Jul 21 '22 01:07 Adenilson

Anything pending to have this landed?

Jul 22 '22 03:07 Adenilson

Friendly ping on the subject.

Can we proceed and land this patch as it should be low risk?

Jul 27 '22 20:07 Adenilson

Sorry for the delay on this. The current status is that enabling the json tests is fine, but for the benchmarks the default needs to have at least something checked in as the default files to use. If folks want to pull in a third-party json file to benchmark more completely that is fine, but we need something to be there by default.

Aug 11 '22 22:08 Orvid

@Orvid just added a sample file in the current patch and comments in the source code to where to find better/bigger JSON files.

Tested by building the new target (i.e. make json_benchmark) and next running the benchmark. Steps: a) mkdir out; cd out; b) cmake ../ -DCMAKE_BUILD_TYPE=Release -DBUILD_BENCHMARKS=ON c) make json_benchmark json_test -j8 d) Followed next by executing the newly enabled/added test (https://gist.github.com/Adenilson/d36aa75f561c72e7377fa554a9e71be6).

I guess it should be good to go. :-)

Aug 12 '22 18:08 Adenilson

@Orvid has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Aug 17 '22 21:08 facebook-github-bot

I see that the build bots seem to fail.

A quick look on the linux bot showed the following compilation error: "/home/runner/work/folly/folly/folly/test/JsonTest.cpp:361:64: sorry, unimplemented: non-trivial designated initializers not supported"

I'm building with g++ 11.2.0 (standard in Ubuntu 22 LTS IIRC) with ToT (Top of the Tree) and no special configuration flags (i.e. only used what I posted in my previous message).

A quick look on the same bot seems to show that apparently gcc 7.5 is used (quote "The CXX compiler identification is GNU 7.5.0" in https://github.com/facebook/folly/runs/7888073890?check_suite_focus=true).

Any particular reason to use such older version of gcc/g++ by the build bots? It seems that gcc 7.2 was released in Aug 2017 (i.e. 5 years ago).

It might be the case that it could be workarounded using an extra compiler flag as needed (e.g. -std=c++14).

Since I don't have the privileges to add a patch to the compiler bots for a dry run, I would like to coordinate a fix to avoid multiple tries to address the issue at hand.

@Orvid : What are your thoughts?

Aug 18 '22 00:08 Adenilson

And gcc/g++ 7.5 apparently was released in November 2019 (https://gcc.gnu.org/releases.html), therefore almost 3 years old.

Aug 18 '22 00:08 Adenilson

The mac compilation error is at "ProducerConsumerQueueBenchmark.cpp:49:7: error: unknown type name 'cpu_set_t'" which is unrelated to the current change.

As a matter of fact, it seems that JsonTest.cpp builds fine with the compiler used by the mac bot (i.e. AppleClang 13.0.0.13000029).

Aug 18 '22 00:08 Adenilson

The Windows bot failure also seems unrelated with the current change.

The compilation error is "tests\Base64SpecialCasesTest.cpp(58): error C2131: expression did not evaluate to a constant".

Aug 18 '22 00:08 Adenilson

@Orvid just pushed an update with a patch on top trying to address the build issue on Linux, roughly based on the error message (i.e. 'unimplemented: non-trivial designated initializers not supported').

I'm trying to build gcc/g++ 7.5 locally from sources to reproduce the issue but that is kinda of a pain.

Aug 23 '22 02:08 Adenilson

@Orvid when you got some spare time, could you please add the current patchset to be run through the bots?

Aug 23 '22 17:08 adenilsoncavalcanti

@Orvid thanks for adding the new updated patchset for a dry-run.

That is some progress, it seems that now the patch will pass in the linux bot.

The other failing bots seem to have issues unrelated with the current changes: a) Mac: "test/ProducerConsumerQueueBenchmark.cpp:49:7: error: unknown type name 'cpu_set_t'' b) Windows: "tests\Base64SpecialCasesTest.cpp(58): error C2131: expression did not evaluate to a constant"

What is the procedure for this situation? I'm afraid is something I cannot fix (i.e. flakey bots on unrelated code).

Aug 24 '22 00:08 Adenilson

Ok, just done the requested change using CONTENT_DIR.

Aug 24 '22 21:08 Adenilson

Turns out that building benchmarks on OSX is broken, these patches will fix it: https://github.com/facebook/folly/pull/1847

Aug 27 '22 00:08 Adenilson

ping?

Sep 14 '22 15:09 Adenilson

@Orvid has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Sep 15 '22 18:09 facebook-github-bot