zstd ZDICT_trainFromBuffer_cover is not thread safe

Describe the bug

ZDICT_trainFromBuffer_cover is not thread safe due to the use of global state:

https://github.com/facebook/zstd/blob/78955f5f9ddb3601566884ce217d91c132f5edc1/lib/dictBuilder/cover.c#L235-L236

Calling this function from multiple threads may result in segfault/access violation, use-after-free and other badness.

To Reproduce Call ZDICT_trainFromBuffer_cover from multiple threads

Desktop (please complete the following information):

OS: Windows
Version 11 23H2
Compiler: MSVC, Visual Studio 2022
Flags: default
Other relevant hardware specs: N/A
Build system: whatever zstd-sys/cargo uses

May 14 '24 05:05 as-com

From what I could understand from the code, the issue derives from having to use a global context to pass extra information (i.e. a context) while performing the call to qsort().

Perhaps the reentrant qsort_r() could be a good fit for this problem?

From the documentation (https://man7.org/linux/man-pages/man3/qsort_r.3.html), quote: "The qsort_r() function is identical to qsort() except that the comparison function compar takes a third argument. A pointer is passed to the comparison function via arg. In this way, the comparison function does not need to use global variables to pass through arbitrary arguments, and is therefore reentrant and safe to use in threads.".

It is available with small variations (i.e. qsort_s()) for both Windows (https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/qsort-s?view=msvc-170) and BSD (https://man.freebsd.org/cgi/man.cgi?query=qsort_r&sektion=3), therefore should be portable.

I would like to help on this one if it is available. :-)

Jun 12 '24 23:06 Adenilson

An update: I have a first draft patch, it seems to work fine on OSX@clang and Linux@gcc.

I will need a bit of time to get a Windows machine (and apparently a BSD machine too...), so I hope this is not a high priority bug.

I got a few questions, please see below:

@as-com: as you mentioned zstd-sys/cargo, I would assume that you are using zstd from Rust? Do you think it is possible to share a simple test case exposing the bug?

@Cyan4973: concerning testing... what would be the recommended test suite (and how to build/run/etc) for changes in the cover.c file?

So far, I only did my local changes and run 'make test' in the root folder for both OSX and Linux and after a while could only see PASS/Success/etc but I guess there is a more focused test suite for the code in the dictionary folder.

Jun 13 '24 22:06 adenilsoncavalcanti

I guess there is a more focused test suite for the code in the dictionary folder.

There isn't actually.

That's the reason why there is no report of any issue using current CI test suite.

So one test must be added to observe the problem, and then see it fixed.

While it's not very hard, it's not completely trivial either : one would need a Multi-threaded application, that is calling the Training function from each one of its Thread. 2 threads are likely enough to witness the problem. tsan could also be added, but its major downside, on top of being slow, is its lack of availability beyond Linux, so it's preferable to not depend on it.

Jun 13 '24 23:06 Cyan4973

Sounds good. I will hack something to test the fix.

Jun 14 '24 23:06 Adenilson

An update: I wrote a simple dictionary builder example calling ZDICT_trainFromBuffer_cover with 160 samples and each sample with 100KB in size.

For a regular build (i.e. make -j8), it crashes reliably when I use 32 threads, not so reliably with 24 threads and rarely with 16 threads.

The backtrace looks like:

Jun 17 '24 23:06 Adenilson

Now when I apply the qsort_r() fix, it seems to run fine but it takes a little while to run (i.e. around 20s in a Intel 11th gen i7 processor).

A link to the output of the test case: https://gist.github.com/Adenilson/4668b853b89a4eed4ff551c3885a814a

Jun 17 '24 23:06 Adenilson

The current fix works fine on OSX & Linux. I got validate it for Windows and since I don't have a Win box, it will take a little bit to get one and setup compilers/devel tools/etc.

Jun 17 '24 23:06 Adenilson

Thanks @Adenilson ! This is trending well !

Jun 18 '24 01:06 Cyan4973

@Cyan4973 the above branches have a rough draft of how the fix and the PoB (Proof of Bug) look like.

They are not ready for landing yet, given that I need to validate the approach on BSD (I assume that it is an officially supported OS? Or not? Is it part of the CI?) and it is unclear to me ATM how the PoB should be integrated (or not) e.g. as a unit test? Do test runners time out or have resource restraints (i.e. I'm using 48 threads in the PoB)?

Also I noticed that the MS Visual Studio files are for VS 2010. I got VS 2022 installed in an old Windows laptop and was forced to 're-target' the build files to VS 2022 to be able to build zstd.

Is that expected? Or is zstd build in another way for MS Windows (e.g. cmake)?

Finally, it seems that OSX Build on vanilla 'dev' branch (hash 3242ac598e6f17d8008f6110337a3b4c1205842b from Friday Jun 14th) is now broken, I guess I got fix that too: CC obj/generic_noconf/zstdcli_trace.o compiling single-threaded static library 1.5.6 ==> building with threading support ==> building zstd with .gz compression support ==> no liblzma, building zstd without .xz/.lzma support ==> no liblz4, building zstd without .lz4 support LINK obj/generic_noconf/zstd zstd build completed compiling multi-threaded dynamic library 1.5.6 ld: unknown options: -soname=libzstd.so.1 clang: error: linker command failed with exit code 1 (use -v to see invocation) make[1]: *** [obj/generic_noconf/dynamic/libzstd.so.1.5.6] Error 1 make: *** [lib-release] Error 2 make -j8 37.82s user 3.03s system 621% cpu 6.569 total

Jun 19 '24 00:06 Adenilson

I assume that it is an officially supported OS? Or not? Is it part of the CI?

Yes it is. Here is the last test: https://github.com/facebook/zstd/runs/26246973990

Do test runners time out or have resource restraints (i.e. I'm using 48 threads in the PoB)?

Yes, there's a hard limit at 1 hour, but in this project, we aim at a ~20mn max for any particular test, and ~5mn preferably. Multithreading is available, though it's unlikely to ramp up to 48 hardware threads. More problematic can be the RAM usage. I don't know the exact limit, but we noticed in the past that, in order to stay in the "safe zone", RAM usage should remain < 1 GB, which can be a severe constraint.

Is that expected? Or is zstd build in another way for MS Windows (e.g. cmake)?

Yes. Basically, VS2010 project file can be updated to any modern version, so we have used that since the beginning of the project (~2015). We could bump that to VS2013 if that matters, since we have stopped supporting VS2010 and VS2012.

That being said, an even better future would be to no longer manually maintain any Solution file, and instead get them generated automatically, typically from cmake as part of make all or make vs_solution. We haven't (yet) invested the time to get it done.

Finally, it seems that OSX Build on vanilla 'dev' branch

That's a concern. And it's surprising since macos is part of CI, so such a build error should have been detected. edit : indeed, here is the last test, and it seems successful: https://github.com/facebook/zstd/actions/runs/9352935877/job/25742255656

Such a bug should be fixed a.s.a.p, though not necessarily by you, nor bundled into another PR with a different objective.

edit : the pb seems bisected to #4067 , but here too, the macos test was run and was successful... https://github.com/facebook/zstd/actions/runs/9388161997/job/25852791513

edit 2: #4076

Jun 19 '24 00:06 Cyan4973

Oh, btw, are you familiar with tsan, thread sanitizer ? Presuming you know how to use it (likely limited to Linux only), would it help make the test fail more reliably with less threads (2 being ideal) ?

Jun 19 '24 02:06 Cyan4973

@Cyan4973 I just had dinner and was going to open a detailed bug report for the OSX build breakage and just noticed that you already fixed the build. Thanks a lot! :-)

I will experiment with tsan and report back the results (also got get a *BSD environment to validate the qsort_r()/s() fix).

Jun 19 '24 05:06 Adenilson

An update: I got the fix to build for FreeBSD. Since I don't have a spare system to install FreeBSD, I just used a VM under VirtualBox and it worked reasonably well.

Fortunately it seems that *BSD systems comply with the GNU API for qsort_r(), which simplifies the things (i.e. the same code is used for Linux and *BSD).

I consider now the fix properly ready for review, since it was tested on 4 OSes (Linux, OSX, Windows, FreeBSD) and I will submit a merge request.

What is missing is the PoB (Proof of Bug) integration. Is the later a hard requirement for the first?

Jun 19 '24 23:06 Adenilson

One last thing that occurred to me is to verify if it builds fine for Android (IIRC the NDK uses clang).

Jun 19 '24 23:06 Adenilson

The test matrix was: clang@OSX, gcc@Linux, gcc@FreeBSD, msvc@Windows.

Jun 20 '24 00:06 Adenilson

What is missing is the PoB (Proof of Bug) integration. Is the later a hard requirement for the first?

It's preferable, but if it's too hard to achieve within the constraints of CI, this could be left.

A desirable intermediate solution could be to provide the test program or unit without making it a CI test, shipping with it a documentation to reproduce the test locally.

Jun 20 '24 00:06 Cyan4973

Perhaps the PoB could be rewritten in a way to teach how to use the dictionary building API, thus addressing the other request on #4066.

I still got investigate TSAN, it may be the case that it is able to simply the PoB quite a bit.

I just noticed that the CI showed that a bot (MINGW64) failed to compile.

That is a combination I haven't tested (gcc@Windows), but I think probably the fix should be pretty simple (i.e. test if the compiler used is GCC and then define _GNU_SOURCE).

So not ready for landing yet, still needs some more work on it. :-)

Jun 20 '24 03:06 Adenilson

Fixed in #4086

Sep 26 '24 00:09 Cyan4973

zstd zstd copied to clipboard

ZDICT_trainFromBuffer_cover is not thread safe

zstd
zstd copied to clipboard