zstd
zstd copied to clipboard
ZDICT_trainFromBuffer_cover is not thread safe
Describe the bug
ZDICT_trainFromBuffer_cover is not thread safe due to the use of global state:
https://github.com/facebook/zstd/blob/78955f5f9ddb3601566884ce217d91c132f5edc1/lib/dictBuilder/cover.c#L235-L236
Calling this function from multiple threads may result in segfault/access violation, use-after-free and other badness.
To Reproduce Call ZDICT_trainFromBuffer_cover from multiple threads
Desktop (please complete the following information):
- OS: Windows
- Version 11 23H2
- Compiler: MSVC, Visual Studio 2022
- Flags: default
- Other relevant hardware specs: N/A
- Build system: whatever zstd-sys/cargo uses
From what I could understand from the code, the issue derives from having to use a global context to pass extra information (i.e. a context) while performing the call to qsort().
Perhaps the reentrant qsort_r() could be a good fit for this problem?
From the documentation (https://man7.org/linux/man-pages/man3/qsort_r.3.html), quote: "The qsort_r() function is identical to qsort() except that the comparison function compar takes a third argument. A pointer is passed to the comparison function via arg. In this way, the comparison function does not need to use global variables to pass through arbitrary arguments, and is therefore reentrant and safe to use in threads.".
It is available with small variations (i.e. qsort_s()) for both Windows (https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/qsort-s?view=msvc-170) and BSD (https://man.freebsd.org/cgi/man.cgi?query=qsort_r&sektion=3), therefore should be portable.
I would like to help on this one if it is available. :-)
An update: I have a first draft patch, it seems to work fine on OSX@clang and Linux@gcc.
I will need a bit of time to get a Windows machine (and apparently a BSD machine too...), so I hope this is not a high priority bug.
I got a few questions, please see below:
@as-com: as you mentioned zstd-sys/cargo, I would assume that you are using zstd from Rust? Do you think it is possible to share a simple test case exposing the bug?
@Cyan4973: concerning testing... what would be the recommended test suite (and how to build/run/etc) for changes in the cover.c file?
So far, I only did my local changes and run 'make test' in the root folder for both OSX and Linux and after a while could only see PASS/Success/etc but I guess there is a more focused test suite for the code in the dictionary folder.
I guess there is a more focused test suite for the code in the dictionary folder.
There isn't actually.
That's the reason why there is no report of any issue using current CI test suite.
So one test must be added to observe the problem, and then see it fixed.
While it's not very hard, it's not completely trivial either :
one would need a Multi-threaded application, that is calling the Training function from each one of its Thread. 2 threads are likely enough to witness the problem. tsan could also be added, but its major downside, on top of being slow, is its lack of availability beyond Linux, so it's preferable to not depend on it.
Sounds good. I will hack something to test the fix.
An update: I wrote a simple dictionary builder example calling ZDICT_trainFromBuffer_cover with 160 samples and each sample with 100KB in size.
For a regular build (i.e. make -j8), it crashes reliably when I use 32 threads, not so reliably with 24 threads and rarely with 16 threads.
The backtrace looks like:
Now when I apply the qsort_r() fix, it seems to run fine but it takes a little while to run (i.e. around 20s in a Intel 11th gen i7 processor).
A link to the output of the test case: https://gist.github.com/Adenilson/4668b853b89a4eed4ff551c3885a814a
The current fix works fine on OSX & Linux. I got validate it for Windows and since I don't have a Win box, it will take a little bit to get one and setup compilers/devel tools/etc.
Thanks @Adenilson ! This is trending well !
@Cyan4973 the above branches have a rough draft of how the fix and the PoB (Proof of Bug) look like.
They are not ready for landing yet, given that I need to validate the approach on BSD (I assume that it is an officially supported OS? Or not? Is it part of the CI?) and it is unclear to me ATM how the PoB should be integrated (or not) e.g. as a unit test? Do test runners time out or have resource restraints (i.e. I'm using 48 threads in the PoB)?
Also I noticed that the MS Visual Studio files are for VS 2010. I got VS 2022 installed in an old Windows laptop and was forced to 're-target' the build files to VS 2022 to be able to build zstd.
Is that expected? Or is zstd build in another way for MS Windows (e.g. cmake)?
Finally, it seems that OSX Build on vanilla 'dev' branch (hash 3242ac598e6f17d8008f6110337a3b4c1205842b from Friday Jun 14th) is now broken, I guess I got fix that too: CC obj/generic_noconf/zstdcli_trace.o compiling single-threaded static library 1.5.6 ==> building with threading support ==> building zstd with .gz compression support ==> no liblzma, building zstd without .xz/.lzma support ==> no liblz4, building zstd without .lz4 support LINK obj/generic_noconf/zstd zstd build completed compiling multi-threaded dynamic library 1.5.6 ld: unknown options: -soname=libzstd.so.1 clang: error: linker command failed with exit code 1 (use -v to see invocation) make[1]: *** [obj/generic_noconf/dynamic/libzstd.so.1.5.6] Error 1 make: *** [lib-release] Error 2 make -j8 37.82s user 3.03s system 621% cpu 6.569 total
I assume that it is an officially supported OS? Or not? Is it part of the CI?
Yes it is. Here is the last test: https://github.com/facebook/zstd/runs/26246973990
Do test runners time out or have resource restraints (i.e. I'm using 48 threads in the PoB)?
Yes, there's a hard limit at 1 hour, but in this project, we aim at a ~20mn max for any particular test, and ~5mn preferably. Multithreading is available, though it's unlikely to ramp up to 48 hardware threads. More problematic can be the RAM usage. I don't know the exact limit, but we noticed in the past that, in order to stay in the "safe zone", RAM usage should remain < 1 GB, which can be a severe constraint.
Is that expected? Or is zstd build in another way for MS Windows (e.g. cmake)?
Yes. Basically, VS2010 project file can be updated to any modern version, so we have used that since the beginning of the project (~2015). We could bump that to VS2013 if that matters, since we have stopped supporting VS2010 and VS2012.
That being said, an even better future would be to no longer manually maintain any Solution file, and instead get them generated automatically, typically from cmake as part of make all or make vs_solution. We haven't (yet) invested the time to get it done.
Finally, it seems that OSX Build on vanilla 'dev' branch
That's a concern. And it's surprising since macos is part of CI, so such a build error should have been detected.
edit : indeed, here is the last test, and it seems successful: https://github.com/facebook/zstd/actions/runs/9352935877/job/25742255656
Such a bug should be fixed a.s.a.p, though not necessarily by you, nor bundled into another PR with a different objective.
edit : the pb seems bisected to #4067 , but here too, the macos test was run and was successful...
https://github.com/facebook/zstd/actions/runs/9388161997/job/25852791513
edit 2: #4076
Oh, btw,
are you familiar with tsan, thread sanitizer ?
Presuming you know how to use it (likely limited to Linux only), would it help make the test fail more reliably with less threads (2 being ideal) ?
@Cyan4973 I just had dinner and was going to open a detailed bug report for the OSX build breakage and just noticed that you already fixed the build. Thanks a lot! :-)
I will experiment with tsan and report back the results (also got get a *BSD environment to validate the qsort_r()/s() fix).
An update: I got the fix to build for FreeBSD. Since I don't have a spare system to install FreeBSD, I just used a VM under VirtualBox and it worked reasonably well.
Fortunately it seems that *BSD systems comply with the GNU API for qsort_r(), which simplifies the things (i.e. the same code is used for Linux and *BSD).
I consider now the fix properly ready for review, since it was tested on 4 OSes (Linux, OSX, Windows, FreeBSD) and I will submit a merge request.
What is missing is the PoB (Proof of Bug) integration. Is the later a hard requirement for the first?
One last thing that occurred to me is to verify if it builds fine for Android (IIRC the NDK uses clang).
The test matrix was: clang@OSX, gcc@Linux, gcc@FreeBSD, msvc@Windows.
What is missing is the PoB (Proof of Bug) integration. Is the later a hard requirement for the first?
It's preferable, but if it's too hard to achieve within the constraints of CI, this could be left.
A desirable intermediate solution could be to provide the test program or unit without making it a CI test, shipping with it a documentation to reproduce the test locally.
Perhaps the PoB could be rewritten in a way to teach how to use the dictionary building API, thus addressing the other request on #4066.
I still got investigate TSAN, it may be the case that it is able to simply the PoB quite a bit.
I just noticed that the CI showed that a bot (MINGW64) failed to compile.
That is a combination I haven't tested (gcc@Windows), but I think probably the fix should be pretty simple (i.e. test if the compiler used is GCC and then define _GNU_SOURCE).
So not ready for landing yet, still needs some more work on it. :-)
Fixed in #4086