TBB memory bug
Hi parlay maintainers, there seems to be an evasive bug in the integration of parlay and TBB that took me a painful amount of time to catch. The following is a minimal reproducible example:
#include <iostream>
#include "parlaylib-master/include/parlay/primitives.h"
#define vec parlay::sequence
struct MstPrune {
vec<int> parent;
vec<vec<int>> mst;
void operator() (const int eventNregion) {
int neg1 = -1;
parent.assign(eventNregion, neg1);
for (int i = 0, iend = parent.size(); i < iend; ++i) parent[i] = i;
mst.resize(eventNregion);
for (int i = 0; i < eventNregion; ++i) mst[i].resize(0);
for (int i = 0; i < eventNregion; ++i) mst[parent[i]].emplace_back(i);
}
};
int main(int argc, char* argv[]) {
vec<int> sizes(argc);
for (int i = 0; i < argc; ++i) sizes[i] = std::atoi(argv[i]);
vec<MstPrune> mst(parlay::num_workers());
parlay::parallel_for(0, argc, [&](auto i)->void {
auto t = parlay::worker_id();
mst[t](sizes[i]);
});
int S = 0;
for (auto& m: mst) {
for (auto& x: m.mst) { for (auto& y: x) S += y; }
}
if (S != 77) std::cout << "---------- S = " << S << " ----------\n";
else std::cout << "S = " << S << ", ";
}
#undef vec
After compiling the code with the following command
g++ -std=c++20 -Ofast -DPARLAY_TBB -ltbb bug.cpp -o bug
and run
for i in {1..1000}; do ./bug 1 1 1 1 12 1 1 2 2 4 3; done
You should be able to see nondeterministic result varying around the true answer "77".
If we change vec<int> parent; to std::vector<int> parent;, everything works fine.
If we switch to parlay's default thread scheduler, i.e. g++ -std=c++20 -Ofast -pthread bug.cpp -o bug , everything also works fine.
I tested both the latest oneAPI/tbb and an older version. The problem remains the same.
When the code is compiled as a shared library and linked to runtimes, segmentation fault will pop.
Could you help check if the bug is reproducible and provide some assistance?
OK, I closed the issue earlier because the previous example did not illustrate my point. The memory error only occurs when the code is compiled as a shared library and linked to runtimes. As developers who extensively write and link C++ libraries to R and Python, we face an additional layer of complexity. I am now more certain that the problem lies with TBB. I have cleaned and reduced the example to the following:
- In the current directory, create
bug.cppwith the code below:
// We create a vector of `m` functors (`Struct`), where `m` represents the number of threads.
// Each functor contains a sequence of integers (`x`) and a sequence of sequences (`y`).
// When running in parallel using TBB, allocating memory for `y` can unexpectedly affect
// the memory allocation of `x`, which should not occur.
// Changing the sequence type from `parlay::sequence` to `std::vector` can resolve this issue.
// Alternatively, modifying `y` from a sequence of sequences to a simple sequence of integers
// also eliminates the error.
// The issue likely stems from compatibility problems between TBB and
// Parlay's memory allocator for small objects.
#include <iostream>
#include "parlaylib-master/include/parlay/primitives.h"
#define vec parlay::sequence
struct Struct {
vec<int> x;
vec<vec<int>> y; // Oddly, if y is vec<int> instead vec<vec<int>>, everything would work.
bool operator() ( const int n ) {
x.resize(n);
std::iota(x.begin(), x.end(), 0);
size_t addr = size_t(x.data()); // Record the mem address as an integer.
int xsize = x.size();
y.resize(n); // Just allocate y;
// If the allocation of `parent` is changed or its size is altered,
// print message.
if (size_t(x.data()) != addr or int(x.size()) != xsize) {
auto s = "Wrong! Why is `x` changed ?\n"
"The initial address 'x.data()' is " + std::to_string(addr) +
", but the current address is " + std::to_string(size_t(x.data())) +
".\nThe initial size is " + std::to_string(xsize) +
", but the current size is " + std::to_string(x.size()) +
".\nThe initial elements are [0, ..., " + std::to_string(xsize) +
"], but the current elements are [";
for (int i = 0, iend = x.size(); i < iend; ++i)
s += std::to_string(x[i]) + ", ";
s += "].\n\n\n";
std::cerr << s;
return false;
}
return true;
}
};
void bug() {
std::vector<int> ns {1, 1, 1, 1, 12, 1, 1, 2, 2, 4};
vec<Struct> mstPv(parlay::num_workers());
parlay::parallel_for(0, ns.size(), [&](auto i)->void {
auto t = parlay::worker_id();
mstPv[t](ns[i]);
});
}
#undef vec
- Create
main.cppwith the following code:
void bug(void);
int main() {
for (int i = 0; i < 10000; ++i) bug(); // Run the buggy code 10000 times since the error occurs at random.
}
- Run the following 3 sets of commands: use
parlay's native thread scheduler, useopenmp, and finally usetbb. Everything works fine except fortbb. It will print a large number of messages.
g++ -std=c++20 -shared -pthread -Ofast -fpic bug.cpp -o libbug.so
g++ -std=c++20 -Ofast -L. -lbug main.cpp -o main
export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH
./main
g++ -std=c++20 -shared -DPARLAY_OPENMP -fopenmp -Ofast -fpic bug.cpp -o libbug.so
g++ -std=c++20 -Ofast -L. -lbug main.cpp -o main
export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH
./main
g++ -std=c++20 -shared -Ofast -DPARLAY_TBB -ltbb -fpic bug.cpp -o libbug.so
g++ -std=c++20 -Ofast -L. -lbug main.cpp -o main
export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH
./main
I hope the authors can confirm and resolve this issue:) Our entire system is built on parlay and TBB. We prefer not to abandon TBB, as it serves as the backend for the C++ parallel STL, which includes parallelized algorithms not available in parlay
Thank you!