parlaylib icon indicating copy to clipboard operation
parlaylib copied to clipboard

TBB memory bug

Open WhateverLiu opened this issue 1 year ago • 1 comments

Hi parlay maintainers, there seems to be an evasive bug in the integration of parlay and TBB that took me a painful amount of time to catch. The following is a minimal reproducible example:

#include <iostream>
#include "parlaylib-master/include/parlay/primitives.h"
#define vec parlay::sequence

struct MstPrune {
  vec<int> parent;
  vec<vec<int>> mst;
  void operator() (const int eventNregion) {
    int neg1 = -1;
    parent.assign(eventNregion, neg1);
    for (int i = 0, iend = parent.size(); i < iend; ++i) parent[i] = i;
    mst.resize(eventNregion);
    for (int i = 0; i < eventNregion; ++i) mst[i].resize(0);
    for (int i = 0; i < eventNregion; ++i) mst[parent[i]].emplace_back(i);
  } 
};  

int main(int argc, char* argv[]) {
  vec<int> sizes(argc);
  for (int i = 0; i < argc; ++i) sizes[i] = std::atoi(argv[i]);
  vec<MstPrune> mst(parlay::num_workers());
  parlay::parallel_for(0, argc, [&](auto i)->void {
    auto t = parlay::worker_id();
    mst[t](sizes[i]);
  });
  int S = 0;
  for (auto& m: mst) {
    for (auto& x: m.mst) { for (auto& y: x) S += y; }
  }
  if (S != 77) std::cout << "---------- S = " << S << " ----------\n";
  else std::cout << "S = " << S << ", ";
}

#undef vec

After compiling the code with the following command g++ -std=c++20 -Ofast -DPARLAY_TBB -ltbb bug.cpp -o bug and run for i in {1..1000}; do ./bug 1 1 1 1 12 1 1 2 2 4 3; done You should be able to see nondeterministic result varying around the true answer "77".

If we change vec<int> parent; to std::vector<int> parent;, everything works fine.

If we switch to parlay's default thread scheduler, i.e. g++ -std=c++20 -Ofast -pthread bug.cpp -o bug , everything also works fine.

I tested both the latest oneAPI/tbb and an older version. The problem remains the same.

When the code is compiled as a shared library and linked to runtimes, segmentation fault will pop.

Could you help check if the bug is reproducible and provide some assistance?

WhateverLiu avatar Aug 05 '24 02:08 WhateverLiu

OK, I closed the issue earlier because the previous example did not illustrate my point. The memory error only occurs when the code is compiled as a shared library and linked to runtimes. As developers who extensively write and link C++ libraries to R and Python, we face an additional layer of complexity. I am now more certain that the problem lies with TBB. I have cleaned and reduced the example to the following:

  1. In the current directory, create bug.cpp with the code below:
// We create a vector of `m` functors (`Struct`), where `m` represents the number of threads.
// Each functor contains a sequence of integers (`x`) and a sequence of sequences (`y`).
// When running in parallel using TBB, allocating memory for `y` can unexpectedly affect 
// the memory allocation of `x`, which should not occur.
// Changing the sequence type from `parlay::sequence` to `std::vector` can resolve this issue.
// Alternatively, modifying `y` from a sequence of sequences to a simple sequence of integers 
// also eliminates the error.
// The issue likely stems from compatibility problems between TBB and 
// Parlay's memory allocator for small objects.

#include <iostream>
#include "parlaylib-master/include/parlay/primitives.h"
#define vec parlay::sequence

struct Struct {
  vec<int> x;
  vec<vec<int>> y; // Oddly, if y is vec<int> instead vec<vec<int>>, everything would work.
  bool operator() ( const int n ) { 
    x.resize(n);
    std::iota(x.begin(), x.end(), 0);
    size_t addr = size_t(x.data()); // Record the mem address as an integer.
    int xsize = x.size();
    y.resize(n); // Just allocate y;
    
    // If the allocation of `parent` is changed or its size is altered, 
    //   print message.
    if (size_t(x.data()) != addr or int(x.size()) != xsize) {
      auto s = "Wrong! Why is `x` changed ?\n"
      "The initial address 'x.data()' is " + std::to_string(addr) + 
        ", but the current address is " + std::to_string(size_t(x.data())) +
        ".\nThe initial size is " + std::to_string(xsize) + 
        ", but the current size is " + std::to_string(x.size()) +
        ".\nThe initial elements are [0, ..., " + std::to_string(xsize) +
        "], but the current elements are [";
      for (int i = 0, iend = x.size(); i < iend; ++i) 
        s += std::to_string(x[i]) + ", ";
      s += "].\n\n\n";
      std::cerr << s;
      return false;
    }
    return true;
  }  
}; 

void bug() {
  std::vector<int> ns {1, 1, 1, 1, 12, 1, 1, 2, 2, 4};
  vec<Struct> mstPv(parlay::num_workers());
  parlay::parallel_for(0, ns.size(), [&](auto i)->void {
    auto t = parlay::worker_id();
    mstPv[t](ns[i]);  
  }); 
}
#undef vec
  1. Create main.cpp with the following code:
void bug(void);
int main() {
  for (int i = 0; i < 10000; ++i) bug(); // Run the buggy code 10000 times since the error occurs at random. 
}
  1. Run the following 3 sets of commands: use parlay's native thread scheduler, use openmp, and finally use tbb. Everything works fine except for tbb. It will print a large number of messages.
g++ -std=c++20 -shared -pthread -Ofast -fpic bug.cpp -o libbug.so
g++ -std=c++20 -Ofast -L. -lbug  main.cpp  -o  main
export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH
./main

g++ -std=c++20 -shared -DPARLAY_OPENMP -fopenmp -Ofast -fpic bug.cpp -o libbug.so
g++ -std=c++20 -Ofast -L. -lbug  main.cpp  -o  main
export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH
./main

g++ -std=c++20 -shared -Ofast -DPARLAY_TBB -ltbb -fpic bug.cpp -o libbug.so
g++ -std=c++20 -Ofast -L. -lbug  main.cpp  -o  main
export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH
./main

I hope the authors can confirm and resolve this issue:) Our entire system is built on parlay and TBB. We prefer not to abandon TBB, as it serves as the backend for the C++ parallel STL, which includes parallelized algorithms not available in parlay

Thank you!

WhateverLiu avatar Aug 05 '24 06:08 WhateverLiu