gecode icon indicating copy to clipboard operation
gecode copied to clipboard

Memory leak of solutions at Engine in a multi-threading context

Open alex-87 opened this issue 2 years ago • 1 comments

Description

There is a memory leak at Engine in a multi-threading context when the following conditions are met:

  • Thread number > 1
  • CSP having more than one solution
  • Only one solution is retrieved through next()

To Reproduce

To reproduce the issue, here is the following code (based on the SendMoreMoney example code). The code is modified in order to have more than one solution :

#include <gecode/int.hh>
#include <gecode/search.hh>

using namespace Gecode;

class SendMoreMoney : public Space {
protected:
  IntVarArray l;
public:
  SendMoreMoney(void) : l(*this, 8, 0, 9) {
    IntVar s(l[0]), e(l[1]), n(l[2]), d(l[3]),
           m(l[4]), o(l[5]), r(l[6]), y(l[7]);
    // no leading zeros
//    rel(*this, s, IRT_NQ, 0);
//    rel(*this, m, IRT_NQ, 0);
    // all letters distinct
//    distinct(*this, l);
    // linear equation
    IntArgs c(4+4+5); IntVarArgs x(4+4+5);
    c[0]=1000; c[1]=100; c[2]=10; c[3]=1;
    x[0]=s;    x[1]=e;   x[2]=n;  x[3]=d;
    c[4]=1000; c[5]=100; c[6]=10; c[7]=1;
    x[4]=m;    x[5]=o;   x[6]=r;  x[7]=e;
    c[8]=-10000; c[9]=-1000; c[10]=-100; c[11]=-10; c[12]=-1;
    x[8]=m;      x[9]=o;     x[10]=n;    x[11]=e;   x[12]=y;
    linear(*this, c, x, IRT_EQ, 0);
    // post branching
    branch(*this, l, INT_VAR_SIZE_MAX(), INT_VAL_MAX());
  }
  // search support
  SendMoreMoney(SendMoreMoney& s) : Space(s) {
    l.update(*this, s.l);
  }
  virtual Space* copy(void) {
    //std::cout << "Clone occurred" << std::endl;
    return new SendMoreMoney(*this);
  }
  // print solution
  void print(void) const {
    std::cout << l << std::endl;
  }

  ~SendMoreMoney() {
    //std::cout << "Destructor Ok" << std::endl;
  }
};

// main function
int main(int argc, char* argv[]) {

  Gecode::Search::Options opt;
  opt.threads = 8;
  opt.c_d     = 1;
      
  // create model and search engine
  SendMoreMoney* m = new SendMoreMoney;
  DFS<SendMoreMoney> e(m, opt);
  delete m;
  // search and print all solutions
  if (SendMoreMoney* s = e.next()) {
    s->print(); delete s;
  }
  return 0;
}

Compilation with Debug mode:

g++ -g -std=c++11 -o smm ./send-more-money.cpp -lgecodesearch -lgecodeint -lgecodekernel -lgecodesupport

Platform

  • Ubuntu 22.04
  • g++ 11.3.0
  • Gecode 6.2.0 (issue also exists in the 6.3.0 branch)

Valgrind output

...
==4005193== 2,714,208 (350,816 direct, 2,363,392 indirect) bytes in 1,154 blocks are definitely lost in loss record 11 of 11
==4005193==    at 0x4E05045: malloc (vg_replace_malloc.c:381)
==4005193==    by 0x155216: alloc (allocator.hpp:80)
==4005193==    by 0x155216: ralloc (heap.hpp:358)
==4005193==    by 0x155216: operator new (heap.hpp:416)
==4005193==    by 0x155216: SendMoreMoney::copy() (send-more-money.cpp:37)
==4005193==    by 0x274063: Gecode::Space::_clone() (core.cpp:757)
==4005193==    by 0x15E83F: clone (core.hpp:3228)
==4005193==    by 0x15E83F: Gecode::Search::Par::DFS<Gecode::Search::NoTraceRecorder>::Worker::run() (dfs.hpp:249)
==4005193==    by 0x2792F4: Gecode::Support::Thread::Run::exec() (thread.cpp:60)
==4005193==    by 0x2793EE: Gecode::Support::bootstrap(void*) (pthreads.cpp:43)
==4005193==    by 0x52D4B42: start_thread (pthread_create.c:442)
==4005193==    by 0x5365BB3: clone (clone.S:100)
==4005193== 
==4005193== LEAK SUMMARY:
==4005193==    definitely lost: 350,816 bytes in 1,154 blocks
==4005193==    indirectly lost: 2,363,392 bytes in 1,154 blocks
==4005193==      possibly lost: 137,544 bytes in 11 blocks
==4005193==    still reachable: 1,256 bytes in 9 blocks
==4005193==         suppressed: 0 bytes in 0 blocks
==4005193== Reachable blocks (those to which a pointer was found) are not shown.

Analysis

After a short analysis, I found out that:

  1. In gecode/search/par/engine.hh, the Support::DynamicQueue<Space*,Heap> solutions attribute contains pointers to the solutions that are pushed by workers during the search phase.
  2. The solutions is consumed through the Engine<Tracer>::next(void) method.
  3. If the solutions aren't consumed, the Space* objects remain in the solutions queue. The Space* objects are generated by Space* s = cur->clone(); and pushed to the solutions queue when a solution is found.
  4. At the end of processing, the destructor of the Support::DynamicQueue<Space*, Heap> solutions object frees its memory through a.free(q,limit)
  5. However, unconsumed Space* objects are definitely lost as their references are removed by the previous operation.

Draft solution To explicitly consume remaining Space* through solution.pop() then delete those objects when the destructor of Engine<Tracer> is called.

alex-87 avatar Jan 05 '23 19:01 alex-87

Thanks for this great error report, reproduction, and analysis!

On a cursory look, I would agree with your analysis fully. Will check more fully later.

zayenz avatar Jan 05 '23 20:01 zayenz