Memory leak of solutions at Engine in a multi-threading context
Description
Engine leaks memory in a multi-threaded context when all of the following conditions are met:
- The number of threads is > 1
- The CSP has more than one solution
- Only one solution is retrieved through next()
To Reproduce
The following code, based on the SendMoreMoney example, reproduces the issue. It is modified so that the problem has more than one solution:
#include <iostream>
#include <gecode/int.hh>
#include <gecode/search.hh>
using namespace Gecode;
class SendMoreMoney : public Space {
protected:
  IntVarArray l;
public:
  SendMoreMoney(void) : l(*this, 8, 0, 9) {
    IntVar s(l[0]), e(l[1]), n(l[2]), d(l[3]),
           m(l[4]), o(l[5]), r(l[6]), y(l[7]);
    // no leading zeros (disabled so that more than one solution exists)
    // rel(*this, s, IRT_NQ, 0);
    // rel(*this, m, IRT_NQ, 0);
    // all letters distinct (disabled so that more than one solution exists)
    // distinct(*this, l);
    // linear equation
    IntArgs c(4+4+5); IntVarArgs x(4+4+5);
    c[0]=1000; c[1]=100; c[2]=10; c[3]=1;
    x[0]=s;    x[1]=e;   x[2]=n;  x[3]=d;
    c[4]=1000; c[5]=100; c[6]=10; c[7]=1;
    x[4]=m;    x[5]=o;   x[6]=r;  x[7]=e;
    c[8]=-10000; c[9]=-1000; c[10]=-100; c[11]=-10; c[12]=-1;
    x[8]=m;      x[9]=o;     x[10]=n;    x[11]=e;   x[12]=y;
    linear(*this, c, x, IRT_EQ, 0);
    // post branching
    branch(*this, l, INT_VAR_SIZE_MAX(), INT_VAL_MAX());
  }
  // search support
  SendMoreMoney(SendMoreMoney& s) : Space(s) {
    l.update(*this, s.l);
  }
  virtual Space* copy(void) {
    //std::cout << "Clone occurred" << std::endl;
    return new SendMoreMoney(*this);
  }
  // print solution
  void print(void) const {
    std::cout << l << std::endl;
  }
  ~SendMoreMoney() {
    //std::cout << "Destructor Ok" << std::endl;
  }
};
// main function
int main(int argc, char* argv[]) {
  Gecode::Search::Options opt;
  opt.threads = 8;
  opt.c_d     = 1;
      
  // create model and search engine
  SendMoreMoney* m = new SendMoreMoney;
  DFS<SendMoreMoney> e(m, opt);
  delete m;
  // retrieve and print only the first solution; the others are never consumed
  if (SendMoreMoney* s = e.next()) {
    s->print(); delete s;
  }
  return 0;
}
Compilation in debug mode:
g++ -g -std=c++11 -o smm ./send-more-money.cpp -lgecodesearch -lgecodeint -lgecodekernel -lgecodesupport
Platform
- Ubuntu 22.04
- g++ 11.3.0
- Gecode 6.2.0 (issue also exists in the 6.3.0 branch)
Valgrind output
...
==4005193== 2,714,208 (350,816 direct, 2,363,392 indirect) bytes in 1,154 blocks are definitely lost in loss record 11 of 11
==4005193==    at 0x4E05045: malloc (vg_replace_malloc.c:381)
==4005193==    by 0x155216: alloc (allocator.hpp:80)
==4005193==    by 0x155216: ralloc (heap.hpp:358)
==4005193==    by 0x155216: operator new (heap.hpp:416)
==4005193==    by 0x155216: SendMoreMoney::copy() (send-more-money.cpp:37)
==4005193==    by 0x274063: Gecode::Space::_clone() (core.cpp:757)
==4005193==    by 0x15E83F: clone (core.hpp:3228)
==4005193==    by 0x15E83F: Gecode::Search::Par::DFS<Gecode::Search::NoTraceRecorder>::Worker::run() (dfs.hpp:249)
==4005193==    by 0x2792F4: Gecode::Support::Thread::Run::exec() (thread.cpp:60)
==4005193==    by 0x2793EE: Gecode::Support::bootstrap(void*) (pthreads.cpp:43)
==4005193==    by 0x52D4B42: start_thread (pthread_create.c:442)
==4005193==    by 0x5365BB3: clone (clone.S:100)
==4005193== 
==4005193== LEAK SUMMARY:
==4005193==    definitely lost: 350,816 bytes in 1,154 blocks
==4005193==    indirectly lost: 2,363,392 bytes in 1,154 blocks
==4005193==      possibly lost: 137,544 bytes in 11 blocks
==4005193==    still reachable: 1,256 bytes in 9 blocks
==4005193==         suppressed: 0 bytes in 0 blocks
==4005193== Reachable blocks (those to which a pointer was found) are not shown.
Analysis
After a short analysis, I found out that:
- In gecode/search/par/engine.hh, the Support::DynamicQueue<Space*,Heap> solutions attribute contains pointers to the solutions that workers push during the search phase.
- The solutions queue is consumed through the Engine<Tracer>::next(void) method.
- If the solutions aren't consumed, the Space* objects remain in the solutions queue. These Space* objects are created by Space* s = cur->clone(); and pushed to the solutions queue when a solution is found.
- At the end of processing, the destructor of the Support::DynamicQueue<Space*,Heap> solutions object frees its memory through a.free(q,limit).
- However, the unconsumed Space* objects are definitely lost, because the only pointers to them were stored in the memory freed by the previous operation.
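The mechanism described in the list above can be modeled without Gecode. The following is only a minimal sketch: FakeSpace, PtrQueue, and leaked_after_partial_consume are hypothetical names that stand in for Space, the solutions queue, and the reproduction scenario; none of this is Gecode code. The queue's destructor releases its buffer but never the objects the stored pointers refer to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a cloned Gecode Space; counts live instances.
struct FakeSpace {
  static int live;
  FakeSpace() { ++live; }
  ~FakeSpace() { --live; }
};
int FakeSpace::live = 0;

// Fixed capacity, chosen only to keep the sketch short.
const std::size_t kCap = 8;

// Hypothetical stand-in for the solutions queue: it owns its buffer,
// but not the objects that the stored pointers refer to.
class PtrQueue {
  FakeSpace** q;
  std::size_t head, tail;
public:
  PtrQueue()
    : q(static_cast<FakeSpace**>(std::malloc(kCap * sizeof(FakeSpace*)))),
      head(0), tail(0) { assert(q != nullptr); }
  PtrQueue(const PtrQueue&) = delete;
  PtrQueue& operator=(const PtrQueue&) = delete;
  // Frees the buffer only, analogous to a.free(q,limit) in the analysis.
  ~PtrQueue() { std::free(q); }
  bool empty() const { return head == tail; }
  void push(FakeSpace* s) { assert(tail < kCap); q[tail++] = s; }
  FakeSpace* pop() { assert(!empty()); return q[head++]; }
};

// Push three "solutions", consume one, then let the queue go out of scope.
// Returns the number of FakeSpace objects still alive, i.e. leaked.
int leaked_after_partial_consume() {
  {
    PtrQueue solutions;
    for (int i = 0; i < 3; ++i) solutions.push(new FakeSpace);
    delete solutions.pop();  // like a single call to next()
  }  // buffer freed here; the remaining pointees become unreachable
  return FakeSpace::live;
}
```

Calling leaked_after_partial_consume() once on a fresh process reports the two objects that were pushed but never popped, which mirrors the "definitely lost" blocks in the Valgrind output.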
Draft solution
Explicitly consume the remaining Space* objects through solutions.pop() and delete them when the destructor of Engine<Tracer> is called.
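As a rough illustration of this idea, here is a self-contained sketch that does not use Gecode. FakeSpace, FakeEngine, found, and live_after_engine_destroyed are hypothetical names invented for the example, and std::deque stands in for the solutions queue. The real Engine<Tracer> destructor may have additional concerns (for example, worker threads and locking) that this sketch does not address.

```cpp
#include <deque>

// Hypothetical stand-in for a cloned Gecode Space; counts live instances.
struct FakeSpace {
  static int live;
  FakeSpace() { ++live; }
  ~FakeSpace() { --live; }
};
int FakeSpace::live = 0;

// Hypothetical stand-in for the engine that owns the solutions queue.
class FakeEngine {
  // A container of raw pointers does not delete the pointees by itself.
  std::deque<FakeSpace*> solutions;
public:
  FakeEngine() = default;
  FakeEngine(const FakeEngine&) = delete;
  FakeEngine& operator=(const FakeEngine&) = delete;
  // Called when a "worker" reports a solution.
  void found(FakeSpace* s) { solutions.push_back(s); }
  // Hands one solution to the caller, who then owns it; nullptr if none.
  FakeSpace* next() {
    if (solutions.empty()) return nullptr;
    FakeSpace* s = solutions.front();
    solutions.pop_front();
    return s;
  }
  // Proposed fix: drain and delete whatever the caller never consumed.
  ~FakeEngine() {
    while (!solutions.empty()) {
      delete solutions.front();
      solutions.pop_front();
    }
  }
};

// Same scenario as the reproduction: several solutions, one call to next().
// Returns the number of FakeSpace objects still alive afterwards.
int live_after_engine_destroyed() {
  {
    FakeEngine e;
    for (int i = 0; i < 3; ++i) e.found(new FakeSpace);
    delete e.next();
  }  // destructor drains the two unconsumed solutions
  return FakeSpace::live;
}
```

With the draining destructor in place, no FakeSpace object survives the engine, even though only one solution was consumed.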
Thanks for this great error report, reproduction, and analysis!
On a cursory look, I would agree with your analysis fully. Will check more fully later.