choco-solver icon indicating copy to clipboard operation
choco-solver copied to clipboard

[BUG] Missing solutions when using custom strategy with ParallelPortfolio and argparse4j flag?

Open Holt59 opened this issue 3 years ago • 12 comments

Describe the bug

I encounter a very strange bug when using ParallelPortfolio. I cannot post the code to reproduce it because the model is quite complex, but I'll try to explain as best as possible.

My code looks like the following:

// this is VERY relevant
final ArgumentParser parser = ArgumentParsers.newFor("My Solver").build().defaultHelp(true);
parser.addArgument("-s", "--silent").action(Arguments.storeTrue()).setDefault(false)

final Namespace res = parser.parseArgs(args);

// create the portfolio
final boolean useCustomStrategy = ...; // this can be set from command line, see below
final int nThreads = ...; // this can be set from command line, see below

final ParallelPortfolio portfolio = new ParallelPortfolio();
for (int i = 0; i < nThreads; ++i) {

    final MyModel model = new MyModel(...);
   
    // see below
    model.getSolver().setEarch(useCustomStrategy ? model.makeCustomStrategy() : model.makeDefaultStrategy());  

    portfolio.addModel(model);
}
portfolio.stealNogoodsOnRestarts();

final List<MySolution> solutions = new ArrayList<>();
while (portfolio.solve()) {
    final MySolution solution = new MySolution((MyModel) portfolio.getBestModel());
    solutions.add(solution);
}

if (!res.getBoolean("silent")) {
    // display solutions
}

The two search strategy are similar:

  • makeDefaultStrategy returns Search.intVarSearch(vars), where vars is a subset of the variables of the models,
  • makeCustomStrategy returns Search.intVarSearch(varSelector, new IntDomainMin(), vars);, where vars is the same subset as the default strategy, and varSelector is a custom variable selector. The model is a satisfaction model, so the only difference with the "default" strategy is the variable selector.

The code can be run from the command line, and the following parameters can be set:

  • the number of threads (my actual code contain stuff specific to single-thread that do not rely on a portfolio),
  • the selection of the search strategy (default vs. custom), with --use-custom-strategy,
  • the "silent" (--silent) flag that displays solutions or not at the end of the program (outside of the .solve() loop).

Here is the very strange behavior I observe:

  • with a single thread, one solution is found, regardless of the other parameters,
  • with up to 4 threads, the same solution is found (sometimes multiple times, but I guess that's expected?), regardless of the other parameters,
  • with 5 or more threads, the problem occurs
    • without any extra flags, the solution is found,
    • with --silent OR --use-custom-strategy, the solution is NOT found (0 solutions are found),
    • with BOTH --silent AND --use-custom-strategy, the solution is found.

Note that --silent has strictly no impact on the model in my code, it's purely a display flag.

This is the behavior with Choco 4.10.7, with 4.10.6 the solution is not found (with 5 or more threads) if there are no flags, but is found with any combination of flags (silent and/or custom strategy).

Environment:

  • Choco-solver version: e.g. 4.10.7
  • JRE : 8

Holt59 avatar Dec 23 '21 08:12 Holt59

Can you try with the master branch? Can you tell what is your hardware configuration (number of CPUs mainly)? That may be related to the default search strategy, but without executable code, it is quiet hard to assess that.

cprudhom avatar Jan 03 '22 11:01 cprudhom

Actually, the issue seems unrelated to the strategy or argparse, it simply occurs from time to time. I tried to debug it, and for what I understand, there is a big issue in ParallelPortfolio for satisfaction problem as it can discard solutions if the solution are found "at the same time".

I basically have the following loop:

while (portfolio.solve()) {
    final Model bestModel = portfolio.getBestModel();
    // ...
}

And if I log from updateFromSolution and getBestModel, here is what I get (I log the index of the given model in updateFromSolution and the index of finder in getBestModel):

updateFromSolution(1)
updateFromSolution(0)
getBestModel():  0
updateFromSolution(1)
getBestModel():  1
updateFromSolution(1)
getBestModel():  1
updateFromSolution(1)
updateFromSolution(0)
getBestModel():  0

Which means that solve() returned only once while updateFromSolution was called twice, and this happened two times, so here I missed two solutions.

I don't really know how to do that but solve() should check models for solution (except finder, and update it) before restarting the pool.

Holt59 avatar Jan 28 '22 13:01 Holt59

I also notice from the comment on finder ("point to (one of) the solver(s) which found a solution") that this is likely expected, but I don't really understand how... I can understand the idea for optimization problem, but for satisfaction problem, this cause some troubles.

Holt59 avatar Jan 28 '22 14:01 Holt59

Hi @Holt59

I was able to reproduce the issue and I think I found a patch. I replaced (l.140):

private Model finder;

by

private final AtomicReference<Model> finder = new AtomicReference<>();

and adapt the code in consequence.

Can you apply the patch and tell if that fixes the issue?

cprudhom avatar Jan 31 '22 12:01 cprudhom

That is not enough.

    @Test(groups = "10s", timeOut = 60000)
      public void testP111() {
          ParallelPortfolio pares = new ParallelPortfolio();
          int n = 2; // number of solvers to use
          for (int i = 0; i < n; i++) {
              pares.addModel(ProblemMaker.makeNQueenWithBinaryConstraints(12));
          }
          pares.stealNogoodsOnRestarts();
          int nbSols = 0;
          while (pares.solve() && nbSols < 20) {
              final Model bestModel = pares.getBestModel();
              nbSols++;
          }
          Model finder = pares.getBestModel();
          Assert.assertTrue(nbSols == 20);
          Assert.assertEquals(pares.getModels().stream().mapToLong(m -> m.getSolver().getSolutionCount()).sum(), 20);
          Assert.assertNotNull(finder);
      }

cprudhom avatar Jan 31 '22 12:01 cprudhom

I can't see any patch since, indeed, two models can find a solution at the very same time and thus, one is declared as "best" and the other one is discard. That doesn't seem to be an issue for me, since a solution is returned.

cprudhom avatar Jan 31 '22 12:01 cprudhom

That's an issue when looking for all solutions to a satisfaction problem. My current workaround is to add a IMonitorSolution to all the solvers and use it to fill the list of possible solutions.

If it's by design, it should simply be mentioned in the documentation I think.

Holt59 avatar Jan 31 '22 12:01 Holt59

I don't think there is a way to certify that each solution appears once when looking for all of them in a portfolio. Your solution seems the more straightforward, you can eventually add nogoods from solution also.

cprudhom avatar Jan 31 '22 13:01 cprudhom

I don't think there is a way to certify that each solution appears once when looking for all of them in a portfolio.

I can totally understand a solution being found multiple times, that's easy to filter in post-process, but solutions being missed cannot be fixed afterwards, hence the issue.

Your solution seems the more straightforward, you can eventually add nogoods from solution also.

Is there a way to add nogoods to all models during the solve() process without breaking back-propgation and/or clause learning?

Holt59 avatar Jan 31 '22 14:01 Holt59

I agree, and I cannot understand how this happens:

with --silent OR --use-custom-strategy, the solution is NOT found (0 solutions are found),

Does that mean that one thread (th1) ends without any solution and stop the other one (th2) ? If true, how come th1 was not able to find a solution if there is one?

cprudhom avatar Jan 31 '22 15:01 cprudhom

I agree, and I cannot understand how this happens:

with --silent OR --use-custom-strategy, the solution is NOT found (0 solutions are found),

Does that mean that one thread (th1) ends without any solution and stop the other one (th2) ? If true, how come th1 was not able to find a solution if there is one?

I'll try to reproduce and dig a bit more - I made a lot of changes to the code since this, so I'm not sure I'd be able to reproduce, and it's possible that there were issues in my model when I did notice the "bug" so I think it's better to discard the original issue at this point (the "not finding all solutions" problem is still open though).

Holt59 avatar Jan 31 '22 17:01 Holt59

@Holt59 I think this issue is not relevant anymore, do you agree?

cprudhom avatar Jun 14 '22 15:06 cprudhom