timefold-solver icon indicating copy to clipboard operation
timefold-solver copied to clipboard

ConstraintCollectors.toConsecutiveSequences not updated after shadow variable update

Open keatingn opened this issue 1 year ago • 5 comments

Describe the bug We have a Stock class which tracks the stock level of a Product per Period. The quantity of the Stock is a shadow variable of the quantity of a Job. The Job.quantity is the only genuine planning variable. We are seeing score corruption with the following constraint which should penalise if a product is out of stock for more that one period:

    protected Constraint outOfStock(ConstraintFactory constraintFactory) {
        return constraintFactory
                .forEach(Stock.class)
                .filter(stock -> stock.getQuantity() < 0)
                .groupBy(Stock::getProduct,
                        ConstraintCollectors.toConsecutiveSequences(stock -> stock.getPeriod().getIndex()))
                .flattenLast(SequenceChain::getConsecutiveSequences)
                .filter((product, sequence) -> sequence.getCount() > 1)
                .penalize(HardSoftLongScore.ONE_HARD, (product, sequence) -> sequence.getCount() - 1)
                .asConstraint("outOfStock");
    }

Expected behavior If a Product is out of stock for say 10 periods with a penalty of -9hard/0soft, and a Job is assigned quantity in period 5 which brings the Stock above 0 for Period 5, I would expect the sequence to split into [1,2,3,4] and [6,7,8,9,10] resulting in a penalty of -7hard/0soft

Actual behavior With FULL_ASSERT we are seeing score corruption:

java.lang.IllegalStateException: Score corruption (1hard): the expectedWorkingScore (-8hard/-9soft) is not the workingScore (-9hard/-9soft) after completedAction (Job(product=Product(name=product-1), period=Period(index=1, name=p1), jobId=p1-product-1, quantity=10) {10 -> 10}).
	at ai.timefold.solver.core.impl.score.director.AbstractScoreDirector.assertExpectedWorkingScore(AbstractScoreDirector.java:556) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at ai.timefold.solver.core.impl.phase.scope.AbstractPhaseScope.assertExpectedWorkingScore(AbstractPhaseScope.java:168) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at ai.timefold.solver.core.impl.phase.AbstractPhase.predictWorkingStepScore(AbstractPhase.java:145) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at ai.timefold.solver.core.impl.localsearch.DefaultLocalSearchPhase.doStep(DefaultLocalSearchPhase.java:105) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at ai.timefold.solver.core.impl.localsearch.DefaultLocalSearchPhase.solve(DefaultLocalSearchPhase.java:94) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at ai.timefold.solver.core.impl.solver.AbstractSolver.runPhases(AbstractSolver.java:82) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at ai.timefold.solver.core.impl.solver.DefaultSolver.solve(DefaultSolver.java:200) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at ai.timefold.solver.core.impl.solver.DefaultSolverJob.call(DefaultSolverJob.java:112) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at java.base/java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:317) ~[na:na]
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java) ~[na:na]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[na:na]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[na:na]
	at java.base/java.lang.Thread.run(Thread.java:1583) ~[na:na]

The completedAction is a bit strange in the above exception as there is no {10 -> 10} move. There is a {0 -> 10} and its undo move {10 -> 0} Also, there are no custom move.

When the AbstractScoreDirector.assertExpectedWorkingScore calls calculateScore() on line 553 I can see the sequence includes the stock from period 1 with a quantity of 0, which should be filtered out.

Running without FULL_ASSERT the local search takes 188 steps and result in the following exception:

2024-07-03T11:12:45.854+01:00 DEBUG 92814 --- [sequence-issue] [pool-1-thread-1] a.t.s.c.i.l.DefaultLocalSearchPhase      :     LS step (188), time spent (104), score (-7hard/-10soft),     best score (-7hard/-6soft), accepted/selected move count (1/1), picked move (Job(product=Product(name=product-1), period=Period(index=5, name=p5), jobId=p5-product-1, quantity=3) {3 -> 10}).
2024-07-03T11:12:45.856+01:00 ERROR 92814 --- [sequence-issue] [pool-1-thread-1] e.s.issue.bootstrap.SampleGenerator      : Failed solving plan (-9hard/-10soft).

java.lang.NullPointerException: Cannot invoke "java.util.Map$Entry.getKey()" because "firstBeforeItemEntry" is null
	at ai.timefold.solver.core.impl.score.stream.collector.consecutive.ConsecutiveSetTree.remove(ConsecutiveSetTree.java:249) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at ai.timefold.solver.core.impl.score.stream.collector.SequenceCalculator.retract(SequenceCalculator.java:30) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at ai.timefold.solver.core.impl.score.stream.collector.uni.ObjectCalculatorUniCollector.lambda$accumulator$0(ObjectCalculatorUniCollector.java:26) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at ai.timefold.solver.core.impl.score.stream.bavet.common.AbstractGroupNode.update(AbstractGroupNode.java:184) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at ai.timefold.solver.core.impl.score.stream.bavet.common.tuple.AbstractConditionalTupleLifecycle.update(AbstractConditionalTupleLifecycle.java:24) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at ai.timefold.solver.core.impl.score.stream.bavet.common.tuple.AggregatedTupleLifecycle.update(AggregatedTupleLifecycle.java:20) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at ai.timefold.solver.core.impl.score.stream.bavet.common.StaticPropagationQueue.propagate(StaticPropagationQueue.java:93) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at ai.timefold.solver.core.impl.score.stream.bavet.common.StaticPropagationQueue.processAndClear(StaticPropagationQueue.java:116) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at ai.timefold.solver.core.impl.score.stream.bavet.common.StaticPropagationQueue.propagateUpdates(StaticPropagationQueue.java:98) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at ai.timefold.solver.core.impl.score.stream.bavet.common.Propagator.propagateEverything(Propagator.java:65) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at ai.timefold.solver.core.impl.score.stream.bavet.BavetConstraintSession.calculateScoreInLayer(BavetConstraintSession.java:92) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at ai.timefold.solver.core.impl.score.stream.bavet.BavetConstraintSession.calculateScore(BavetConstraintSession.java:83) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at ai.timefold.solver.core.impl.score.director.stream.BavetConstraintStreamScoreDirector.calculateScore(BavetConstraintStreamScoreDirector.java:49) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at ai.timefold.solver.core.impl.score.director.AbstractScoreDirector.doAndProcessMove(AbstractScoreDirector.java:253) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at ai.timefold.solver.core.impl.localsearch.decider.LocalSearchDecider.doMove(LocalSearchDecider.java:117) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at ai.timefold.solver.core.impl.localsearch.decider.LocalSearchDecider.decideNextStep(LocalSearchDecider.java:98) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at ai.timefold.solver.core.impl.localsearch.DefaultLocalSearchPhase.solve(DefaultLocalSearchPhase.java:72) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at ai.timefold.solver.core.impl.solver.AbstractSolver.runPhases(AbstractSolver.java:82) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at ai.timefold.solver.core.impl.solver.DefaultSolver.solve(DefaultSolver.java:200) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at ai.timefold.solver.core.impl.solver.DefaultSolverJob.call(DefaultSolverJob.java:112) ~[timefold-solver-core-1.11.0.jar:1.11.0]
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[na:na]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[na:na]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[na:na]
	at java.base/java.lang.Thread.run(Thread.java:1583) ~[na:na]

To Reproduce Please find attached the project used to produced the above exceptions

Environment

Timefold Solver Version or Git ref: 1.11.0

Output of java -version: openjdk version "21.0.2" 2024-01-16 OpenJDK Runtime Environment (build 21.0.2+13-58) OpenJDK 64-Bit Server VM (build 21.0.2+13-58, mixed mode, sharing)

Output of uname -a or ver: Darwin Noels-Work-MacBook-Pro.local 23.5.0 Darwin Kernel Version 23.5.0: Wed May 1 20:12:58 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6000 arm64

Additional information

sequence-issue.zip

Provide any and all other information which might be relevant to the issue.

keatingn avatar Jul 03 '24 10:07 keatingn

Hello @keatingn! This is most likely not a bug, rather a misunderstanding of how the Constraint Streams API works.

You say Job.quantity is the only genuine variable. Yet your stream doesn't do forEach(Job.class), nor join or ifExists on the same. Therefore, the stream will not be updated when the quantity is updated.

Once you make the constraint dependent on Job in any way, I expect the problem to go away. Please confirm. If the issue persists, it means there's probably a bug in the collector, as you suggest.

triceo avatar Jul 03 '24 10:07 triceo

Another option: the variable listener could be wrong, not updating the quantities correctly in some corner case.

The way to prove that the issue lies in the collector would be to work with the collector directly, replicate a particular series of changes as direct calls to the collector's API, and showing an incorrect result.

triceo avatar Jul 03 '24 10:07 triceo

Hello @triceo, thank you for such a quick reply.

The Stock.quantity is a shadow variable of the Job.quantity.

public class Stock {
    private Period period;
    private Product product;
    @ShadowVariable(
            sourceEntityClass = Job.class,
            sourceVariableName = "quantity",
            variableListenerClass = StockQuantityVariableListener.class)
    private Long quantity;
}

I was under the impression this would be enough to allow constraints using just .forEach(Stock.class) and there is another constraint that doesn't use the collector which seems work work as expected.

    protected Constraint service(ConstraintFactory constraintFactory) {
        return constraintFactory
                .forEach(Stock.class)
                .filter(stock -> stock.getQuantity() < 0)
                .penalize(HardSoftLongScore.ONE_SOFT)
                .asConstraint("service");
    }

I don't think the variable listener is the issue either as the above constraint works with FULL_ASSERT when the 'outOfStock' (which uses the ConsecutiveSequences collector) is disabled.

I also don't think it's a corner case, as it fails on the first step once the outOfStock constraint is enabled. I'll attached the code for the variable listener below so you don't have to deal with the attached zip file.

Hopefully, you'll be able to see something I can't, either way, thank you for your time.

public class StockQuantityVariableListener implements VariableListener<Plan, Job> {

    @Override
    public boolean requiresUniqueEntityEvents() {
        return true;
    }

    @Override
    public void beforeVariableChanged(ScoreDirector<Plan> scoreDirector, Job job) {
        if (job.getQuantity() != null) {
            adjust(scoreDirector, job, -job.getQuantity());
        }
    }

    @Override
    public void afterVariableChanged(ScoreDirector<Plan> scoreDirector, Job job) {
        if (job.getQuantity() != null) {
            adjust(scoreDirector, job, job.getQuantity());
        }
    }

    private void adjust(ScoreDirector<Plan> scoreDirector, Job job, long adjustment) {
        if (adjustment == 0) {
            return;
        }
        List<Stock> futureStocks = getFutureStocks(scoreDirector, job.getProduct(), job.getPeriod());


        for (Stock stock : futureStocks) {
            scoreDirector.beforeVariableChanged(stock, "quantity");
            stock.setQuantity(stock.getQuantity() + adjustment);
            scoreDirector.afterVariableChanged(stock, "quantity");
        }
    }

    private List<Stock> getFutureStocks(
            ScoreDirector<Plan> scoreDirector, Product product, Period period) {
        return scoreDirector.getWorkingSolution().getStocks().stream()
                .filter(stock -> stock.getProduct().equals(product))
                .filter(stock -> stock.getPeriod().getIndex() >= period.getIndex())
                .sorted(Comparator.comparing(stock -> stock.getPeriod().getIndex()))
                .toList();
    }

    @Override
    public void beforeEntityAdded(ScoreDirector<Plan> scoreDirector, Job job) {

    }

    @Override
    public void afterEntityAdded(ScoreDirector<Plan> scoreDirector, Job job) {

    }

    @Override
    public void beforeEntityRemoved(ScoreDirector<Plan> scoreDirector, Job job) {

    }

    @Override
    public void afterEntityRemoved(ScoreDirector<Plan> scoreDirector, Job job) {

    }
}

keatingn avatar Jul 03 '24 11:07 keatingn

Wrt. Stock.quantity - you're correct. I was incorrectly assuming an extra step in the process which would have caused the problem I was describing. Nevermind.

I appreciate you provided the reproducer; we will investigate when time permits.

triceo avatar Jul 03 '24 11:07 triceo

@Christopher-Chianelli fyi, I have changed the constraint to check for consecutive jobs with zero quantity to rule out issues with the shadow variable and/or shadow variable listener.

protected Constraint emptyJob(ConstraintFactory constraintFactory) {
        return constraintFactory
                .forEach(Job.class)
                .filter(job -> job.getQuantity() <= 0)
                .groupBy(Job::getProduct,
                        ConstraintCollectors.toConsecutiveSequences(job -> job.getPeriod().getIndex()))
                .flattenLast(SequenceChain::getConsecutiveSequences)
//                .filter((product, sequence) -> sequence.getCount() > 1)
                .penalize(HardSoftLongScore.ONE_HARD, (product, sequence) -> sequence.getCount() -1)
                .asConstraint("emptyJob");
    }

I'm still seeing the same issue where a member of the sequence is updated in a way that would exclude it from the sequence, but the constraint is not reevaluated.

Hope that makes sense.

~~If I get time this evening, I'll get the test project into~~ GitHub (https://github.com/keatingn/sequence-issue/)

keatingn avatar Jul 12 '24 10:07 keatingn

@keatingn I investigated the issue. The root cause is the generated equals/hashCode were using variables, which corrupts the internal indexes Timefold uses during score calculation. This violates the general contract for hashCode (as noted in the docs):

""" Objects used as group key must obey the general contract of hashCode. Most importantly, "whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer."

For this reason, it is not recommended to use mutable objects (especially mutable collections) as group keys. If planning entities are used as group keys, their hashCode must not be computed off of planning variables. Failure to follow this recommendation may result in runtime exceptions being thrown. """

In this case, a HashMap used by ConstraintCollectors.toConsecutiveSequences got corrupted. The fix is to exclude variables from equals/hashCode, see https://github.com/keatingn/sequence-issue/pull/2. Confusingly, lombok does not automatically exclude all non-annotated fields from equals/hashEquals if @EqualsAndHashCode.Include is used. If you only want explictly included fields to be used, put the @EqualsAndHashCode(onlyExplicitlyIncluded = true) at the top of the class (see https://www.baeldung.com/java-lombok-equalsandhashcode#1-include-at-field-level).

Christopher-Chianelli avatar Aug 27 '24 19:08 Christopher-Chianelli