timefold-solver icon indicating copy to clipboard operation
timefold-solver copied to clipboard

NPE when using entitySelector with sorting and STEP cache type

Open aioobe opened this issue 2 years ago • 9 comments

Describe the bug Using a sorted entitySelector with cacheType STEP causes a null pointer exception.

Exception in thread "main" java.lang.NullPointerException: Cannot invoke "java.util.List.iterator()" because "this.cachedEntityList" is null
        at ai.timefold.solver.core.impl.heuristic.selector.entity.decorator.SortingEntitySelector.iterator(SortingEntitySelector.java:41)
        at ai.timefold.solver.core.impl.heuristic.selector.common.iterator.AbstractOriginalChangeIterator.<init>(AbstractOriginalChangeIterator.java:23)
        at ai.timefold.solver.core.impl.heuristic.selector.move.generic.ChangeMoveSelector$2.<init>(ChangeMoveSelector.java:101)
        at ai.timefold.solver.core.impl.heuristic.selector.move.generic.ChangeMoveSelector.iterator(ChangeMoveSelector.java:101)
        at ai.timefold.solver.core.impl.constructionheuristic.placer.PooledEntityPlacer$PooledEntityPlacingIterator.createUpcomingSelection(PooledEntityPlacer.java:30)
        at ai.timefold.solver.core.impl.constructionheuristic.placer.PooledEntityPlacer$PooledEntityPlacingIterator.createUpcomingSelection(PooledEntityPlacer.java:23)
        at ai.timefold.solver.core.impl.heuristic.selector.common.iterator.UpcomingSelectionIterator.hasNext(UpcomingSelectionIterator.java:27)
        at ai.timefold.solver.core.impl.constructionheuristic.DefaultConstructionHeuristicPhase.solve(DefaultConstructionHeuristicPhase.java:45)
        at ai.timefold.solver.core.impl.solver.AbstractSolver.runPhases(AbstractSolver.java:83)
        at ai.timefold.solver.core.impl.solver.DefaultSolver.solve(DefaultSolver.java:193)
        at testschedules.Main.main(Main.java:15)

To Reproduce

  • Clone https://github.com/aioobe/timefold-repro-step-sorted-entities

  • Run with

    ./gradlew -Dorg.gradle.java.home=/path/to/java17 run
    

Environment

Timefold Solver Version or Git ref: 1.0.0

Output of java -version:

java version "17.0.7" 2023-04-18 LTS
Java(TM) SE Runtime Environment (build 17.0.7+8-LTS-224)
Java HotSpot(TM) 64-Bit Server VM (build 17.0.7+8-LTS-224, mixed mode, sharing)

Output of uname -a or ver:

Linux xyz 5.15.0-76-generic #83~20.04.1-Ubuntu SMP Wed Jun 21 20:23:31 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Additional information

I think the issue is the following:

  1. DefaultConstructionHeuristicPhase.java:45 tries to start iterating over the entities
  2. Since cacheType STEP is used, the cachedEntityList in SortingEntitySelector has not been initialized.
  3. The cachedEntityList would have been initialized in DefaultConstructionHeuristicPhase.java:47, but that has obviously not been executed yet.

So there's a little chicken and egg situation.

aioobe avatar Jul 13 '23 09:07 aioobe

Thanks for reporting!

There are various combinations of cacheType and selector configurations, some of which we know do not work. If we know they do not work, we typically add a fail-fast describing that combination as not possible.

Obviously there are some combinations that do not work, of which we do not yet know. :-) I'll have to look into this and figure out if this is a combination that should work but doesn't, or if this is a combination that will just be "fixed" by adding another fail-fast.

I'm adding this on my backlog.

triceo avatar Jul 13 '23 11:07 triceo

Gotcha. If it matters, my use case is that I would very much like to be able to reevaluate how difficult an entity is to place "continuously" (in between steps). It can be expensive, but hopefully not something that's contradictory or unsound.

aioobe avatar Jul 13 '23 12:07 aioobe

I think injecting yourself at that level exposes you to APIs which are not stable. But even if that's OK for you, it certainly puts you at the mercy of behaviors which are undocumented and can change without warning.

In the process of working on https://github.com/TimefoldAI/timefold-solver/issues/123, which may already solve your use case in an "official" way, I have recently brought a significant improvement to explanation performance (https://github.com/TimefoldAI/timefold-solver/pull/171).

Another pair of eyes never hurts, so if you want to pull that and tell me what kind of results you're getting with that, I'd appreciate that. Data from my experiments show that this is a significant improvement, but nothing beats real-world data.

triceo avatar Jul 13 '23 12:07 triceo

Wow, those numbers look amazing. Really appreciate the work you and Geoffrey are doing in this area. If I have the time the coming days, I'll pull it and take a look. Otherwise I will most likely be the first person to use it when it's out!

I think injecting yourself at that level exposes you to APIs which are not stable.

I'm a little bit surprised that you say so since the docs says "Almost every Selector supports setting a cacheType", and isn't STEP intended precisely for cases when we want to reevaluate before each step? (Noting that it says "almost", and that it may very well be the case that STEP ends up not being supported here.)

aioobe avatar Jul 13 '23 13:07 aioobe

I looked briefly at the lines involving the stack trace... I jotted down some lines that may or may not be on the right track.

So DefaultConstructionHeuristicPhase.java:45 which looks as follows

for (Placement<Solution_> placement : entityPlacer) {  // <--- placement fetched here (which is not safe if using STEP)...
    ConstructionHeuristicStepScope<Solution_> stepScope = new ConstructionHeuristicStepScope<>(phaseScope);
    stepStarted(stepScope);
    decider.decideNextStep(stepScope, placement);      // <----- ...although not used until here

    ...

}

should perhaps be expressed something like this...

Iterator iter = null;

while (true) {
    ConstructionHeuristicStepScope<Solution_> stepScope = new ConstructionHeuristicStepScope<>(phaseScope);
    stepStarted(stepScope);

    // Deferred initialization of iterator
    if (iter == null || usingStepType) {
        iter = entityPlacer.iterator();  // <--- Safe to initialize iterator
    }

    /////////// Correspond to to the original for-each loop //////////
    if (!iter.hasNext()) {
        break;
    }
    Placement<Solution_> placement = iter.next();
    //////////////////////////////////////////////////////////////////

    decider.decideNextStep(stepScope, placement);

    ...
}

Somewhat messy. Could probably be cleaned up.

aioobe avatar Jul 13 '23 14:07 aioobe

I'm a little bit surprised that you say so since the docs says "Almost every Selector supports setting a cacheType", and isn't STEP intended precisely for cases when we want to reevaluate before each step? (Noting that it says "almost", and that it may very well be the case that STEP ends up not being supported here.)

Technically, you are correct. But it also has some side-effects - for example, if you want to cache a move selector, you may find that the hundreds (thousands? millions?) of possible moves either don't fit in memory, or they take ages to generate. JUST_IN_TIME is great, because it will only generate what you'll actually use.

triceo avatar Jul 14 '23 08:07 triceo

This is indeed a chicken-and-egg problem, and I currently can not think of a solution.

The code you suggested has one big problem - it has to trigger the step start, even though the placer iterator may later prove to be empty. In effect, step listeners are triggered when they should not be. It's starting to look like one of those fail-fast situations, but I'm not giving up just yet.

triceo avatar Jul 14 '23 09:07 triceo

Looking at the documentation, we never specifically say that step cache isn't supported with sorted selection.

However, every example we give includes phase cache, which works. In fact, the entire documentation seems to mention step cache only once. One more piece in the puzzle.

triceo avatar Jul 14 '23 09:07 triceo

Another user reporting a similar problem: https://stackoverflow.com/questions/77088034/optaplanner-issue-not-able-to-replicate-a-java-working-method-in-benchmarking

This time, it is in probabilistic selector, which does not extend caching selector, but does caching nonetheless. There are obviously places besides the caching selector hierarchy where caching happens; useful to know.

triceo avatar Sep 12 '23 11:09 triceo