timefold-solver CS: new joiners: intersecting(), disjoint(), containsAll()

Reqs:

Hashtable index performance/scalability
Intersecting: Person Ann and Beth have a hobby in common, so they can sit at the same table.
Disjoint: Talk A and B don't have a topic in common, so they can happen in parallel.
ContainsAll: Job A has a set of required skills. Person Ann's skillSet contains all of those required skills.

                .forEachUniquePair(Job.class,
                        overlapping(Job::getStartDate, Job::getEndDate),
                        // TODO Make this hash powered
                        filtering((job1, job2) -> !Collections.disjoint(
                                job1.getTagSet(), job2.getTagSet())))

To reproduce: see MaintenanceScheduleConstraintProvider#tagConflict

Apr 24 '23 11:04 ge0ffrey

Does this issue happen to have a planned release (and/or a workaround to enhance performance?)

I currently have a few constraints that look something like

        return constraintFactory
                .forEach(Subject.class)
                .join(Lesson.class,
                        filtering((subject, lesson) -> lesson.getSubjects().contains(subject))
                )......

and, while workable, I am noticing that the filtering(() -> x.contains(y)) construction is one of the slower parts of my constraint calculations.

Dec 12 '23 15:12 niekvanderkooy

@niekvanderkooy Thanks for the feedback! Unfortunately, this is not on the short-term roadmap.

As a workaround; have you made sure subjects is a Set? contains() on a List is expensive. (But it may actually be the other way around if the list is very small.)

Dec 12 '23 15:12 triceo

Ah, Set vs List is a good tip! The Collection (which is currently a List) is generally quite small (In fact, it is often a single element), but I will try switching it for a Set and see if that has an effect!

Dec 12 '23 16:12 niekvanderkooy

I'm guessing not; for a single-element list, iteration is not an issue and is likely to perform better than the overhead of hashing/lookups on a typical HashSet. The list starts losing as its size grows, but I wouldn't expect lists of single-digit sizes to be a problem.

Dec 12 '23 16:12 triceo

For small lists there indeed seems to be little to no difference, while for larger lists the Set definitely gets the upper hand, of course. For size of 1 the List does not seem faster either, it's just the same.

What is interesting is that I did a comparison with a dataset where the Collection size was exactly 1 for all objects, after which I altered my Lesson model not to have a Collection of Subjects but just a single element, so I could a join on equals instead of contains. The difference in score calculation speed is astonishing, in my specific case the score calculation speed became 3x as high (even though only a few constraints are using the contains construction, most are not!). So I'll need to have a look into approaching this another way, I don't want to make so much performance go to waste!

Dec 13 '23 19:12 niekvanderkooy

Set vs List speeds up the filtering predicate. It does NOT reduce the number of filtering predicate calls. The RFE would.

For example, given 2000 lessons with a collection of 50 elements, a set can make it at most 50 times faster, but a intersects joiner can make it at most 2000 times faster.

Those are all theoretical numbers, in reality it's lower. Still very much worth it.

We want this feature and we'll work on it when we get to it. That being said, it's currently not in the short term roadmap because other RFEs have a higher priority (more community impact or an Enterprise customer is asking for it).

Dec 30 '23 06:12 ge0ffrey

timefold-solver timefold-solver copied to clipboard

CS: new joiners: intersecting(), disjoint(), containsAll()

timefold-solver
timefold-solver copied to clipboard