timefold-solver
timefold-solver copied to clipboard
CS: new joiners: intersecting(), disjoint(), containsAll()
Reqs:
- Hashtable index performance/scalability
- Intersecting: Person Ann and Beth have a hobby in common, so they can sit at the same table.
- Disjoint: Talk A and B don't have a topic in common, so they can happen in parallel.
- ContainsAll: Job A has a set of required skills. Person Ann's skillSet contains all of those required skills.
.forEachUniquePair(Job.class,
overlapping(Job::getStartDate, Job::getEndDate),
// TODO Make this hash powered
filtering((job1, job2) -> !Collections.disjoint(
job1.getTagSet(), job2.getTagSet())))
To reproduce: see MaintenanceScheduleConstraintProvider#tagConflict
Does this issue happen to have a planned release (and/or a workaround to enhance performance?)
I currently have a few constraints that look something like
return constraintFactory
.forEach(Subject.class)
.join(Lesson.class,
filtering((subject, lesson) -> lesson.getSubjects().contains(subject))
)......
and, while workable, I am noticing that the filtering(() -> x.contains(y)) construction is one of the slower parts of my constraint calculations.
@niekvanderkooy Thanks for the feedback! Unfortunately, this is not on the short-term roadmap.
As a workaround; have you made sure subjects is a Set? contains() on a List is expensive. (But it may actually be the other way around if the list is very small.)
Ah, Set vs List is a good tip! The Collection (which is currently a List) is generally quite small (In fact, it is often a single element), but I will try switching it for a Set and see if that has an effect!
I'm guessing not; for a single-element list, iteration is not an issue and is likely to perform better than the overhead of hashing/lookups on a typical HashSet. The list starts losing as its size grows, but I wouldn't expect lists of single-digit sizes to be a problem.
For small lists there indeed seems to be little to no difference, while for larger lists the Set definitely gets the upper hand, of course. For size of 1 the List does not seem faster either, it's just the same.
What is interesting is that I did a comparison with a dataset where the Collection size was exactly 1 for all objects, after which I altered my Lesson model not to have a Collection of Subjects but just a single element, so I could a join on equals instead of contains. The difference in score calculation speed is astonishing, in my specific case the score calculation speed became 3x as high (even though only a few constraints are using the contains construction, most are not!). So I'll need to have a look into approaching this another way, I don't want to make so much performance go to waste!
Set vs List speeds up the filtering predicate. It does NOT reduce the number of filtering predicate calls. The RFE would.
For example, given 2000 lessons with a collection of 50 elements, a set can make it at most 50 times faster, but a intersects joiner can make it at most 2000 times faster.
Those are all theoretical numbers, in reality it's lower. Still very much worth it.
We want this feature and we'll work on it when we get to it. That being said, it's currently not in the short term roadmap because other RFEs have a higher priority (more community impact or an Enterprise customer is asking for it).