scala3 Performance improvement: use provisional state for better cache reuse; refactor TermRef widening logic

The main issue is that if a type is provisional, its denotation, signature, and other values will not be cached. As a result, using widen on provisional types will repeatedly recompute these values.

To enhance cache reuse, I introduced the concept of a provisional state, which maps all provisional type components (TypeRef, LazyRef, and TypeVar) to their information. For example, by comparing two provisional states, we can determine if a TypeVar is instantiated to a type from last time, even if the type is still provisional. For non-provisional types, the states are empty.

This PR includes two performance improvements:

Rewrite provisional testing: We can now collect all provisional type components during checking. Provisional states are added for cached values (denot, signature, etc.), and we reuse the cache value if the corresponding state has not changed.
Refactor TermRef widening logic: I inlined the logic here to avoid computing provisional state twice.
~Add a local cache for widening results inside resolveOverloaded: Given that the denotation will not change in this part, we can safely cache the widening result.~ Since resolveOverloaded uses constrainResult and normalizedCompatible, which could constrain TypeVars, the denotation may change as well, so we should not cache the result here.

The compilation time in #20217 can be reduced to Scala 2 levels. We can test the results by publishing locally and compiling the project from #20217. Here are the (average) results on my laptop:

2.13.14: 12s
3.4.2: 67s
3.6.0-RC1-bin-20240724-f0b3a2b-NIGHTLY: 62s
This PR: 16s

Jul 26 '24 02:07 noti0na1

I am still not satisfied with the performance of the denotation computations. It appears that this is actually a performance regression introduced by #18092. The nightly builds 3.3.2-RC1-bin-20230707-7ede150-NIGHTLY and 3.3.2-RC1-bin-20230710-ed319e8-NIGHTLY confirm this issue.

Since the merge denotations require signature and #18092 disables the signature cache for provisional method types, computing an overloading denotation has become slower following that PR.

If I remove && !isProvisional from MethodOrPoly, the compiling time of the project from #20217 can be improved as well. However, while all tests pass, I believe this approach is incorrect given the semantics of isProvisional.

CC @smarter

Jul 29 '24 11:07 noti0na1

I'm trying another approach where a cache (denotation or signature, etc.) would be valid as long as the provisional state is not changed.

The provisional state of a type would store any component type that might be provisional in a map to track their changes, similar to the reductionContext in MatchType.

Jul 30 '24 14:07 noti0na1

test performance please

Sep 06 '24 12:09 noti0na1

Hey, there seems to be no new regressions found in the Open CB when compared with the latest nightly (3.6.0-RC1-bin-20240905-f285199-NIGHTLY)

Total execution time for each of the builds (summed execution time from all GH Actions runners running in parallel): Nightly: 3d 0h 13m 57s PR 1# run: 3d 3h 7m 7s PR 2# run: 3d 1h 46m 13s

I've taken a look at the nightly from the previous nightly run (3.6.0-RC1-bin-20240902-f774497-NIGHTLY) - it executed in 3d 0h 49m 56s At first glance, it seems to be executing longer for 3h more of accumulated, but to confirm that I'd need to re-run it. It's also possible that it could have been more unlucky by picking the less performant runners. We've run it over 1611 projects, so it makes the average difference of ~7s.

Sep 08 '24 20:09 WojciechMazur

Thanks for the statistics @WojciechMazur! That's good to know. I had a nagging suspiciion that this is probably too expensive in this form, so it seems we need to target it better (or decide it's not worth trying).

But, it's fantastic that we can get these large codebase statistics. This is a gamechanger for future attempts at optimizations!

Sep 08 '24 21:09 odersky

Just a reminder that the execution times from the OpenCB need to be taken with a grain of salt. The actual compilation times might differ because we don't have a stable, reproducible environment. Repeating the same build again has shortened the execution by ~1.5h

Sep 09 '24 07:09 WojciechMazur

test performance please

Sep 16 '24 07:09 mbovel

performance test scheduled: 1 job(s) in queue, 0 running.

Sep 16 '24 07:09 dottybot

Performance test finished successfully:

Visit https://dotty-bench.epfl.ch/21278/ to see the changes.

Benchmarks is based on merging with main (ad8c21a16a)

Sep 16 '24 08:09 dottybot

scala3 scala3 copied to clipboard

Performance improvement: use provisional state for better cache reuse; refactor TermRef widening logic

scala3
scala3 copied to clipboard