
(Eventually) fix #17501

Open vzarytovskii opened this issue 1 year ago • 12 comments

The approach I want to take is to cache the results of type subsumption for two given types (in TypeFeasiblySubsumesType & co).

For now this just makes the code a bit more readable while I figure out how we should approach the caching itself, since TType deliberately has no equality defined for it (by design): we can have non-nominal generic types, where a TType_var's solution can be (and is) mutable and can change over time.

Update, perf: with the cache (this PR): 5.8s; without the cache (main): 130.9s. Measured on a real project exhibiting the issue (OpenTK, which uses the .NET 8 BCL and its types extensively); it compiled successfully.

One concern is that a hash collision could eventually trip this logic. I will experiment with additional value comparisons on cache hits.
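The idea can be sketched roughly as follows (a hypothetical illustration, not the actual compiler code; `hashTy` and `subsumes` stand in for whatever hashing function and subsumption check the compiler uses):

```fsharp
open System.Collections.Concurrent

// Hypothetical sketch of the caching idea. The cache is keyed on a pair of
// hashes for the two types; a real implementation would have to deal with
// hash collisions, e.g. by comparing the actual values on a cache hit.
let subsumptionCache = ConcurrentDictionary<struct (int * int), bool>()

let cachedSubsumes (hashTy: 'T -> int) (subsumes: 'T -> 'T -> bool) (ty1: 'T) (ty2: 'T) : bool =
    let key = struct (hashTy ty1, hashTy ty2)
    match subsumptionCache.TryGetValue key with
    | true, cached -> cached                  // cache hit: skip the expensive check
    | false, _ ->
        let result = subsumes ty1 ty2         // cache miss: compute and remember
        subsumptionCache.[key] <- result
        result
```

Note that a hit returns the stored answer without re-running the check, which is exactly why collisions (and stale entries) are a concern.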

vzarytovskii avatar Sep 06 '24 11:09 vzarytovskii

:heavy_exclamation_mark: Release notes required


:white_check_mark: Found changes and release notes in following paths:

| Change path | Release notes path | Description |
| --- | --- | --- |
| src/Compiler | docs/release-notes/.FSharp.Compiler.Service/9.0.200.md | |
| LanguageFeatures.fsi | docs/release-notes/.Language/preview.md | |

github-actions[bot] avatar Sep 06 '24 11:09 github-actions[bot]

It helps compilation. By a lot (130s -> 5s).

A bunch of blocking issues I'm not yet sure how to address:

  1. In an interactive session, we use a shared amap, so there will potentially be a lot of false-positive cache hits.
  2. In tooling scenarios, we potentially have a lot of false-positive cache hits:

    type A implements interface IA, and we cache that fact. IA is then removed from A, but the stale entry stays in the cache, so any "is A an IA" check will keep passing.

For the 1st, we can technically clear the cache on each FSI submission.

For the 2nd, I am not sure what to do, since we fill the cache on demand. Pre-filling it would be costly and would slow down every single compilation and check, while re-checking the interfaces on every cache hit would negate this optimization.

One of the solutions might be to only enable it for compilation.
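For what it's worth, one cheap way to mitigate both issues is a generation counter rather than an actual clear: bump the generation on each FSI submission (or on any event that can change type definitions) and treat entries written under an older generation as misses. This is a sketch under those assumptions, not what the PR implements:

```fsharp
open System.Collections.Concurrent

// Sketch: generation-stamped cache entries. Bumping `generation` logically
// invalidates every existing entry at once, without scanning the dictionary.
let mutable generation = 0
let genCache = ConcurrentDictionary<struct (int * int), struct (int * bool)>()

let tryGetCached key =
    match genCache.TryGetValue key with
    | true, struct (g, value) when g = generation -> Some value  // still fresh
    | _ -> None                                                  // stale or absent

let storeCached key value =
    genCache.[key] <- struct (generation, value)

// Call this on each FSI submission (or on any edit, in tooling):
let invalidateAll () = generation <- generation + 1
```

Stale entries still occupy memory until overwritten, so this only solves correctness, not unbounded growth.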

vzarytovskii avatar Sep 19 '24 11:09 vzarytovskii

So, doing smarter caching now, and enabling it only for one-off compilation until we have proper caches in the compiler.

vzarytovskii avatar Sep 25 '24 17:09 vzarytovskii

@T-Gro @0101 @psfinaki @abonie @KevinRansom please, give it a glance

I don't think there's a good way of testing it automatically and deterministically.

vzarytovskii avatar Sep 25 '24 18:09 vzarytovskii

My impression is that this change is too big to go without additional testing and the potential benefits are not worth the risk being introduced. Moreover, the PR might not be easily reversible.

Especially in the light of the recent regressions in this area, I'd rather propose to take a breath here and hold a discussion on how we can actually test this. I think we would also benefit from coming up with a list of performance testing scenarios here to ensure that speeding one thing doesn't slow down something else.

This is a non-blocking opinion :shrug:

psfinaki avatar Sep 26 '24 11:09 psfinaki

> My impression is that this change is too big to go without additional testing and the potential benefits are not worth the risk being introduced. Moreover, the PR might not be easily reversible.

Maybe we could also split the code moving into a separate PR, so it'd be easier to check/revert the actual changes?

auduchinok avatar Sep 26 '24 11:09 auduchinok

> My impression is that this change is too big to go without additional testing and the potential benefits are not worth the risk being introduced. Moreover, the PR might not be easily reversible.

> Maybe we could also split the code moving into a separate PR, so it'd be easier to check/revert the actual changes?

It is part of the change; if it has to be reverted, it will need to be reverted together.

vzarytovskii avatar Sep 26 '24 11:09 vzarytovskii

> My impression is that this change is too big to go without additional testing and the potential benefits are not worth the risk being introduced. Moreover, the PR might not be easily reversible.

Huh? How is adding a cache (under a language flag) a big change? It's literally the only thing this PR does.

> Especially in the light of the recent regressions in this area, I'd rather propose to take a breath here and hold a discussion on how we can actually test this. I think we would also benefit from coming up with a list of performance testing scenarios here to ensure that speeding one thing doesn't slow down something else.

> This is a non-blocking opinion :shrug:

Alright, since there's no agreement on the fix, feel free to propose a plan and reopen this when there's a good understanding of the testing and perf scenarios.

It was more of a hackathon project/free time issue, and I don't have any more time to spare on testing/additional thinking on it.

vzarytovskii avatar Sep 26 '24 11:09 vzarytovskii

I agree this is not a large change. I think performance is one of the most important elements of user experience. Every improvement here is incredibly valuable.

As for testing this, one idea is to enable this for all compilation scenarios, not just one off. More chance to spot regressions if the whole existing test suite uses this code, not just a subset.

majocha avatar Sep 26 '24 13:09 majocha

> I agree this is not a large change.

> I think performance is one of the most important elements of user experience. Every improvement here is incredibly valuable.

> As for testing this, one idea is to enable this for all compilation scenarios, not just one off. More chance to spot regressions if the whole existing test suite uses this code, not just a subset.

Can't enable it everywhere for one reason: we don't have a good caching solution in the compiler with eviction and statistics built in. In service scenarios the dictionary would just grow forever, since (almost) any change to a type introduces a new TType instance, which then gets cached.
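To illustrate the concern: a plain ConcurrentDictionary has no size bound, so even a crude cap (dropping everything once the cache grows past a limit) would be needed before this could run in long-lived service sessions. A minimal sketch of such a cap, as an illustration rather than a proposed design:

```fsharp
open System.Collections.Concurrent

// Crude size cap: once the cache reaches `capacity`, throw everything away.
// Real eviction (e.g. LRU) plus hit/miss statistics would be preferable.
let capacity = 100_000
let boundedCache = ConcurrentDictionary<struct (int * int), bool>()

let addBounded key value =
    if boundedCache.Count >= capacity then
        boundedCache.Clear()
    boundedCache.[key] <- value
```

The Clear-on-full strategy is wasteful (it periodically throws away all the hot entries), which is presumably why a proper compiler-wide cache is the prerequisite here.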

I think we build the product compiler and some of the suites with the preview version, so it has some coverage.

I also tested it on the original OpenTK solution.

vzarytovskii avatar Sep 26 '24 14:09 vzarytovskii

I added a few remarks for myself as I plan to (later) finish this impl.

T-Gro avatar Sep 30 '24 13:09 T-Gro

I addressed a couple of them. I still think it's tested as well as we can, plus it's hidden behind flags (language and tooling), and I think we can merge it as is; it's better to test it this early than to try to plan a "better" solution. But then again, if there's any pushback, I'm happy to hear a better plan/solution.

vzarytovskii avatar Sep 30 '24 13:09 vzarytovskii

This is ready @dotnet/fsharp-team-msft

vzarytovskii avatar Oct 31 '24 14:10 vzarytovskii

It would be interesting to see if there's any perf gain when comparing to .NET 6/.NET 7? Or does the optimized code not even exist there?

gusty avatar Nov 07 '24 21:11 gusty

> It would be interesting to see if there's any perf gain when comparing to .NET 6/.NET 7?

> Or does the optimized code not even exist there?

The gain will generally be minimal, because the BCL there didn't have such large type hierarchies.

vzarytovskii avatar Nov 08 '24 00:11 vzarytovskii

An important note: this will only be available in the preview language version and only during compilation (not tooling). To enable it in tooling, there needs to be a good caching solution in the compiler (as opposed to just a concurrent dictionary); otherwise, in tooling scenarios, the cache would grow indefinitely.

vzarytovskii avatar Nov 20 '24 11:11 vzarytovskii