Reliable FCS benchmarks which run on every release
We need to come up with reliable FCS benchmarks and run them as part of dotnet/performance, ideally on every release (on every signed build of the current release branch).
Some ideas for scenarios we want to test:
- VS related (may be better solved with telemetry, rather than benchmarks):
  - Cold:
    - Time to colouring and breadcrumbs in newly built projects of different sizes.
    - Time to intellisense in an arbitrary document in a newly built project of different sizes.
    - Time to build, time to rebuild.
    - Time to produce diagnostics on errors (type errors, syntax errors, etc.).
    - Time to full typecheck of project graphs, both connected and independent.
  - Hot (re-check):
    - Same as above, but in prebuilt projects of various sizes.
- General (FCS):
  - Static tests:
    - Typechecking of projects of various sizes (see the sketch after this list).
    - Typechecking of project graphs, both connected and independent.
  - Runtime tests (pretty much testing our codegen):
    - Time to main; we need to include different "configs": ngen'd/crossgen'd, with and without MIBC.
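For the static tests, here is a minimal sketch of what such a typechecking benchmark could look like, using BenchmarkDotNet and the public `FSharpChecker` API. The source snippet, file name, and benchmark shape are illustrative assumptions, not the actual dotnet/performance code:

```fsharp
open BenchmarkDotNet.Attributes
open BenchmarkDotNet.Running
open FSharp.Compiler.CodeAnalysis
open FSharp.Compiler.Text

[<MemoryDiagnoser>]
type TypecheckBenchmark() =
    let checker = FSharpChecker.Create()
    // Stand-in source text; real benchmarks would check generated projects of various sizes.
    let source = SourceText.ofString "let add x y = x + y\nlet result = add 1 2"
    let fileName = "Bench.fsx"
    // Resolve project options once, so the benchmark measures checking rather than resolution.
    let options =
        checker.GetProjectOptionsFromScript(fileName, source)
        |> Async.RunSynchronously
        |> fst

    [<Benchmark>]
    member _.ParseAndCheckScript() =
        // FCS caches check results; invalidate so each iteration does real work.
        checker.InvalidateAll()
        checker.ParseAndCheckFileInProject(fileName, 0, source, options)
        |> Async.RunSynchronously

[<EntryPoint>]
let main _ =
    BenchmarkRunner.Run<TypecheckBenchmark>() |> ignore
    0
```

With the standard BenchmarkDotNet host this also runs locally, e.g. `dotnet run -c Release -- --filter '*Typecheck*'`.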
Some metrics we are interested in: time spent in the scenarios, allocations (how much do we promote, whether anything ends up in the LOH, etc.), time spent in GC, and so on.
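Besides BenchmarkDotNet's `MemoryDiagnoser`, most of these metrics can also be sampled around an arbitrary scenario with plain `System.GC` counters. A hedged sketch, where `runScenario` is a placeholder for whatever workload is under test:

```fsharp
open System
open System.Diagnostics

/// Prints elapsed time, allocated bytes and gen2 collection count for one scenario run.
/// LOH specifics would need GC.GetGCMemoryInfo() or EventPipe/ETW tracing instead.
let measure (runScenario: unit -> unit) =
    let allocatedBefore = GC.GetTotalAllocatedBytes(precise = true)
    let gen2Before = GC.CollectionCount 2
    let sw = Stopwatch.StartNew()
    runScenario ()
    sw.Stop()
    printfn "Elapsed:   %O" sw.Elapsed
    printfn "Allocated: %d bytes" (GC.GetTotalAllocatedBytes(precise = true) - allocatedBefore)
    printfn "Gen2 GCs:  %d" (GC.CollectionCount 2 - gen2Before)
```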
More info and context: https://github.com/dotnet/fsharp/discussions/12526 & https://github.com/dotnet/fsharp/pull/13419
Nice to see this getting traction 👍
FYI I created https://github.com/safesparrow/fsharp-benchmark-generator for automating FCS testing. I was planning to add more examples there and make it more robust, but there's not much code there at the moment.
I'd be more than happy to help either extend that or work on an equivalent, more official tool. I can also transfer the repository if that'd be of any use.
I think in general a separate repository with benchmarks is better than putting them inside the main repository - similar to the approach taken in dotnet/performance. Not sure if all three types of tests (VS, FCS code analysis, FCS runtime) should live together or not. Maybe they could share a library with utilities (e.g. for code/project generation) but be separate pieces of code & infrastructure?
Also I think it would be nice if whatever benchmarking tools are created are available for local runs and not just CI.
I created some parsing and type-checking benchmarks on FSharpPlus and FsToolkit.ErrorHandling which take a reasonable amount of time but should also hopefully catch most regressions.
https://github.com/dotnet/performance/pull/2592
They can be run locally, with a local build of FSC plugged in via the project file (see the sketch below). I'll think about making that easier somehow.
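For reference, the usual way to plug a local build into a project like this is a direct assembly reference in place of the NuGet package. A sketch, assuming the default dotnet/fsharp artifacts layout; `$(FSharpRepoRoot)` is a hypothetical property standing in for your checkout path:

```xml
<ItemGroup>
  <!-- Hypothetical: replace the FSharp.Compiler.Service package reference
       with a locally built assembly from a dotnet/fsharp checkout. -->
  <Reference Include="FSharp.Compiler.Service">
    <HintPath>$(FSharpRepoRoot)\artifacts\bin\FSharp.Compiler.Service\Release\netstandard2.0\FSharp.Compiler.Service.dll</HintPath>
  </Reference>
</ItemGroup>
```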
@0101, given that your PR in dotnet/performance has been merged for 2 weeks now, can we see some results, graphs? Does it run regularly yet?
@kerams it should be running now, and data being collected. Still have to figure out how to get to them.
Ping
Ping
The latest results are from November last year; it seems it doesn't run anything, and no alerts were produced. We're mostly using local machines now to run before/after comparisons.
No data for other (non-F#) types of runs either. Not sure if it was migrated somewhere, @DrewScoggins?