format icon indicating copy to clipboard operation
format copied to clipboard

dotnet-format is very slow on larger repos

Open ghost opened this issue 2 years ago β€’ 21 comments

on our repository (https://github.com/xSke/PluralKit), dotnet-format will take between 4 and 8 seconds to run, even when no changes need to be applied.

I saw #757 but I'm not sure if it's related, thus the new issue.

here's a -v diag logging of a particularly bad run:

$ dotnet-format -v diag
  The dotnet format version is '5.1.225507+756d5a1c121be3e57e924788af64aa5607dc24e1'.
  The dotnet runtime version is '5.0.9'.
  The dotnet CLI version is '5.0.400'.
  Using MSBuild.exe located in '/home/spiral/.dotnet/sdk/5.0.400/'.
  Formatting code files in workspace '/home/spiral/sources/pluralkit/PluralKit.sln'.
  Loading workspace.
  Project PluralKit.Bot is using configuration from '/home/spiral/sources/pluralkit/.editorconfig'.
  Project PluralKit.Bot is using configuration from '/home/spiral/sources/pluralkit/PluralKit.Bot/obj/Debug/net5.0/PluralKit.Bot.GeneratedMSBuildEditorConfig.editorconfig'.
  Project PluralKit.Bot is using configuration from '/home/spiral/.dotnet/sdk/5.0.400/Sdks/Microsoft.NET.Sdk/analyzers/build/config/AnalysisLevel_5_Default.editorconfig'.
  Project PluralKit.Core is using configuration from '/home/spiral/sources/pluralkit/.editorconfig'.
  Project PluralKit.Core is using configuration from '/home/spiral/sources/pluralkit/PluralKit.Core/obj/Debug/net5.0/PluralKit.Core.GeneratedMSBuildEditorConfig.editorconfig'.
  Project PluralKit.Core is using configuration from '/home/spiral/.dotnet/sdk/5.0.400/Sdks/Microsoft.NET.Sdk/analyzers/build/config/AnalysisLevel_5_Default.editorconfig'.
  Project PluralKit.API is using configuration from '/home/spiral/sources/pluralkit/.editorconfig'.
  Project PluralKit.API is using configuration from '/home/spiral/sources/pluralkit/PluralKit.API/obj/Debug/net5.0/PluralKit.API.GeneratedMSBuildEditorConfig.editorconfig'.
  Project PluralKit.API is using configuration from '/home/spiral/.dotnet/sdk/5.0.400/Sdks/Microsoft.NET.Sdk/analyzers/build/config/AnalysisLevel_5_Default.editorconfig'.
  Project PluralKit.Tests is using configuration from '/home/spiral/sources/pluralkit/.editorconfig'.
  Project PluralKit.Tests is using configuration from '/home/spiral/sources/pluralkit/PluralKit.Tests/obj/Debug/net5.0/PluralKit.Tests.GeneratedMSBuildEditorConfig.editorconfig'.
  Project PluralKit.Tests is using configuration from '/home/spiral/.dotnet/sdk/5.0.400/Sdks/Microsoft.NET.Sdk/analyzers/build/config/AnalysisLevel_5_Default.editorconfig'.
  Project Myriad is using configuration from '/home/spiral/sources/pluralkit/.editorconfig'.
  Project Myriad is using configuration from '/home/spiral/sources/pluralkit/Myriad/obj/Debug/net5.0/Myriad.GeneratedMSBuildEditorConfig.editorconfig'.
  Project Myriad is using configuration from '/home/spiral/.dotnet/sdk/5.0.400/Sdks/Microsoft.NET.Sdk/analyzers/build/config/AnalysisLevel_5_Default.editorconfig'.
  Complete in 4845ms.
  Determining formattable files.
  Complete in 1144ms.
  Running formatters.
  Complete in 1146ms.
  Formatted 0 of 309 files.
  Format complete in 7138ms.

ghost avatar Oct 12 '21 07:10 ghost

I have been investigating dotnet format slowness in our large solution. Here is what I've found out so far:

  • dotnet format whitespace --folder is much faster assuming you are using .Net 6 (with the new version of dotnet format) and are only interested in whitespace formatting. If you need style and analyzers, things get much slower due to compilation.
  • Running dotnet format under a specific project instead of the whole solution helps. For example, dotnet format Solution.sln --include SomeProject/SomeFile.cs is much slower than dotnet format SomeProject/SomeProject.csproj --include SomeProject/SomeFile.cs for some reason. Of course this workaround is only useful if you are doing a git diff to format only affected files in a branch. I wrote a simple bash script to group modified files by project and run dotnet format concurrently for each affected project with --include <path> option. It helped reduce the total runtime from more than 2 mins to under 1 min in our solution assuming a couple of files are modified under a few different projects. We do need style and analyzers so can't use the fast whitespace command with the --folder option.

I suspect redundant compilation (due to referenced projects getting built/analyzed multiple times) is the root cause for the slowness but I am not sure yet. Needs more debugging

erdalsivri avatar Jan 04 '22 22:01 erdalsivri

CSharpSimplifyTypeNamesDiagnosticAnalyzer is responsible for about 20 seconds of runtime for one of our projects. This is what I got after running analyzers separately with extra logging:

  [01/05/2022 02:37:26] Running 7 analyzers on Kaggle.XXX
  [01/05/2022 02:37:26] Running analyzer MisplacedUsingDirectivesDiagnosticAnalyzer on Kaggle.XXX
  [01/05/2022 02:37:27] Finished running analyzer MisplacedUsingDirectivesDiagnosticAnalyzer on Kaggle.XXX in 914ms
  [01/05/2022 02:37:27] Running analyzer CSharpUnboundIdentifiersDiagnosticAnalyzer on Kaggle.XXX
  [01/05/2022 02:37:28] Finished running analyzer CSharpUnboundIdentifiersDiagnosticAnalyzer on Kaggle.XXX in 653ms
  [01/05/2022 02:37:28] Running analyzer CSharpUseExplicitTypeDiagnosticAnalyzer on Kaggle.XXX
  [01/05/2022 02:37:29] Finished running analyzer CSharpUseExplicitTypeDiagnosticAnalyzer on Kaggle.XXX in 990ms
  [01/05/2022 02:37:29] Running analyzer CSharpNamingStyleDiagnosticAnalyzer on Kaggle.XXX
  [01/05/2022 02:37:30] Finished running analyzer CSharpNamingStyleDiagnosticAnalyzer on Kaggle.XXX in 868ms
  [01/05/2022 02:37:30] Running analyzer CSharpAddBracesDiagnosticAnalyzer on Kaggle.XXX
  [01/05/2022 02:37:31] Finished running analyzer CSharpAddBracesDiagnosticAnalyzer on Kaggle.XXX in 764ms
  [01/05/2022 02:37:31] Running analyzer CSharpSimplifyTypeNamesDiagnosticAnalyzer on Kaggle.XXX
  [01/05/2022 02:37:50] Finished running analyzer CSharpSimplifyTypeNamesDiagnosticAnalyzer on Kaggle.XXX in 19688ms
  [01/05/2022 02:37:50] Running analyzer CSharpPreferFrameworkTypeDiagnosticAnalyzer on Kaggle.XXX
  [01/05/2022 02:37:51] Finished running analyzer CSharpPreferFrameworkTypeDiagnosticAnalyzer on Kaggle.XXX in 477ms
  [01/05/2022 02:37:51] Finished running 7 analyzers on Kaggle.XXX in 24407ms

Turns out there are known performance issues with this analyzer: https://github.com/dotnet/roslyn/issues/18950

The runtime seems to be proportional to number of files (or number of lines of code) in the project. We have several projects with over 300 files and this analyzer takes at least 5 seconds to run for each of these projects.

erdalsivri avatar Jan 05 '22 03:01 erdalsivri

Another thing I noticed is that NamingStyleDiagnosticAnalyzer always runs for all projects even if they are not included with --include. For example, dotnet format --include SomeProject/SomeFile.cs will run the NamingStyleDiagnosticAnalyzer in SomeOtherProject and just discard the result as no files are specified under SomeOtherProject.

Filtering out the analyzer based on formattablePaths before the following line fixes the issue: https://github.com/dotnet/format/blob/e473b1d1eb3d0a932501fad472219dca4d9c6576/src/Analyzers/AnalyzerFormatter.cs#L303

GetSeverityAsync does have a formattablePaths check but that's after the line above so NamingStyleDiagnosticAnalyzer is always included.

This fixed's reduced the runtime of dotnet format --include SomeProject/SomeFile.cs by 50% in our solution (from more than 2 minutes to 1 minute). I will prepare a PR

erdalsivri avatar Jan 05 '22 03:01 erdalsivri

Here is the PR to fix the issue with redundant runs of NamingStyleDiagnosticAnalyzer: https://github.com/dotnet/format/pull/1485

erdalsivri avatar Jan 05 '22 18:01 erdalsivri

  • I wrote a simple bash script to group modified files by project and run dotnet format concurrently for each affected project with --include <path> option.

Hey @erdalsivri , could you perhaps share this script? Thanks!

sander1095 avatar May 18 '22 10:05 sander1095

I think it's a good idea to link this to #1479 because both need to be fix to make this tool actually usable in pre commit hooks

sander1095 avatar May 19 '22 06:05 sander1095

Just wanted to share some comments from #1479

Two things go into why Style and Analyzer operations are slower.

First, they require that we have correct semantic information about your code. To do this we rely on MSBuild performing a design-time build where we can know the exact set of code files as well as what references to include with each project. This is equivalent to what Visual Studio or OmniSharp does when opening a project/solution. It is less costly than a full compile, but it can take anywhere from seconds to minutes depending on the size of your solution.

Second, we have to actually run the analyzers and apply code fixes. Running analyzers requires additional in-memory compilations. The results of one analysis become invalid as we apply code fixes from earlier analysis.

We have discussed alternate ways to speed up populating our workspace. Ultimately MSBuild will be involved at some point because .NET builds are so highly customizable the only way not to be wrong is to have MSBuild do the work.

  1. We could use a cache of MSBuild .binlog (or other format) containing the code files and dependency information we need. At which point this becomes a staleness problem. If dependencies aren't being added/removed, then this cache may remain a viable source of information for a long period.
  2. We could develop a Project Server of sorts. This service would have an initial slow startup that you experience today but, while it remains running, future invocations wouldn't have to pay the MSBuild cost.

We are not committing to either approach today, but know that this a problem that we are thinking on.

JoeRobich avatar May 19 '22 22:05 JoeRobich

@JoeRobich could you please share what is the current state of this effort?

I just ran dotnet format on a unit tests project with 10 files which is a part of a large repository.

Loading workspace took ridiculous amount of time:

  Complete in 1045066ms.
  Determining formattable files.

What's puzzling is that it seems to figure out via MSBuild the whole dependency chain of projects to figure out their editorconfig settings and then simply discards them as they are not in scope.

  • What is the actual information that formatter needs from the MSBuild?
  • Can we have a lightweight mode where it just walks up the directories searching for .editorconfig files or just using user-specified .editorconfig without loading the whole project in memory?
  • If .NET team is serious about developers adopting dotnet-format as a default out-of-the-box formatter - this needs to be prioritized, funded and fixed.

Happy to share my binlog if it helps.

shtratos avatar Apr 26 '23 09:04 shtratos

Not to mention that dotnet format process consumes 25GB of RAM while running.

shtratos avatar Apr 26 '23 09:04 shtratos

I don't think any teams from Microsoft are using this, because if they were they would find it unacceptable.

niemyjski avatar Feb 02 '24 21:02 niemyjski

I can confirm for my relatively small project dotnet format whitespace runs 2x as fast as dotnet format. Sadly, it still runs in 4 seconds even with only whitespace, which is still a noticeable slow down in development flow compared to something like Ruff format from the Python ecosystem which runs in imperceivable time.

I had the following set up a File Watcher in Rider:

dotnet format whitespace $SolutionPath$ --include $FileRelativePath$

I was hoping that having it only format the single file I was working on would speed things up, but as mentioned above it seems like a full project build/analysis of some kind is involved.

johnthagen avatar Mar 27 '24 11:03 johnthagen

Hi, We are facing the same issue on running dotnet format on a specific project containing dependencies which are in the workspace. The command seems to "load" everything (all the dependency tree) and take hours to finish 😴 It outputs Duplicate source file xxx in project yyy for all dependencies then Project xxx is using configuration from dir for same

Is there a reason for the command to "load" the dependencies even with --no-restore and launched inside a project folder?

As a workaround, we tried :

  • --include only files as describe in the readme ❌
  • specify the csproj file as param of the command ❌
  • --folder parameter ❌

But in the end I think we will just create a fake ".csproj" file without dependencies to force formatting only in the directory where the command is run 🀷🏻it seems to work and only lint files in the directories under.

ArnaudBDev avatar Apr 04 '24 14:04 ArnaudBDev