
0.13: [Performance]: Git scan takes a long time on Windows

Open TimBeyer opened this issue 1 year ago • 0 comments

Garden Bonsai (0.13) Performance issue

Problem

On 0.13.23 on Windows, in a project containing over 50k files, the initial git scan is slow and takes around a minute (54 seconds in the log below). The same process on Linux is significantly faster. We should nevertheless also profile it on Linux, to have a good reference and to ensure both systems are well optimized.

One would expect the scan to finish within several seconds, not take up to a minute. The problem is compounded by the scan running twice: once for the project root and again for the module root, which in this case is the same directory.

Excerpt of execution logs
15:00:57 [verbose] garden version: 0.13.23
15:00:57 Validate 
15:00:57 
15:00:57 i garden                    → Initializing...
15:00:57 Project is configured with `apiVersion: garden.io/v0`, running with backwards compatibility.
15:00:57 Multi-valued project configuration field `dotIgnoreFiles` is deprecated in 0.13 and will be removed in 0.14. Please use single-valued `dotIgnoreFile` instead.
15:00:57 i garden-cloud              → [debug] Initializing Garden Cloud API client.
15:00:57 i garden-cloud              → [debug] Authorizing...
15:00:57 i garden-cloud              → [debug] Starting refresh interval.
15:00:57 [debug] Will run refresh function every 4500 ms.
15:00:58 i garden-cloud              → Connecting project...
15:00:58 √ garden-cloud              → Ready
15:00:58 i git                       → [debug] Scanning project root at C:\git\project
  → Includes: (none)
  → Excludes: .garden\**\*,.git,.gitmodules,.garden\**\*,debug-info*\**,**/.garden/**/*
15:01:52 i git                       → [debug] Found 55353 files in project root C:\git\project
15:01:53 [debug] Found 55353 files in module path, filtering by 1 include and 6 exclude globs
15:01:53 [debug] Found 34 files in module path after glob matching
15:01:53 [debug] Scanned and found 0 actions, 2 workflows and 71 modules
15:01:54 i providers                 → Resolving providers...
15:01:54 i exec                      → Configuring provider...
15:01:54 i templated                 → [debug] Configuring provider...
15:01:54 i container                 → [debug] Configuring provider...
15:01:54 √ exec                      → Provider configured
15:01:54 √ templated                 → [debug] Provider configured
15:01:54 √ container                 → [debug] Provider configured
15:01:54 √ exec                      → Provider status cached
15:01:54 i providers                 → [verbose] resolve provider exec is ready.
15:01:54 √ templated                 → [debug] Provider status cached
15:01:54 i providers                 → [verbose] resolve provider templated is ready.
15:01:54 √ container                 → [debug] Provider status cached
15:01:54 i providers                 → [verbose] resolve provider container is ready.
15:01:54 i kubernetes                → Configuring provider...
15:01:54 √ kubernetes                → Provider configured
15:01:54 √ kubernetes                → Provider status cached
15:01:54 i providers                 → [verbose] resolve provider kubernetes is ready.
15:01:54 √ providers                 → Finished resolving providers (took 0.2 sec)
15:01:54 i providers                 → All provider statuses were cached. Run with --force-refresh to force a refresh of provider statuses.
15:01:54 i graph                     → Resolving actions and modules...
15:01:55 i git                       → [debug] Scanning module example-module root at C:\git\project
  → Includes: (none)
  → Excludes: **/.garden/**/*
15:02:51 i git                       → [debug] Found 55354 files in module example-module root C:\git\project
15:02:51 i graph                     → [debug] Found 4053 files in module path, filtering by 2 include and 1 exclude globs
15:02:51 i graph                     → [debug] Found 2908 files in module path after glob matching
15:02:51 i graph                     → [debug] Found 4053 files in module path, filtering by 2 include and 1 exclude globs
15:02:51 i graph                     → [debug] Found 2907 files in module path after glob matching
...

As can be seen in the logs, the overall execution is dominated by two git scans taking almost a minute each (54 and 56 seconds), while the subsequent glob filtering completes within about a second.
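To make the timings explicit, here is a small sketch that derives the scan durations from the timestamps in the excerpt above. The log only has one-second resolution, so the numbers are approximate, and the `SCANS` table and `elapsed_seconds` helper are illustrative names, not part of Garden.

```python
from datetime import datetime

# Timestamps copied by hand from the log excerpt: (scan start, "Found ... files").
SCANS = {
    "project root": ("15:00:58", "15:01:52"),
    "example-module": ("15:01:55", "15:02:51"),
}


def elapsed_seconds(start: str, end: str) -> int:
    """Seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


durations = {name: elapsed_seconds(*span) for name, span in SCANS.items()}

# First and last timestamps shown in the excerpt.
total = elapsed_seconds("15:00:57", "15:02:51")
scan_share = sum(durations.values()) / total

if __name__ == "__main__":
    for name, seconds in durations.items():
        print(f"{name}: {seconds}s")
    print(f"scans account for {scan_share:.0%} of the {total}s shown")
```

By this rough measure, the two scans account for nearly all of the wall-clock time in the excerpt.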

Suggested solution(s)

We've seen in the past that Windows incurs significant performance penalties when creating many child processes. We should check whether the scanning process spawns many child processes. Alternatively, we should see whether any other I/O-related behaviour may be causing problems on Windows machines.
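One way to test the child-process hypothesis independently of Garden is a generic micro-benchmark that measures the average cost of spawning a trivial process on each OS. The sketch below is an assumption about how one might do that, not Garden's scanning code. It spawns the Python interpreter rather than git, so the figure includes interpreter startup and is only useful for comparing Windows against Linux on similar hardware. `time_spawns` is a hypothetical helper name.

```python
import subprocess
import sys
import time


def time_spawns(count: int) -> float:
    """Return the average wall-clock seconds needed to spawn one trivial child process."""
    start = time.perf_counter()
    for _ in range(count):
        # The child does nothing, so the measurement is dominated by
        # process creation and interpreter startup.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (time.perf_counter() - start) / count


if __name__ == "__main__":
    average = time_spawns(20)
    print(f"average spawn cost: {average * 1000:.1f} ms per process")
```

If the per-process cost on Windows is many times the Linux figure, a scan that spawns one process per directory or submodule would be a plausible culprit; if the figures are similar, file-system I/O is the more likely place to look.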

Your environment

  • OS: Windows
  • Version: 0.13.23

TimBeyer avatar Jan 04 '24 10:01 TimBeyer