cargo-watch

Slow startup in monorepo

Open culli opened this issue 2 years ago • 6 comments

I am trying out cargo watch in a monorepo that contains Node, Rust, and other projects, and cargo watch -x test takes a long time (~50 seconds) to start. With --debug turned on, it looks like it is watching the whole repo (node_modules, etc.). I've tried --skip-local-deps and -w . but it still seems to be watching far too widely. Are there any other options I can try?

I'm running it directly on an M1 Mac, with no Docker or anything. It happens in both iTerm2 and the JetBrains (GoLand) embedded terminal.

culli · Aug 15 '23 22:08

I'm actually working on this this week/month (time permitting)! The immediate issue looks like a regression in ignores, but there are other, deeper issues that I'm working to eliminate.

As a workaround for now, you can try -i '**/node_modules/**'. If that doesn't work, either watch only the src directory (or whatever's useful) or, as a last resort, downgrade to 8.1.2.

passcod · Aug 16 '23 03:08

Sorry to say that didn't seem to help. I also added some other big directories like **/dist/**. Looking more at the debug output, it's still loading a lot of .gitignore files from down in node_modules, for what that's worth.

What does help for now is --no-vcs-ignores; with that, it starts right up. I might have to tweak the ignores a bit, but it's working!

Also possibly relevant: the main Cargo.toml has a [workspace] with several members.

Some output (before using --no-vcs-ignores):

cargo_watch::options: 2023-08-16T09:07:21.538-06:00 - DEBUG - All ignores: ["*/.DS_Store", "*.sw?", "*.sw?x", "#*#", ".#*", ".*.kate-swp", "*/.hg/**", "*/.git/**", "*/.svn/**", "*.db", "*.db-*", "*/*.db-journal/**", "*/target/**", "**/node_modules/**", "**/dist/**"]
...
watchexec::gitignore: 2023-08-16T09:07:21.707-06:00 - DEBUG - Looking in "/Users/jimcullison/projects/m/u/s/statsd" for a .git directory
watchexec::gitignore: 2023-08-16T09:07:21.707-06:00 - DEBUG - Looking in "/Users/jimcullison/projects/m/u/s" for a .git directory
watchexec::gitignore: 2023-08-16T09:07:21.707-06:00 - DEBUG - Looking in "/Users/jimcullison/projects/m/u" for a .git directory
watchexec::gitignore: 2023-08-16T09:07:21.707-06:00 - DEBUG - Looking in "/Users/jimcullison/projects/m" for a .git directory
watchexec::gitignore: 2023-08-16T09:07:21.707-06:00 - DEBUG - Found the top level git directory: "/Users/jimcullison/projects/m"
watchexec::gitignore: 2023-08-16T09:07:22.049-06:00 - DEBUG - Loaded "/Users/jimcullison/projects/m/x/node_modules/nopt/.gitignore"

Let me know if anything else in the output might be interesting.

culli · Aug 16 '23 15:08
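
For readers following the "Looking in ... for a .git directory" lines in the debug output above: the log suggests a walk from the starting directory up through each parent until a .git entry is found. The following is a minimal sketch of that idea using only the Rust standard library. The function name find_git_root is hypothetical, and this is not watchexec's actual code.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not watchexec's API): walk from `start` up through
/// its ancestors and return the first directory that contains a `.git` entry.
fn find_git_root(start: &Path) -> Option<PathBuf> {
    start
        .ancestors() // yields `start`, then its parent, and so on up to the root
        .find(|dir| dir.join(".git").exists())
        .map(Path::to_path_buf)
}
```

Called on a crate directory several levels deep, this returns the nearest enclosing directory that has a .git entry, or None if there is none, which matches the shape of the log sequence ending in "Found the top level git directory".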

Ahhh yep, different regression, also on the todo list but a bit further down. Glad you've got a workaround tho!

passcod · Aug 16 '23 21:08

I'm running into the same issue. With --debug, it looks like cargo watch re-parses the .gitignore files once for each crate in the workspace. Since we have over 30 crates in our workspace, cargo watch startup takes multiple minutes.

A sample of the logs:

watchexec::gitignore: 2024-04-04T22:25:29.798-07:00 - DEBUG - Looking in "/Users/fang/lexe/dev/lexe/public/run-sgx" for a .git directory
watchexec::gitignore: 2024-04-04T22:25:29.798-07:00 - DEBUG - Looking in "/Users/fang/lexe/dev/lexe/public" for a .git directory
watchexec::gitignore: 2024-04-04T22:25:29.798-07:00 - DEBUG - Looking in "/Users/fang/lexe/dev/lexe" for a .git directory
watchexec::gitignore: 2024-04-04T22:25:29.798-07:00 - DEBUG - Found the top level git directory: "/Users/fang/lexe/dev/lexe"
globset: 2024-04-04T22:25:31.492-07:00 - DEBUG - glob converted to regex: Glob { glob: "**/Flutter/ephemeral/**", re: "(?-u)^(?:/?|.*/)Flutter/ephemeral(?:/?|/.*)$", opts: GlobOptions { case_insensitive: false, literal_separator: true, backslash_escape: true }, tokens: Tokens([RecursivePrefix, Literal('F'), Literal('l'), Literal('u'), Literal('t'), Literal('t'), Literal('e'), Literal('r'), Literal('/'), Literal('e'), Literal('p'), Literal('h'), Literal('e'), Literal('m'), Literal('e'), Literal('r'), Literal('a'), Literal('l'), RecursiveSuffix]) }
globset: 2024-04-04T22:25:31.492-07:00 - DEBUG - glob converted to regex: Glob { glob: "**/Pods/**", re: "(?-u)^(?:/?|.*/)Pods(?:/?|/.*)$", opts: GlobOptions { case_insensitive: false, literal_separator: true, backslash_escape: true }, tokens: Tokens([RecursivePrefix, Literal('P'), Literal('o'), Literal('d'), Literal('s'), RecursiveSuffix]) }
globset: 2024-04-04T22:25:31.492-07:00 - DEBUG - glob converted to regex: Glob { glob: "**/dgph/**", re: "(?-u)^(?:/?|.*/)dgph(?:/?|/.*)$", opts: GlobOptions { case_insensitive: false, literal_separator: true, backslash_escape: true }, tokens: Tokens([RecursivePrefix, Literal('d'), Literal('g'), Literal('p'), Literal('h'), RecursiveSuffix]) }
globset: 2024-04-04T22:25:31.492-07:00 - DEBUG - glob converted to regex: Glob { glob: "**/xcuserdata/**", re: "(?-u)^(?:/?|.*/)xcuserdata(?:/?|/.*)$", opts: GlobOptions { case_insensitive: false, literal_separator: true, backslash_escape: true }, tokens: Tokens([RecursivePrefix, Literal('x'), Literal('c'), Literal('u'), Literal('s'), Literal('e'), Literal('r'), Literal('d'), Literal('a'), Literal('t'), Literal('a'), RecursiveSuffix]) }

...

# Eventually gets to another crate, then repeats the whole glob -> regex process again:

globset: 2024-04-04T22:25:31.561-07:00 - DEBUG - built glob set; 0 literals, 0 basenames, 0 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 14 regexes
watchexec::gitignore: 2024-04-04T22:25:31.561-07:00 - DEBUG - Loaded "/Users/fang/lexe/dev/lexe/.gitignore"
watchexec::gitignore: 2024-04-04T22:25:31.576-07:00 - DEBUG - Looking in "/Users/fang/lexe/dev/lexe/repotools" for a .git directory
watchexec::gitignore: 2024-04-04T22:25:31.576-07:00 - DEBUG - Looking in "/Users/fang/lexe/dev/lexe" for a .git directory
watchexec::gitignore: 2024-04-04T22:25:31.576-07:00 - DEBUG - Found the top level git directory: "/Users/fang/lexe/dev/lexe"
globset: 2024-04-04T22:21:51.834-07:00 - DEBUG - glob converted to regex: Glob { glob: "**/Flutter/ephemeral/**", re: "(?-u)^(?:/?|.*/)Flutter/ephemeral(?:/?|/.*)$", opts: GlobOptions { case_insensitive: false, literal_separator: true, backslash_escape: true }, tokens: Tokens([RecursivePrefix, Literal('F'), Literal('l'), Literal('u'), Literal('t'), Literal('t'), Literal('e'), Literal('r'), Literal('/'), Literal('e'), Literal('p'), Literal('h'), Literal('e'), Literal('m'), Literal('e'), Literal('r'), Literal('a'), Literal('l'), RecursiveSuffix]) }
globset: 2024-04-04T22:21:51.834-07:00 - DEBUG - glob converted to regex: Glob { glob: "**/Pods/**", re: "(?-u)^(?:/?|.*/)Pods(?:/?|/.*)$", opts: GlobOptions { case_insensitive: false, literal_separator: true, backslash_escape: true }, tokens: Tokens([RecursivePrefix, Literal('P'), Literal('o'), Literal('d'), Literal('s'), RecursiveSuffix]) }
globset: 2024-04-04T22:21:51.834-07:00 - DEBUG - glob converted to regex: Glob { glob: "**/dgph/**", re: "(?-u)^(?:/?|.*/)dgph(?:/?|/.*)$", opts: GlobOptions { case_insensitive: false, literal_separator: true, backslash_escape: true }, tokens: Tokens([RecursivePrefix, Literal('d'), Literal('g'), Literal('p'), Literal('h'), RecursiveSuffix]) }

...

The --no-vcs-ignores workaround isn't too practical for us as we have about 35 lines in our .gitignore.

Any progress on this is appreciated. 🙏

MaxFangX · Apr 05 '24 05:04
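
The logs above show the same top-level git directory being found again for each crate, followed by the same glob-to-regex conversions. One way to avoid that kind of repeated work, in general, is to memoize the compiled ignore set per git root. The sketch below illustrates the idea with the standard library only. The types CompiledIgnores and IgnoreCache and the compile_count field are hypothetical names for illustration; this is not how watchexec or cargo-watch is implemented.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Stand-in for a compiled ignore set (hypothetical; a real one would hold
/// compiled glob matchers rather than raw pattern strings).
#[derive(Debug, Clone, PartialEq)]
struct CompiledIgnores {
    patterns: Vec<String>,
}

/// Caches one compiled ignore set per git root directory.
struct IgnoreCache {
    by_root: HashMap<PathBuf, CompiledIgnores>,
    /// Counts how many times the expensive load step actually ran.
    compile_count: usize,
}

impl IgnoreCache {
    fn new() -> Self {
        IgnoreCache { by_root: HashMap::new(), compile_count: 0 }
    }

    /// Returns the cached set for `git_root`, running `load` only on first use.
    fn get_or_compile(
        &mut self,
        git_root: &Path,
        load: impl FnOnce() -> Vec<String>,
    ) -> &CompiledIgnores {
        if !self.by_root.contains_key(git_root) {
            self.compile_count += 1;
            let patterns = load();
            self.by_root
                .insert(git_root.to_path_buf(), CompiledIgnores { patterns });
        }
        &self.by_root[git_root]
    }
}
```

With a cache like this, 30 crates that share one git root would trigger a single load of the root .gitignore instead of 30.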

Another thing I noticed while trying to debug this myself (before finding the workaround here): cargo-watch seems to be opening files in ignored directories even though they can never trigger an execution.

 sudo fs_usage | rg cargo-watch | head -n 5000

Prints tons and tons of:

10:08:45  open              .../node_modules/...some file...    0.000020   cargo-watch
10:08:45  fstatfs64                                                                                          0.000001   cargo-watch
10:08:45  getdirentries64                                                                                    0.000011   cargo-watch
10:08:45  open              .../.git/...some file...    0.000020   cargo-watch
10:08:45  fstatfs64                                                                                          0.000001   cargo-watch
10:08:45  getdirentries64                                                                                    0.000011   cargo-watch
10:08:45  open              .../target/...some file...    0.000020   cargo-watch
10:08:45  fstatfs64                                                                                          0.000001   cargo-watch
10:08:45  getdirentries64                                                                                    0.000011   cargo-watch

This happens despite all of the above folders being ignored by multiple criteria.

ag-mathieulj · May 30 '24 14:05

Yeah, the core issue here, which is also still present in watchexec, is that the notify library does the recursion for the actual watching, and it's not aware of ignores/filtering, so it goes way beyond where it should look.

Watchexec has a pending work item (I've been on a break/holiday and will start on it in 2024Q3) to do the recursion in the watchexec library itself, this time with awareness of ignores. That should solve the remaining startup performance issues, at which point I'll bring cargo-watch over too.

passcod · May 31 '24 03:05
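
To illustrate the difference between ignore-unaware and ignore-aware recursion described above, here is a minimal sketch using only the Rust standard library. It collects the directories that would need watching and prunes ignored directories before descending into them, so nothing under node_modules, target, or .git is ever opened. The names dirs_to_watch and is_ignored are hypothetical, and matching is by exact directory name rather than by glob. It is a sketch of the general idea, not the planned watchexec implementation.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: true if a directory name is in the ignore list.
/// A real tool would match globs such as "**/node_modules/**" instead.
fn is_ignored(name: &str, ignored: &[&str]) -> bool {
    ignored.iter().any(|i| *i == name)
}

/// Collect `root` and every non-ignored directory beneath it.
/// Ignored directories are skipped before descending, so their contents
/// are never read.
fn dirs_to_watch(root: &Path, ignored: &[&str]) -> io::Result<Vec<PathBuf>> {
    let mut out = vec![root.to_path_buf()];
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            // `file_type` does not follow symlinks, so symlinked
            // directories are not traversed.
            if !entry.file_type()?.is_dir() {
                continue;
            }
            let name = entry.file_name();
            if is_ignored(&name.to_string_lossy(), ignored) {
                continue; // prune: do not descend into this subtree
            }
            let path = entry.path();
            out.push(path.clone());
            stack.push(path);
        }
    }
    Ok(out)
}
```

The cost of this walk scales with the number of non-ignored directories, whereas a recursion that filters only after the fact still has to visit every file under the ignored trees.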