The extra work of creating a checkpoint takes too much time on large repositories

Open CyanSalt opened this issue 6 months ago • 2 comments

What happened?

I found that Cline took more than 20 seconds to create each checkpoint in a large private repository. After investigating, I found that at least half of that time was spent in the directory traversal performed by renameNestedGitRepos: https://github.com/cline/cline/blob/656e3276c6131cf4f2907b46ff758687e751ba44/src/integrations/checkpoints/CheckpointGitOperations.ts#L143-L150

Surprisingly, this traversal includes files that are ignored by Git. For front-end repositories, node_modules adds significant overhead: in my project, removing node_modules reduces the traversal time by 64%.

I'm not sure whether this problem is already covered by https://github.com/cline/cline/issues/4388, but it doesn't seem to be a very complicated one.

Steps to reproduce

  1. Create a blank project
  2. Install a lot of dependencies using pnpm
  3. Start a conversation in Cline

Relevant API REQUEST output


Provider/Model

OpenAI Compatible

Operating System

macOS 15.5

System Info

Apple M3 Pro

Cline Version

3.18.0

Additional context

No response

CyanSalt avatar Jun 27 '25 07:06 CyanSalt

@canvrno

celestial-vault avatar Jun 27 '25 16:06 celestial-vault

Hey @CyanSalt, thank you for the insight into this! This is one of several known issues with checkpoints that can occur with large repositories. We're looking into a new checkpoints implementation that will completely eliminate the need for many of these expensive operations and optimize checkpoints to work with repositories of any size, but we haven't yet decided whether to optimize the current system or wait for its replacement. Since this change would be relatively minor, I'm leaning towards using the .gitignore to limit the renameNestedGitRepos traversal, but I'll discuss it with the team and see what they think.
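As a rough illustration of the idea of skipping ignored directories during the nested-repository scan, here is a minimal sketch. It is not Cline's actual code: `findNestedGitDirs` and its `ignoredDirs` parameter are hypothetical names, and a fixed set of directory names stands in for real .gitignore matching, which a production fix would presumably need.

```typescript
import * as fs from "fs";
import * as path from "path";

// Hypothetical helper (not Cline's implementation): collect nested ".git"
// directories under `root`, pruning any directory whose name is in `ignoredDirs`.
// Assumption: a fixed name set approximates what .gitignore would exclude.
function findNestedGitDirs(
  root: string,
  ignoredDirs: Set<string> = new Set(["node_modules"]),
): string[] {
  const found: string[] = [];
  const stack: string[] = [root];

  while (stack.length > 0) {
    const dir = stack.pop() as string;
    let entries: fs.Dirent[] = [];
    try {
      entries = fs.readdirSync(dir, { withFileTypes: true });
    } catch {
      continue; // unreadable directory: skip it
    }

    for (const entry of entries) {
      // Dirent.isDirectory() is false for symbolic links, so links are never followed.
      if (!entry.isDirectory()) continue;

      const full = path.join(dir, entry.name);
      if (entry.name === ".git") {
        // The root repository's own .git is not a nested repository.
        if (dir !== root) found.push(full);
        continue; // never descend into a .git directory
      }
      if (ignoredDirs.has(entry.name)) continue; // prune the whole subtree
      stack.push(full);
    }
  }
  return found.sort();
}
```

Because pruning happens before a directory is pushed onto the stack, nothing below node_modules is ever read, which is where the savings would come from.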

Out of curiosity, could you tell us approximately how many files are present in this repo (including node_modules)? I've been using a repo with roughly 500,000 files for large-repo testing. These issues obviously depend on hardware and environment, but the more data we can get on performance issues the better.

canvrno avatar Jun 27 '25 21:06 canvrno

@canvrno I have checked my repo: it contains 251,996 files, so it is no bigger than your test repo. It's worth noting, though, that because of the way pnpm works, it also contains 11,978 symbolic links pointing to other directories within the project. I'm not sure whether this has an impact on glob performance.
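For anyone who wants to gather comparable numbers, counts like these could be collected with a short script along the following lines. This is only a sketch: `countEntries` is a hypothetical helper, and it deliberately does not follow symbolic links, so each link is counted once rather than traversed.

```typescript
import * as fs from "fs";
import * as path from "path";

interface EntryCounts {
  files: number;
  directories: number;
  symlinks: number;
}

// Hypothetical helper: count regular files, directories, and symbolic links
// under `root` (the root itself is not counted). Symlinks are not followed.
function countEntries(root: string): EntryCounts {
  const counts: EntryCounts = { files: 0, directories: 0, symlinks: 0 };
  const stack: string[] = [root];

  while (stack.length > 0) {
    const dir = stack.pop() as string;
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      if (entry.isSymbolicLink()) {
        counts.symlinks++; // counted, but never traversed
      } else if (entry.isDirectory()) {
        counts.directories++;
        stack.push(path.join(dir, entry.name));
      } else if (entry.isFile()) {
        counts.files++;
      }
    }
  }
  return counts;
}
```

A traversal that does follow links could visit the same directory many times in a pnpm layout, so comparing these counts with what a glob actually visits may help show whether the links matter.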

(In fact, only 6,355 of the project's files are tracked by Git 😊)

CyanSalt avatar Jun 30 '25 12:06 CyanSalt