opencode icon indicating copy to clipboard operation
opencode copied to clipboard

fix: optimize Ripgrep.tree() for large repositories (109x faster)

Open Karavil opened this issue 2 weeks ago • 0 comments

Problem

Ripgrep.tree() was extremely slow for large repositories (8+ seconds for 42k files).

Root cause: The tree building algorithm used linear search to find child nodes:

// BEFORE: O(n) linear search through children array
let existing = current.children.find((x) => x.path.at(-1) === part)

For each file path segment, this searched through all existing children. With 42k files and average depth of 5, this becomes O(n × d × c) where:

  • n = number of files (42,680)
  • d = average path depth (~5)
  • c = average children per node
File 1:     find() in 0 children     → O(1)
File 100:   find() in ~50 children   → O(50) per segment
File 1000:  find() in ~200 children  → O(200) per segment
File 42000: find() in ~500 children  → O(500) per segment
           ↑
      This compounds across 42k files × 5 segments each

This quadratic-like behavior caused the massive slowdown.

Solution

Rewrote the tree algorithm with a FileNode class that:

  1. O(1) child lookups - Uses Map<string, FileNode> instead of linear search
  2. Lazy sorting - Only sorts nodes visited during BFS, not the entire tree
  3. Parent references - Enables bottom-up selection propagation for cleaner code
// AFTER: O(1) Map lookup
let node = this.lookup.get(name)

Benchmarks

┌───────────────────┬─────────┬──────────┬─────────┬──────────┬────────────┐
│ Repository        │ Files   │ Before   │ After   │ Speedup  │ Reduction  │
├───────────────────┼─────────┼──────────┼─────────┼──────────┼────────────┤
│ opencode          │  2,567  │   18ms   │   14ms  │    1.3x  │   -22%     │
│ goblins (ours)    │ 42,680  │ 8,625ms  │   79ms  │  109.2x  │   -99%     │
└───────────────────┴─────────┴──────────┴─────────┴──────────┴────────────┘

Output Verification

Tree output is byte-for-byte identical before and after the change. Verified by diff comparison on both repositories. The optimization only changes the internal data structure lookup mechanism, not the algorithm logic or output format.

Test plan

  • [x] Typecheck passes
  • [x] Added 8 unit tests for tree output
  • [x] Benchmarked on opencode repo (2.5k files)
  • [x] Benchmarked on goblins repo (42k files)
  • [x] Verified output is identical to original

🤖 Generated with Claude Code

Karavil avatar Dec 31 '25 05:12 Karavil