opencode icon indicating copy to clipboard operation
opencode copied to clipboard

Bash Fork Issues - Investigation Included

Open burgercrisis opened this issue 2 weeks ago • 1 comments

Description

OpenCode seems to have bash fork issues at the moment. I run into them a lot, only on opencode. I ran preliminary investigation via OpenCode subagents after experiencing the problem so that the agent had an immediate record of the failures to use; I also gave it access to the VSCode repo to investigate what the two apps are doing differently in regards to this. Below is an MD from OpenCode on its findings including several issues it identified as unique to OpenCode with solutions which should reflect translations of the VScode fixes for OpenCode. I have not had time to try implementing them myself yet however, but I believe this would be a solid start if not the actual solution.

Root Cause #1: Per-Command Process Spawning

Problem: OpenCode spawns a new shell process for every bash command execution.

Current Implementation:

// packages/opencode/src/tool/bash.ts:198-206
const proc = spawn(params.command, {
  shell,        // New shell EVERY command
  cwd,
  env: { ...process.env },
  stdio: ["ignore", "pipe", "pipe"],
  detached: process.platform !== "win32",
})

Impact: 100 commands = 100+ shell processes, each spawn involves fork/exec overhead.


Fixes for Root Cause #1

Fix 1A: Persistent Shell Pool (PTY-based)

Implementation Location: packages/opencode/src/shell/pool.ts (NEW)

Architecture:

┌─────────────────────────────────────────────────┐
│              OpenCode Session                    │
├─────────────────────────────────────────────────┤
│  ┌─────────────────────────────────────────────┐│
│  │            Shell Pool (1-3 PTYs)            ││
│  │  ┌─────────┐ ┌─────────┐ ┌─────────┐      ││
│  │  │   PTY   │ │   PTY   │ │   PTY   │      ││
│  │  │  Session│ │  Session│ │  Session│      ││
│  │  └────┬────┘ └────┬────┘ └────┬────┘      ││
│  │       │           │           │            ││
│  │       └───────────┴───────────┘            ││
│  │                   │                        ││
│  │           Shell Integration               ││
│  │           (Command Completion)            ││
│  └─────────────────────────────────────────────┘│
└─────────────────────────────────────────────────┘

Implementation:

export class ShellPool {
  private pool: Map<string, ShellSession> = new Map()
  private maxSize: number = 3
  private shell: string
  
  async execute(command: ExecuteOptions): Promise<CommandResult> {
    const session = await this.acquire()
    
    if (command.workdir && command.workdir !== session.cwd) {
      session.pty.write(`cd ${escapeShellArg(command.workdir)}\n`)
    }
    
    session.pty.write(command.command + '\n')
    const output = await this.waitForCompletion(session)
    this.release(session)
    return output
  }
  
  private async acquire(): Promise<ShellSession> {
    for (const session of this.pool.values()) {
      if (session.state === 'idle') {
        return session
      }
    }
    if (this.pool.size < this.maxSize) {
      return await this.createSession()
    }
    return new Promise((resolve) => {
      this.queue.push(resolve)
    })
  }
  
  private async createSession(): Promise<ShellSession> {
    const id = generateId()
    const pty = pty.spawn(this.shell, ShellIntegration.getArgs(this.shell), {
      name: 'xterm-256color',
      cols: 80,
      rows: 24,
      cwd: Instance.directory,
      env: { ...process.env, ...ShellIntegration.getEnv(this.shell) },
    })
    
    const session: ShellSession = {
      id,
      pty,
      cwd: Instance.directory,
      env: { ...process.env },
      lastUsed: Date.now(),
      state: 'idle',
    }
    
    this.setupCompletionHandler(session)
    this.pool.set(id, session)
    return session
  }
  
  private async waitForCompletion(session: ShellSession): Promise<CommandResult> {
    return new Promise((resolve, reject) => {
      const timeout = setTimeout(() => {
        session.pty.off('data', handler)
        reject(new Error('Command timeout'))
      }, 30000)
      
      const handler = (data: string) => {
        const events = ShellIntegration.parseSequences(data)
        for (const event of events) {
          if (event.type === 'command_finish') {
            session.pty.off('data', handler)
            clearTimeout(timeout)
            resolve({ output: data, exitCode: event.data.exitCode })
          }
        }
      }
      session.pty.on('data', handler)
    })
  }
  
  private release(session: ShellSession): void {
    session.state = 'idle'
    if (this.queue.length > 0) {
      const next = this.queue.shift()!
      next(session)
    }
  }
}

Expected Outcome: 97% reduction in process count (100+ → 1-3 per session)


Root Cause #2: No Shell Persistence

Problem: Unlike VSCode which maintains persistent PTY sessions, OpenCode creates isolated processes with no state sharing.

Comparison:

Aspect OpenCode VSCode
Shell Lifecycle Spawn per command Persistent PTY
State None persists CWD, env, history persist
Process Count 100+ per session 1-3 per terminal

Fixes for Root Cause #2

Fix 2A: Session State Manager

Implementation Location: packages/opencode/src/shell/state.ts (NEW)

Implementation:

export namespace StateManager {
  const state = Instance.state(
    () => new Map<string, SessionState>(),
    async (sessions) => {
      for (const session of sessions.values()) {
        await cleanup(session.id)
      }
      sessions.clear()
    }
  )
  
  export function getState(sessionID: string): SessionState {
    if (!state().has(sessionID)) {
      state().set(sessionID, {
        id: sessionID,
        cwd: Instance.directory,
        env: { ...process.env },
        history: [],
        permissions: new Set(),
        backgroundJobs: [],
        timestamp: Date.now(),
      })
    }
    return state().get(sessionID)!
  }
  
  export async function updateState(
    sessionID: string,
    updates: Partial<SessionState>
  ): Promise<void> {
    const current = getState(sessionID)
    const newState = { ...current, ...updates }
    
    if (updates.cwd && !validateDirectoryAccess(sessionID, updates.cwd)) {
      throw new Permission.RejectedError(
        sessionID, 'external_directory', '', { path: updates.cwd }
      )
    }
    
    if (updates.cwd) {
      newState.permissions.add(updates.cwd)
    }
    
    newState.timestamp = Date.now()
    state().set(sessionID, newState)
  }
  
  async function validateDirectoryAccess(
    sessionID: string,
    path: string
  ): Promise<boolean> {
    const agent = await Agent.get(sessionID)
    const state = getState(sessionID)
    
    if (agent.permission.external_directory === 'allow') return true
    if (agent.permission.external_directory === 'deny') {
      return Filesystem.contains(Instance.directory, path)
    }
    return state.permissions.has(path) || Filesystem.contains(Instance.directory, path)
  }
}

Fix 2B: Shell Integration (State Tracking)

Implementation Location: packages/opencode/src/shell/integration.ts (NEW)

Implementation:

export namespace ShellIntegration {
  export function getIntegrationScript(shell: string): string {
    const scripts = {
      bash: `
if [ -n "$OPENCODE_INTEGRATION" ]; then
  __opencode_command_start() { printf '\033]133;C;\007' }
  __opencode_command_finished() { printf '\033]133;F;%s\007' "$?" }
  __opencode_cwd_changed() { printf '\033]133;D;%s\007' "$PWD" }
  export PROMPT_COMMAND='__opencode_command_finished\007'
  trap '__opencode_command_start' DEBUG
  trap '__opencode_cwd_changed' DEBUG
fi`,
      zsh: `preexec() { __opencode_command_start }
precmd() { __opencode_command_finished }
chpwd() { __opencode_cwd_changed }`,
      powershell: `
function prompt {
  __opencode-Command-Finished $LASTEXITCODE
  "PS $PWD> "
}`,
    }
    return scripts[shell] || scripts.bash
  }
  
  export function parseSequences(data: string): ShellEvent[] {
    const events: ShellEvent[] = []
    const oscPattern = /\x1b]OC;(\d+);(.+?)\x07/g
    
    let match
    while ((match = oscPattern.exec(data)) !== null) {
      const [fullMatch, code, content] = match
      const parts = content.split(';')
      
      if (parseInt(code) === 133) {
        const type = parts[0]?.[0]
        if (type === 'F') {
          events.push({ type: 'command_finish', data: { exitCode: parseInt(parts[1] || '0') } })
        } else if (type === 'D') {
          events.push({ type: 'cwd_changed', data: { cwd: parts.slice(1).join(';') } })
        }
      }
      data = data.slice(match.index + fullMatch.length)
    }
    return events
  }
}

Expected Outcome: Natural shell experience with persistent CWD, environment, and history.


Root Cause #3: Windows Git Bash Overhead

Problem: On Windows, Git Bash (MSYS2) has significant fork overhead and known DLL initialization issues.

Evidence:

0 [main] bash 1323 dofork: child -1 - forked process 65644 died unexpectedly
/usr/bin/bash: fork: retry: Resource temporarily unavailable
0 [main] bash 1323 child_copy: dll data read copy failed, Win32 error 299

Known MSYS2 Issues:

  • Process forking limits on Windows
  • DLL initialization failures (0xC0000142)
  • Memory allocation failures (0xC0000005)

Fixes for Root Cause #3

Fix 3A: Native Shell Fallback for Windows

Implementation Location: packages/opencode/src/shell/shell.ts

Implementation:

export namespace Flag {
  export const OPENCODE_PREFER_NATIVE_SHELL = truthy("OPENCODE_PREFER_NATIVE_SHELL")
}

export const acceptable = lazy(() => {
  if (process.platform === "win32" && Flag.OPENCODE_PREFER_NATIVE_SHELL) {
    const nativeShells = ["powershell.exe", "cmd.exe"]
    for (const shell of nativeShells) {
      const path = Bun.which(shell)
      if (path) {
        log.info("Using native Windows shell", { shell })
        return path
      }
    }
  }
  const s = process.env.SHELL
  if (s && !BLACKLIST.has(path.win32.basename(s))) return s
  return fallback()
})

Fix 3B: Command Translation for PowerShell

Implementation:

export namespace ShellCommand {
  export function toPowerShell(command: string): string {
    return command
      .replace(/&&/g, ";")
      .replace(/\|\|/g, "; if ($LASTEXITCODE -ne 0) { }")
      .replace(/echo\s+(-n)?\s*["']?([^"'\n]+)["']?/gi, 'Write-Host "$2"')
      .replace(/\$(\w+)/g, '$env:$1')
  }
}

Expected Outcome: Avoid MSYS2 fork issues, 10-100x faster spawn times on Windows.


Root Cause #4: No Retry Logic for Transient Errors

Problem: When fork fails with EAGAIN (errno 11), the command immediately fails without retry.

Current Behavior:

proc.once("error", (error) => {
  exited = true
  cleanup()
  reject(error)  // No retry, immediate failure
})

Fixes for Root Cause #4

Fix 4A: Retry Logic with Exponential Backoff

Implementation Location: packages/opencode/src/tool/bash.ts

Implementation:

const MAX_RETRIES = 5
const INITIAL_DELAY = 100
const MAX_DELAY = 5000
const BACKOFF_MULTIPLIER = 2

const RETRYABLE_ERRORS = [
  "EAGAIN",      // errno 11
  "ENOMEM",
  "0xC0000142",  // Windows DLL init
  "0xC0000005",  // Windows access violation
]

function isRetryableError(error: Error): boolean {
  return RETRYABLE_ERRORS.some(code => 
    error.message?.includes(code) || 
    error.name?.includes(code)
  )
}

async function spawnWithRetry(command: string, options: SpawnOptions): Promise<ChildProcess> {
  let lastError: Error | null = null
  let delay = INITIAL_DELAY
  
  for (let attempt = 0; attempt < MAX_RETRIES; attempt++) {
    try {
      return spawn(command, options)
    } catch (error: any) {
      lastError = error
      if (isRetryableError(error)) {
        log.warn("Spawn failed, retrying", { attempt, error, delay })
        await Bun.sleep(delay)
        delay = Math.min(delay * BACKOFF_MULTIPLIER, MAX_DELAY)
      } else {
        throw error
      }
    }
  }
  throw lastError
}

Expected Outcome: Handles transient resource exhaustion, commands succeed instead of failing.


Root Cause #5: No Process Limits

Problem: OpenCode can spawn unlimited processes, potentially hitting OS limits.


Fixes for Root Cause #5

Fix 5A: Shell Semaphore with Queue

Implementation Location: packages/opencode/src/tool/bash.ts

Implementation:

const MAX_CONCURRENT_SHELLS = {
  win32: 10,
  darwin: 20,
  linux: 30,
  default: 15,
}[process.platform] || 15

const SHELL_QUEUE_MAX = 50
const SHELL_QUEUE_TIMEOUT = 30000

class ShellSemaphore {
  private permits: number
  private queue: Array<() => void> = []
  
  constructor(private maxPermits: number) {
    this.permits = maxPermits
  }
  
  async acquire(): Promise<void> {
    if (this.permits > 0) {
      this.permits--
      return
    }
    return new Promise((resolve) => {
      this.queue.push(resolve)
    })
  }
  
  release(): void {
    this.permits++
    if (this.queue.length > 0) {
      const next = this.queue.shift()!
      this.permits--
      next()
    }
  }
}

const shellSemaphore = new ShellSemaphore(MAX_CONCURRENT_SHELLS)

async function executeWithLimit(params: BashParams): Promise<BashResult> {
  const startTime = Date.now()
  
  if (this.queue.length >= SHELL_QUEUE_MAX) {
    throw new Error("Shell queue full")
  }
  
  await shellSemaphore.acquire()
  
  try {
    return await spawnWithRetry(params.command, params.options)
  } finally {
    shellSemaphore.release()
  }
}

Expected Outcome: Prevents hitting OS process limits, predictable resource usage.


Summary: Root Causes → Fixes Mapping

Root Cause Problem Fixes
#1: Per-command spawning 100+ processes per session 1A: Persistent Shell Pool
#2: No shell persistence No state between commands 2A: State Manager, 2B: Shell Integration
#3: Windows Git Bash overhead MSYS2 fork issues 3A: Native Shell Fallback
#4: No retry logic EAGAIN fails immediately 4A: Retry Logic
#5: No process limits Unlimited spawning 5A: Shell Semaphore

burgercrisis avatar Dec 30 '25 23:12 burgercrisis