Sessions hang indefinitely when Task tool spawns subagents via REST API (opencode serve)

Open leeweisern opened this issue 2 weeks ago • 0 comments

Description

I'm using OpenCode as a backend service via opencode serve to power a Telegram bot. The setup is:

  • OpenCode runs as a systemd service using opencode serve --port 4096 --hostname 0.0.0.0
  • A separate Node.js service (running on AWS ECS/EC2) uses the @opencode-ai/sdk to interact with OpenCode via REST API
  • The service polls session status and streams responses back to users

This works great for simple queries, but sessions hang indefinitely when the LLM uses the Task tool to spawn subagents (e.g., the explore agent).

Expected Behavior

When a prompt triggers the Task tool (e.g., "explore the codebase to find X"), the subagent should complete and return its results to the parent session, which then completes normally.

Actual Behavior

  1. Parent session spawns a subagent via Task tool
  2. Subagent starts processing (visible in logs: step=0, step=1, step=2...)
  3. Multiple NotFoundError rejections occur in acp-command service
  4. Both parent and subagent sessions get stuck in "busy" state forever
  5. No response is ever returned via the SDK

Important: This ONLY happens via REST API

The exact same prompt works perfectly when using the TUI (opencode CLI). The subagent completes normally and the parent session receives the result.


Reproduction Steps

  1. Start OpenCode in serve mode: opencode serve --port 4096 --hostname 0.0.0.0

  2. Use the SDK to send a prompt that triggers a subagent:

   import { createOpencodeClient } from "@opencode-ai/sdk/v2";
   const client = createOpencodeClient({
     baseUrl: "http://localhost:4096",
   });

   // Create or load a session
   const session = await client.session.create({ directory: "/path/to/project" });

   // Send a prompt that will trigger the explore subagent
   await client.session.prompt({
     sessionID: session.data.id,
     directory: "/path/to/project",
     parts: [{ type: "text", text: "How is authentication implemented in this codebase?" }],
     agent: "build",
     model: { providerID: "anthropic", modelID: "claude-sonnet-4-20250514" },
   });

   // Poll for completion - this will hang forever
   while (true) {
     const status = await client.session.status({});
     console.log(status.data.sessions[session.data.id]); // Always shows { type: "busy" }
     await new Promise(r => setTimeout(r, 1000));
   }

  3. Observe that the session is stuck in the busy state and never completes.
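
While the underlying bug is open, a client can at least avoid blocking forever by putting a deadline on the polling loop. The following is a minimal sketch, not part of @opencode-ai/sdk: waitForIdle and its parameters are hypothetical names, and it assumes that a finished session either disappears from the status map or reports a type other than "busy" (the report above only confirms the "busy" value).

```typescript
// Hypothetical helper, not an SDK function.
type SessionStatus = { type: string };

async function waitForIdle(
  fetchStatus: () => Promise<SessionStatus | undefined>,
  timeoutMs: number,
  intervalMs = 1000,
): Promise<"idle" | "timeout"> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const status = await fetchStatus();
    // Assumption: anything other than { type: "busy" } means the session is done.
    if (status === undefined || status.type !== "busy") return "idle";
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  // The deadline passed while the session was still busy.
  return "timeout";
}

// Usage with the client from the reproduction above:
// const outcome = await waitForIdle(
//   async () => (await client.session.status({})).data.sessions[session.data.id],
//   120_000,
// );
// if (outcome === "timeout") console.error("session still busy after 2 minutes");
```

This does not fix the hang; it only lets the calling service report a stuck session instead of waiting on it indefinitely.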

Log Analysis

Serve instance logs (via /proc//fd/14):

Session processing starts normally:

   INFO 2026-01-01T11:19:31 service=session.prompt step=0 sessionID=ses_parent123 loop
   INFO 2026-01-01T11:19:31 service=llm providerID=anthropic modelID=claude-opus-4-5 sessionID=ses_parent123 stream

Subagent is spawned via Task tool:

   INFO 2026-01-01T11:19:37 service=session.prompt step=0 sessionID=ses_subagent456 loop
   INFO 2026-01-01T11:19:37 service=llm providerID=anthropic modelID=claude-opus-4-5 sessionID=ses_subagent456 agent=explore stream
   INFO 2026-01-01T11:19:42 service=session.prompt step=1 sessionID=ses_subagent456 loop
   INFO 2026-01-01T11:19:46 service=session.prompt step=2 sessionID=ses_subagent456 loop

Then errors start appearing - one per second:

   ERROR 2026-01-01T11:19:46 service=acp-command promise={} reason=NotFoundError Unhandled rejection
   ERROR 2026-01-01T11:19:46 service=default e=NotFoundError rejection
   ERROR 2026-01-01T11:19:47 service=acp-command promise={} reason=NotFoundError Unhandled rejection
   ERROR 2026-01-01T11:19:47 service=default e=NotFoundError rejection
   ... (continues indefinitely)

After the errors, no more session.prompt activity - session is stuck:

   INFO 2026-01-01T11:30:54 service=server method=GET path=/global/health request
   INFO 2026-01-01T11:30:58 service=server method=GET path=/global/health request
   ... (only health checks, no session processing)

Session status shows multiple stuck sessions:

   {
     sessions: {
       ses_parent123: { type: busy },
       ses_subagent456: { type: busy },
       // ... 14 more stuck sessions from previous attempts
     }
   }
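
To spot this pattern in a longer log, the key=value lines can be summarized mechanically. The sketch below assumes every line has the shape "LEVEL TIMESTAMP key=value ... message" as in the excerpt above; parseLogLine and summarize are hypothetical helper names, and the sample lines are copied from that excerpt.

```typescript
interface LogEntry {
  level: string;
  timestamp: string;
  fields: Record<string, string>;
}

// Split one log line into level, timestamp, and its key=value fields.
function parseLogLine(line: string): LogEntry | undefined {
  const match = /^(INFO|WARN|ERROR|DEBUG)\s+(\S+)\s+(.*)$/.exec(line.trim());
  if (!match) return undefined;
  const [, level = "", timestamp = "", rest = ""] = match;
  const fields: Record<string, string> = {};
  for (const token of rest.split(/\s+/)) {
    const eq = token.indexOf("=");
    // Tokens without "=" are free-text message words and are skipped.
    if (eq > 0) fields[token.slice(0, eq)] = token.slice(eq + 1);
  }
  return { level, timestamp, fields };
}

// Track the last step logged per session and count acp-command NotFoundError rejections.
function summarize(lines: string[]) {
  const lastStep = new Map<string, number>();
  let notFoundErrors = 0;
  for (const line of lines) {
    const entry = parseLogLine(line);
    if (!entry) continue;
    const { fields } = entry;
    const sessionID = fields["sessionID"];
    const step = fields["step"];
    if (fields["service"] === "session.prompt" && sessionID !== undefined && step !== undefined) {
      lastStep.set(sessionID, Number(step));
    }
    if (
      entry.level === "ERROR" &&
      fields["service"] === "acp-command" &&
      fields["reason"] === "NotFoundError"
    ) {
      notFoundErrors++;
    }
  }
  return { lastStep, notFoundErrors };
}

const sampleLines = [
  "INFO 2026-01-01T11:19:31 service=session.prompt step=0 sessionID=ses_parent123 loop",
  "INFO 2026-01-01T11:19:37 service=session.prompt step=0 sessionID=ses_subagent456 loop",
  "INFO 2026-01-01T11:19:42 service=session.prompt step=1 sessionID=ses_subagent456 loop",
  "INFO 2026-01-01T11:19:46 service=session.prompt step=2 sessionID=ses_subagent456 loop",
  "ERROR 2026-01-01T11:19:46 service=acp-command promise={} reason=NotFoundError Unhandled rejection",
  "ERROR 2026-01-01T11:19:46 service=default e=NotFoundError rejection",
  "ERROR 2026-01-01T11:19:47 service=acp-command promise={} reason=NotFoundError Unhandled rejection",
];

const summary = summarize(sampleLines);
console.log(Object.fromEntries(summary.lastStep), summary.notFoundErrors);
```

A session whose last step stops advancing while the NotFoundError count keeps growing matches the stuck state described here.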


Root Cause Analysis

After tracing through the code, I believe the issue is in the setupEventSubscriptions function in packages/opencode/src/acp/agent.ts.

The Problem Flow:

  1. When a session is created via ACP, setupEventSubscriptions() is called (line 62)
  2. This subscribes to events using the SDK: this.config.sdk.event.subscribe({ directory }) (line 71)
  3. When a message.part.updated event comes in, it tries to fetch the message (lines 132-145):
const message = await this.config.sdk.session
  .message(
    {
      sessionID: part.sessionID,  // ← This could be a SUBAGENT session ID
      messageID: part.messageID,
      directory,                   // ← But this is the PARENT session's directory
    },
    { throwOnError: true },
  )
  .then((x) => x.data)
  .catch((err) => {
    log.error("unexpected error when fetching message", { error: err })
    return undefined
  })
  4. When the Task tool spawns a subagent, events for the subagent session also come through the same event subscription
  5. The message fetch fails with NotFoundError (possibly a timing issue where the message is not yet persisted, or a directory context mismatch)
  6. The .catch() block returns undefined, but this doesn't properly abort or clean up the session
  7. The session remains in "busy" state forever

Why TUI Works:

In the TUI, everything runs in the same process:
  • No network calls needed to fetch messages
  • State is directly accessible via Instance.state()
  • Subagent sessions are handled in the same async context
  • The SessionPrompt.loop() function can properly track and complete subagent sessions
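
If the NotFoundError in the problem flow above really is a timing issue (the message not yet being persisted when the event arrives), one possible mitigation is to retry the fetch a few times and then rethrow, so the caller can mark the session as failed instead of silently continuing with undefined. This is only a sketch under that assumption, not OpenCode's actual code or fix: fetchWithRetry, isRetryable, and the retry counts are hypothetical, and it would not help if the cause is the directory mismatch.

```typescript
// Hypothetical helper: retry a fetch a bounded number of times, then surface the error.
async function fetchWithRetry<T>(
  fetchOnce: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  attempts = 3,
  delayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fetchOnce();
    } catch (err) {
      lastError = err;
      // Stop on non-retryable errors or when the attempt budget is used up.
      if (!isRetryable(err) || attempt === attempts) break;
      // Linear backoff: wait a little longer before each new attempt.
      await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
    }
  }
  // Rethrow instead of returning undefined so the failure is visible to the caller.
  throw lastError;
}
```

The design point is the final throw: whatever handles the event gets a definite failure it can use to end the session, rather than a swallowed error that leaves the session "busy".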

Environment

  • OpenCode version: 1.0.220
  • OS: Ubuntu 22.04 (AWS EC2)
  • Model: anthropic/claude-opus-4-5 (but likely affects all models)

leeweisern, Jan 01 '26 11:01