Fix Windows worker startup failures from Bun zombie sockets

Open Copilot opened this issue 4 months ago • 0 comments

Bun leaves TCP sockets bound after termination on Windows (#12127), causing EADDRINUSE errors on worker restart. Previously required system reboot.

Changes

ProcessManager.ts

  • cleanupWindowsPort(): Uses netstat to find PIDs holding port, kills with taskkill /F
  • start(): 3 retry attempts with 2s/4s/6s linear backoff, calls cleanup between attempts
  • waitForHealth(): Scans recent log lines for EADDRINUSE, returns specific error with port number
  • Port and PID validated before shell execution (command injection prevention)
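
As a rough illustration of the validation and netstat parsing listed above, the sketch below shows one way the pieces could fit together. The names `isValidPort`, `isValidPid`, and `parseListeningPids` are hypothetical and not taken from ProcessManager.ts, and the parser assumes the usual Windows `netstat -ano` column layout (Proto, Local Address, Foreign Address, State, PID).

```typescript
// Hypothetical helpers; names and structure are illustrative, not the PR's actual code.

/** True only for integers in the valid TCP port range. */
function isValidPort(port: number): boolean {
  return Number.isInteger(port) && port >= 1 && port <= 65535;
}

/** True only for strings of decimal digits naming a positive PID. */
function isValidPid(pid: string): boolean {
  return /^\d+$/.test(pid) && Number(pid) > 0;
}

/**
 * Extracts the PIDs of processes listening on `port` from `netstat -ano` output.
 * Assumed column layout: Proto, Local Address, Foreign Address, State, PID.
 */
function parseListeningPids(netstatOutput: string, port: number): string[] {
  if (!isValidPort(port)) {
    throw new Error(`Invalid port: ${String(port)}`);
  }
  const pids = new Set<string>();
  for (const line of netstatOutput.split(/\r?\n/)) {
    const cols = line.trim().split(/\s+/);
    if (cols.length < 5 || cols[0] !== "TCP") continue;
    const localAddress = cols[1];
    const state = cols[3];
    const pid = cols[4];
    // Match the port exactly; a substring search for ":3777" would also match ":37777".
    if (!localAddress.endsWith(`:${port}`)) continue;
    if (state !== "LISTENING") continue;
    // Only digit-only PIDs are kept, so nothing else can reach a shell command.
    if (isValidPid(pid)) pids.add(pid);
  }
  return Array.from(pids);
}
```

Validating both values before they are interpolated into a `netstat` or `taskkill` command line is what keeps untrusted input out of the shell.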

worker-service.ts

  • Detects EADDRINUSE in catch handler, shows port-specific error message
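
A minimal sketch of what such a catch-handler check could look like follows. The helper names and the message wording are assumptions for illustration, not the actual worker-service.ts code; the only thing relied on is that Node-style system errors expose a `code` property such as `"EADDRINUSE"`.

```typescript
// Hypothetical helpers; not the actual worker-service.ts implementation.

/** Returns true when `err` looks like a "port already in use" failure. */
function isAddrInUse(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const { code, message } = err as { code?: unknown; message?: unknown };
  // Node-style system errors carry `code`; fall back to the message text.
  return (
    code === "EADDRINUSE" ||
    (typeof message === "string" && message.includes("EADDRINUSE"))
  );
}

/** Builds a user-facing message that names the port when it is the culprit. */
function describeStartupError(err: unknown, port: number): string {
  if (isAddrInUse(err)) {
    return `Worker failed to start: port ${port} is already in use (EADDRINUSE).`;
  }
  const detail = err instanceof Error ? err.message : String(err);
  return `Worker failed to start: ${detail}`;
}
```

In a `catch (err)` block around the server's listen call, `describeStartupError(err, 37777)` would then yield the port-specific text instead of a generic failure.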

Error Flow

Before:

Attempt 1: EADDRINUSE → Manual intervention required → System reboot

After (Windows only):

Attempt 1: EADDRINUSE
→ netstat -ano | findstr :37777
→ taskkill /F /PID <zombie_pid>
→ wait 2s
Attempt 2: SUCCESS

Non-Windows platforms unaffected. All errors non-fatal with graceful degradation.
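
The flow above can be sketched as a small retry loop. This is an illustration under stated assumptions, not the PR's `start()` implementation: the `StartDeps` callbacks are hypothetical and injected so the example does not spawn real processes, "3 retry attempts" is read as one initial attempt plus up to three retries (waiting 2s, 4s, then 6s), and non-Windows platforms are assumed to keep a single attempt.

```typescript
// Hypothetical sketch of the retry flow; all names here are illustrative.

type StartDeps = {
  tryStart: () => Promise<boolean>; // resolves true once the worker is healthy
  cleanupPort: () => Promise<void>; // e.g. the netstat + taskkill step
  sleep: (ms: number) => Promise<void>;
  isWindows: boolean;
};

/** Delay before retry N: 2s, 4s, 6s for N = 1, 2, 3. */
function retryDelayMs(retry: number): number {
  return retry * 2000;
}

async function startWithRetry(deps: StartDeps, maxRetries = 3): Promise<boolean> {
  if (await deps.tryStart()) return true;
  // Assumption: other platforms keep the previous single-attempt behavior.
  if (!deps.isWindows) return false;

  for (let retry = 1; retry <= maxRetries; retry++) {
    try {
      await deps.cleanupPort();
    } catch {
      // Cleanup failures are treated as non-fatal; the next attempt still runs.
    }
    await deps.sleep(retryDelayMs(retry));
    if (await deps.tryStart()) return true;
  }
  return false;
}
```

Injecting `sleep` and `cleanupPort` keeps the loop easy to exercise in tests without real timers or process kills.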

Original prompt

This section details the original issue you should resolve

<issue_title>windows11 64 system</issue_title> <issue_description># Bug Report: Worker Service Fails to Start on Port 37777

🐛 Bug Description

After installing the claude-mem plugin, the mem service starts successfully when running the claude command. However, when Claude Code is running, the mem service fails to start with repeated errors indicating that the worker cannot start on port 37777.

📋 Steps to Reproduce

  1. Install claude-mem plugin
  2. Start Claude Code application
  3. The worker service attempts to start but fails repeatedly
  4. Error message appears: Worker failed to start Failed to start server. Is port 37777 in use?
  5. Attempting to restart with npm run worker:restart results in: Failed to restart: Process died during startup

❌ Error Messages

From Logs (Image 1):

[2025-12-15 14:22:27.630] [ERROR] [SYSTEM] ⚠️?Worker failed to start Failed to start server. Is port 37777 in use?
[2025-12-15 14:24:23.564] [ERROR] [SYSTEM] ⚠️?Worker failed to start Failed to start server. Is port 37777 in use?
[... repeated multiple times ...]

From Manual Restart Attempt (Image 2):

PS C:\Users\BAIJUN\.claude\plugins\marketplaces\thedotmack> npm run worker:restart

> claude-mem@7.2.1 worker:restart
> bun plugin/scripts/worker-cli.js restart

Failed to restart: Process died during startup

From Error Details (Image 3):

Error: Worker service failed to start on port 37777. (port 37777)

To restart the worker:
1. Exit Claude Code completely
2. Open Command Prompt or PowerShell
3. Navigate to: %USERPROFILE%\.claude\plugins\marketplaces\thedotmack
4. Run: npm run worker:restart
5. Restart Claude Code

Stack Trace:

at Y (file:///C:/Users/BAIJUN/.claude/plugins/marketplaces/thedotmack/plugin/scripts/summary-hook.js:16:2074)
at async vt (file:///C:/Users/BAIJUN/.claude/plugins/marketplaces/thedotmack/plugin/scripts/summary-hook.js:20:450)
at async Socket.<anonymous> (file:///C:/Users/BAIJUN/.claude/plugins/marketplaces/thedotmack/plugin/scripts/summary-hook.js:20:1665)

Node.js v22.15.0

✅ Expected Behavior

The worker service should start successfully on port 37777 when Claude Code is running, just as it does when running the claude command.

💻 Environment

  • OS: Windows (based on path: C:\Users\BAIJUN\.claude\plugins\...)
  • Plugin Path: %USERPROFILE%\.claude\plugins\marketplaces\thedotmack
  • Claude-mem version: 7.2.1
  • Node.js version: v22.15.0
  • Platform: Claude Code

🔍 Additional Context

  1. Port Conflict: The error suggests port 37777 might be in use. This could indicate:

    • Another instance of the worker is already running
    • Claude Code and the standalone claude command are conflicting
    • The port is not being properly released when switching between applications
  2. Process Lifecycle Issue: The "Process died during startup" error suggests the worker process is terminating immediately after launch, possibly due to:

    • Configuration incompatibility with Claude Code
    • Environment variable differences between claude command and Claude Code
    • Initialization error specific to the Claude Code runtime environment
  3. Workaround Attempted: Following the suggested restart procedure did not resolve the issue

🔧 Suggested Investigation Areas

  1. Check if multiple instances of the worker are running simultaneously
  2. Verify port 37777 availability when Claude Code starts
  3. Compare environment variables between claude command and Claude Code contexts
  4. Review worker process startup logs for initialization errors
  5. Check for differences in how the plugin is loaded in Claude Code vs. CLI

📝 Logs Location

Worker logs should be available at:

  • Path: %USERPROFILE%\.claude-mem\logs\worker-2025-12-15.log

Would you like me to include additional diagnostic information?</issue_description>

Comments on the Issue (you are @copilot in this section)

@thedotmack # Windows, Bun, and Worker Service Struggles

A comprehensive chronicle of platform-specific issues, attempted fixes, and architectural decisions.

Executive Summary

The claude-mem project has faced persistent Windows-specific issues centered around three core problems:

  1. Console Window Popups: Blank terminal windows appearing when spawning the worker and SDK subprocesses
  2. Zombie Socket Issues: Bun leaving TCP sockets in LISTEN state after termination on Windows
  3. Process Management Complexity: Platform-specific spawning logic and reliability issues

These issues have driven multiple PRs, architectural pivots, and significant debate about runtime switching (Bun → Node.js).


Timeline of Issues

Issue thedotmack/claude-mem#209: Windows Worker Startup Failures (Dec 12-13, 2025)

Problem: Worker service failed to start on Windows using PowerShell Start-Process approach.

Symptoms: ...

  • Fixes thedotmack/claude-mem#324

Copilot • Dec 16 '25 05:12