# v9.0.0: Crash-recovery loop when `memory_session_id` is not captured
## Bug Description

Sessions created without a `memory_session_id` cause an infinite crash-recovery loop. The generator continuously retries and fails with:

```
[ERROR] [SDK] ✗ OpenRouter agent error {sessionDbId=607} Cannot store observations: memorySessionId not yet captured
[INFO] [SESSION] [session-607] Generator auto-starting (observation) using OpenRouter
```

This loop runs indefinitely, growing the queue depth and consuming API tokens on every retry attempt.
## Environment

- claude-mem version: 9.0.0
- OS: macOS (Darwin)
- Provider: OpenRouter (mimo-v2-flash:free, also reproduced with gpt-4o-mini)
- Node version: (run `node -v` and add here)
## Steps to Reproduce

- Start a Claude Code session
- Session gets created in the `sdk_sessions` table, but the `memory_session_id` column remains NULL/empty
- Observations are enqueued to `pending_messages`
- Generator attempts to process the queue
- Fails with "Cannot store observations: memorySessionId not yet captured"
- Generator auto-restarts (crash-recovery)
- Loop continues indefinitely
## Evidence from Logs

```
[2026-01-08 11:19:16.784] [SDK] OpenRouter API usage {model=xiaomi/mimo-v2-flash:free, inputTokens=10452, outputTokens=129}
[2026-01-08 11:19:16.784] [ERROR] [SDK] ✗ OpenRouter agent error {sessionDbId=607} Cannot store observations: memorySessionId not yet captured
[2026-01-08 11:19:16.784] [INFO] [SESSION] [session-607] Generator aborted
[2026-01-08 11:19:16.853] [INFO] [SESSION] [session-607] Generator auto-starting (observation) using OpenRouter
```
## Database State

Sessions missing `memory_session_id`:

```sql
SELECT id, content_session_id, memory_session_id, status FROM sdk_sessions WHERE memory_session_id IS NULL OR memory_session_id = '';
-- Results:
-- 607|a2265efb-c878-4be4-b2f5-1ed2323cc607||active
-- 605|83f05013-e4e2-4564-8cec-f03dfc8c5eb7||active
-- (multiple sessions affected)
```
## Expected Behavior

- Sessions should not be created until `memory_session_id` is captured
- OR: the generator should skip or fail gracefully for sessions missing `memory_session_id` instead of retrying forever
- OR: crash-recovery should have a max retry limit before marking the session as failed
## Workaround

Manual database cleanup:

```shell
npm run worker:stop
sqlite3 ~/.claude-mem/claude-mem.db "DELETE FROM pending_messages;"
sqlite3 ~/.claude-mem/claude-mem.db "UPDATE sdk_sessions SET status = 'failed' WHERE memory_session_id IS NULL OR memory_session_id = '';"
npm run worker:start
```
## Impact

- Queue grows unbounded (saw 25-36+ stuck items)
- Consumes API tokens on every failed retry (~10k tokens per attempt)
- Worker broadcasts `isProcessing=true` indefinitely
- Web UI shows a stuck queue badge that won't clear
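The token waste compounds quickly: each retry re-sends the full prompt rather than doing new work. A back-of-envelope sketch (the `estimateWastedTokens` helper is hypothetical; the figures come from the usage log above):

```typescript
// Every failed retry re-sends the full prompt, so cost scales with queue
// passes, not with useful work done.
function estimateWastedTokens(attempts: number, tokensPerAttempt: number): number {
  return attempts * tokensPerAttempt;
}

// 36 stuck items at the 10,452 input tokens logged per call:
console.log(estimateWastedTokens(36, 10_452)); // 376272 tokens for a single pass over the queue
```

And since the loop never terminates, that cost repeats on every crash-recovery cycle.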
## Suggested Fix

Add a check in the generator to skip sessions with a missing `memory_session_id`, and mark them as failed after N retries rather than looping through crash-recovery forever.
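A minimal sketch of that guard. All names here (`Session`, `shouldProcess`, `markSessionFailed`, `MAX_RETRIES`) are illustrative, not claude-mem's actual internals:

```typescript
// Hypothetical guard for the generator's crash-recovery path: skip sessions
// that cannot succeed, and give up permanently after a bounded retry count.
interface Session {
  id: number;
  memorySessionId: string | null;
  status: "active" | "failed";
}

const MAX_RETRIES = 3;
const retryCounts = new Map<number, number>();

function shouldProcess(session: Session): boolean {
  // A session without a captured memorySessionId can never store observations.
  if (!session.memorySessionId) {
    const attempts = (retryCounts.get(session.id) ?? 0) + 1;
    retryCounts.set(session.id, attempts);
    if (attempts >= MAX_RETRIES) {
      markSessionFailed(session); // stop the crash-recovery loop for good
    }
    return false; // never hand the session to the generator without an ID
  }
  return true;
}

function markSessionFailed(session: Session): void {
  session.status = "failed"; // in claude-mem this would be an UPDATE on sdk_sessions
}
```

The key property is that the retry budget is per-session, so one broken session cannot starve the rest of the queue.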
## Temporary Fix for Affected Users

If you're stuck in this loop, run these commands to clear it:

```shell
# Stop the worker
cd ~/.claude/plugins/marketplaces/thedotmack
npm run worker:stop

# Clear the stuck queue and mark broken sessions as failed
sqlite3 ~/.claude-mem/claude-mem.db "DELETE FROM pending_messages;"
sqlite3 ~/.claude-mem/claude-mem.db "UPDATE sdk_sessions SET status = 'failed' WHERE memory_session_id IS NULL OR memory_session_id = '';"

# Restart the worker
npm run worker:start
```
## Additional observations from v9.0.3/v9.0.4

Still experiencing this issue on v9.0.4 with the Gemini provider (`CLAUDE_MEM_PROVIDER=gemini`).

Observed behavior:

- Multiple sessions (sessionDbId=21877, 21915) stuck with `Cannot store observations: memorySessionId not yet captured`
- Queue accumulated 48+ pending messages
- Worker restart sometimes triggers auto-recovery that captures the memorySessionId:

```
Auto-recovered 1 sessions with pending work {totalPending=1, started=1, sessionIds=21915}
MEMORY_ID_CAPTURED | sessionDbId=21915 | memorySessionId=37b7e2b8-...
```
Workaround that works:

- Kill the worker: `pkill -f "worker-service.cjs"`
- Restart the worker; auto-recovery may capture the memorySessionId
- If still stuck, `/clear` the affected session in VSCode to force new session creation
Related: PR #615 (generate memorySessionId for stateless providers) would fix this for Gemini/OpenRouter users but is not yet merged.
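The PR's code isn't shown here, but the general idea it describes — fall back to a locally generated ID when a stateless provider never echoes a session ID back — can be sketched as follows (`resolveMemorySessionId` and `providerSessionId` are hypothetical names):

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical fallback for stateless providers (Gemini/OpenRouter) that never
// return a session id: generate one locally so observations can be stored
// instead of failing with "memorySessionId not yet captured".
function resolveMemorySessionId(providerSessionId: string | null | undefined): string {
  return providerSessionId ?? randomUUID();
}
```

With a fallback like this, the "not yet captured" state can never persist, which removes the precondition for the crash-recovery loop entirely.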
Fixed in v9.0.1+. Session ID capture and crash recovery were stabilized in subsequent releases. Please update to v9.1.1 (latest).