claude-mem v9.0.0: Crash-recovery loop when memory_session_id is not captured

Bug Description

Sessions created without memory_session_id cause an infinite crash-recovery loop. The generator continuously retries and fails with:

[ERROR] [SDK] ✗ OpenRouter agent error {sessionDbId=607} Cannot store observations: memorySessionId not yet captured
[INFO] [SESSION] [session-607] Generator auto-starting (observation) using OpenRouter

This loop runs indefinitely, growing the queue depth and consuming API tokens on every retry attempt.

Environment

claude-mem version: 9.0.0
OS: macOS (Darwin)
Provider: OpenRouter (mimo-v2-flash:free, also reproduced with gpt-4o-mini)
Node version: (run node -v and add here)

Steps to Reproduce

1. Start a Claude Code session
2. Session gets created in sdk_sessions table but memory_session_id column remains NULL/empty
3. Observations are enqueued to pending_messages
4. Generator attempts to process queue
5. Fails with "Cannot store observations: memorySessionId not yet captured"
6. Generator auto-restarts (crash-recovery)
7. Loop continues indefinitely

Evidence from Logs

[2026-01-08 11:19:16.784] [SDK] OpenRouter API usage {model=xiaomi/mimo-v2-flash:free, inputTokens=10452, outputTokens=129}
[2026-01-08 11:19:16.784] [ERROR] [SDK] ✗ OpenRouter agent error {sessionDbId=607} Cannot store observations: memorySessionId not yet captured
[2026-01-08 11:19:16.784] [INFO] [SESSION] [session-607] Generator aborted
[2026-01-08 11:19:16.853] [INFO] [SESSION] [session-607] Generator auto-starting (observation) using OpenRouter
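
To check whether your own worker log shows the same pattern, you can count how often each session hits this error. This is a rough sketch: `countCaptureErrors` is a hypothetical helper (not part of claude-mem), and the regular expression only assumes the log line shape quoted above, which may differ between versions.

```typescript
// Hypothetical helper, not part of claude-mem.
// Matches the "[ERROR] [SDK] ... {sessionDbId=N} Cannot store observations: ..." line shown above.
const CAPTURE_ERROR_PATTERN =
  /\[ERROR\] \[SDK\] .*\{sessionDbId=(\d+)\} Cannot store observations: memorySessionId not yet captured/;

// Returns a map of sessionDbId -> number of matching error lines in the log text.
function countCaptureErrors(logText: string): Map<number, number> {
  const counts = new Map<number, number>();
  for (const line of logText.split("\n")) {
    const match = CAPTURE_ERROR_PATTERN.exec(line);
    if (match === null) continue; // unrelated log line
    const sessionDbId = Number(match[1]);
    counts.set(sessionDbId, (counts.get(sessionDbId) ?? 0) + 1);
  }
  return counts;
}
```

A session whose count keeps climbing while the queue never drains is most likely stuck in the loop described here.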

Database State

Sessions missing memory_session_id:

SELECT id, content_session_id, memory_session_id, status FROM sdk_sessions WHERE memory_session_id IS NULL OR memory_session_id = '';

-- Results:
-- 607|a2265efb-c878-4be4-b2f5-1ed2323cc607||active
-- 605|83f05013-e4e2-4564-8cec-f03dfc8c5eb7||active
-- (multiple sessions affected)

Expected Behavior

Sessions should not be created until memory_session_id is captured
OR: Generator should skip/fail gracefully for sessions missing memory_session_id instead of infinite retry
OR: Crash-recovery should have a max retry limit before marking session as failed
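
The third option (a max retry limit) could look roughly like the sketch below. It is an illustration only: `RecoveryBudget`, `maxRestarts`, and the `"restart"` / `"mark_failed"` results are made-up names, and I have not checked how crash-recovery is actually wired inside claude-mem.

```typescript
// Hypothetical per-session restart budget, not claude-mem's actual code.
type RecoveryDecision = "restart" | "mark_failed";

class RecoveryBudget {
  private readonly maxRestarts: number;
  private readonly failures = new Map<number, number>();

  constructor(maxRestarts: number) {
    this.maxRestarts = maxRestarts;
  }

  // Call whenever a generator aborts. Allows up to maxRestarts restarts,
  // then tells the caller to mark the session as failed.
  recordFailure(sessionDbId: number): RecoveryDecision {
    const count = (this.failures.get(sessionDbId) ?? 0) + 1;
    this.failures.set(sessionDbId, count);
    return count > this.maxRestarts ? "mark_failed" : "restart";
  }

  // Call when a generator run succeeds so earlier failures stop counting.
  reset(sessionDbId: number): void {
    this.failures.delete(sessionDbId);
  }
}
```

With `new RecoveryBudget(3)`, the first three aborts of a session would still restart the generator, and the fourth would return `"mark_failed"`, so the loop ends after a bounded number of API calls.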

Workaround

Manual database cleanup:

npm run worker:stop
sqlite3 ~/.claude-mem/claude-mem.db "DELETE FROM pending_messages;"
sqlite3 ~/.claude-mem/claude-mem.db "UPDATE sdk_sessions SET status = 'failed' WHERE memory_session_id IS NULL OR memory_session_id = '';"
npm run worker:start

Impact

Queue grows unbounded (saw 25-36+ stuck items)
Consumes API tokens on every failed retry (~10k tokens per attempt)
Worker broadcasts isProcessing=true indefinitely
Web UI shows stuck queue badge that won't clear
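
For a sense of scale, the token figures from the "OpenRouter API usage" log line can be multiplied out. This is back-of-the-envelope arithmetic and assumes every failed attempt costs about the same as the one logged; `tokensSpent` is just an illustrative name.

```typescript
// Token counts copied from the "OpenRouter API usage" log line in this report.
const INPUT_TOKENS_PER_ATTEMPT = 10452;
const OUTPUT_TOKENS_PER_ATTEMPT = 129;

// Total tokens spent after a number of failed attempts,
// assuming each attempt costs the same as the logged one.
function tokensSpent(failedAttempts: number): number {
  if (!Number.isInteger(failedAttempts) || failedAttempts < 0) {
    throw new RangeError("failedAttempts must be a non-negative integer");
  }
  return failedAttempts * (INPUT_TOKENS_PER_ATTEMPT + OUTPUT_TOKENS_PER_ATTEMPT);
}

console.log(tokensSpent(1));   // 10581
console.log(tokensSpent(100)); // 1058100
```

So a hundred failed retries would already be over a million tokens, which matters for rate limits even on a free model.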

Suggested Fix

Add a check in the generator that skips sessions with a missing memory_session_id, and mark such sessions as failed after N retries instead of restarting them indefinitely through crash-recovery.
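
In the logs, the API usage line is written right before the error, which suggests the provider call happens before the store step fails. If that is right, doing the check up front would also avoid spending tokens. Below is a minimal sketch of such a guard. The `SessionRow` fields mirror the sdk_sessions columns used in the query in this report; `partitionSessions` is a hypothetical name and the real generator code may be structured differently.

```typescript
// Row shape based on the sdk_sessions columns queried in this report.
interface SessionRow {
  id: number;
  memory_session_id: string | null;
  status: string;
}

// Hypothetical guard: split sessions into those that can be processed and those
// that must be skipped because memory_session_id is NULL or empty
// (the same condition as the SQL WHERE clause in this report).
function partitionSessions(sessions: SessionRow[]): { ready: SessionRow[]; blocked: SessionRow[] } {
  const ready: SessionRow[] = [];
  const blocked: SessionRow[] = [];
  for (const session of sessions) {
    const hasMemoryId = session.memory_session_id !== null && session.memory_session_id !== "";
    (hasMemoryId ? ready : blocked).push(session);
  }
  return { ready, blocked };
}
```

Only the `ready` sessions would go to the provider. The `blocked` ones could be retried a limited number of times (in case the ID gets captured later, as auto-recovery sometimes does) and then marked as failed.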

Temporary Fix for Affected Users

If you're stuck in this loop, run these commands to clear it:

# Stop the worker
cd ~/.claude/plugins/marketplaces/thedotmack
npm run worker:stop

# Clear stuck queue and mark broken sessions as failed
sqlite3 ~/.claude-mem/claude-mem.db "DELETE FROM pending_messages;"
sqlite3 ~/.claude-mem/claude-mem.db "UPDATE sdk_sessions SET status = 'failed' WHERE memory_session_id IS NULL OR memory_session_id = '';"

# Restart the worker
npm run worker:start

Jan 08 '26 16:01 mrlfarano

In version 9.0.0 of the claude-mem plugin, a bug was identified where sessions missing a memory_session_id result in an infinite crash-recovery loop. When such sessions are created, the generator repeatedly attempts to process the session queue, fails with an error indicating the missing memory_session_id, and auto-restarts itself. This cycle continues indefinitely, causing unbounded queue growth, excessive API token consumption (~10k tokens per failed attempt), and rendering the worker stuck in a processing state. Logs and database evidence confirm that multiple sessions are affected. The expected behavior should either prevent session creation without memory_session_id, gracefully skip these sessions, or enforce a maximum retry limit to mark them as failed. A temporary workaround involves manually stopping the worker, clearing the stuck queue, and marking broken sessions as failed via direct database manipulation. A suggested fix involves updating the generator logic to avoid infinite retries for sessions missing memory_session_id.

Jan 08 '26 16:01 github-actions[bot]

Additional observations from v9.0.3/v9.0.4

Still experiencing this issue on v9.0.4 with Gemini provider (CLAUDE_MEM_PROVIDER=gemini).

Observed behavior:

Multiple sessions (sessionDbId=21877, 21915) stuck with "Cannot store observations: memorySessionId not yet captured"
Queue accumulated 48+ pending messages
Worker restart sometimes triggers auto-recovery that captures memorySessionId:

Auto-recovered 1 sessions with pending work {totalPending=1, started=1, sessionIds=21915}
MEMORY_ID_CAPTURED | sessionDbId=21915 | memorySessionId=37b7e2b8-...

Workaround that works:

1. Kill worker: pkill -f "worker-service.cjs"
2. Restart worker - auto-recovery may capture memorySessionId
3. If still stuck, /clear the affected session in VSCode to force new session creation
