
Vector‑Searchable Conversational Memory

Open · unforced opened this issue 9 months ago · 0 comments

✨ Feature Proposal — Vector‑Searchable Conversational Memory

Embed salient chat turns & surface them through the existing POST /search endpoint.


1 · Why This Matters

  • POST /chat saves entire transcripts in chatThreads.messages (JSONB) but does not embed individual turns.
  • POST /search only returns document chunks, so past chat decisions are invisible.
  • External tools (Open WebUI, Cursor, Goose, Flowise) want a single memory store that serves both docs and chat snippets.

2 · Design Goals

  1. Searchable – useful chat turns appear in /search results.
  2. Lean & Scalable – skip filler messages; allow configurable policies.
  3. Opt‑in & Non‑breaking – defaults keep current behaviour.
  4. No new endpoints – reuse POST /search.

3 · Proposed Change (high level)

  1. Schema – add a child table chat_message_embeddings (see the Drizzle sketch after this list)
    • id bigserial primary key
    • threadId → chatThreads.uuid
    • role (user / assistant), text, ts
    • embedding pgvector(768)
    • metadata JSONB (e.g., { "source": "webui" })
  2. Write path (POST /chat)
    • Feature‑flagged by ENABLE_CHAT_MEMORY.
    • After each user turn and assistant reply, call the existing embedding model if shouldEmbed(text, policy) returns true, then insert into chat_message_embeddings (see the write-path sketch after this list).
  3. Search path (POST /search)
    • Extend the current SQL helper (see the search-helper sketch after this list):
      • When body includes "includeChats": true and feature flag on, UNION chat‑message rows with document chunks.
      • Return objects with "src": "doc" | "chat" so clients can badge origin.
  4. Config Flags
    • ENABLE_CHAT_MEMORY=false (default)
    • CHAT_EMBED_POLICY=salient (off, all, salient, summary_only)
    • CHAT_EMBED_MIN_LEN=10
    • CHAT_SUMMARY_EVERY=20 (placeholder for future summary tier)
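
To make the schema concrete, here is a minimal Drizzle sketch of the proposed table. It assumes a Drizzle version with native pgvector support and that the existing chatThreads table is exported from a schema module; the import path and column names are illustrative, not final:

```ts
// Hypothetical Drizzle schema for the proposed child table.
// The chatThreads import path is an assumption; adjust to the real schema module.
import {
  pgTable,
  bigserial,
  uuid,
  text,
  timestamp,
  jsonb,
  vector,
} from "drizzle-orm/pg-core";
import { chatThreads } from "./schema";

export const chatMessageEmbeddings = pgTable("chat_message_embeddings", {
  id: bigserial("id", { mode: "number" }).primaryKey(),
  threadId: uuid("thread_id")
    .notNull()
    .references(() => chatThreads.uuid),
  role: text("role", { enum: ["user", "assistant"] }).notNull(),
  text: text("text").notNull(),
  ts: timestamp("ts", { withTimezone: true }).defaultNow().notNull(),
  embedding: vector("embedding", { dimensions: 768 }),
  metadata: jsonb("metadata").$type<Record<string, unknown>>(),
});
```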
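For the write path, a sketch of the feature-flag gate and the shouldEmbed policy check. Here `db`, `embedText`, and the filler heuristic are placeholders for whatever the existing /chat handler already uses; the gating logic is the point:

```ts
// Hypothetical write-path hook for POST /chat; helper names are illustrative.
import { db } from "./db";                // assumed Drizzle client
import { embedText } from "./embeddings"; // assumed existing embedding helper
import { chatMessageEmbeddings } from "./schema";

type EmbedPolicy = "off" | "all" | "salient" | "summary_only";

interface ChatTurn {
  role: "user" | "assistant";
  text: string;
}

// Decide whether a turn is worth embedding under the configured policy.
export function shouldEmbed(text: string, policy: EmbedPolicy): boolean {
  if (policy === "off" || policy === "summary_only") return false;
  if (policy === "all") return true;
  // "salient": skip very short turns and obvious filler.
  const minLen = Number(process.env.CHAT_EMBED_MIN_LEN ?? 10);
  const trimmed = text.trim();
  if (trimmed.length < minLen) return false;
  return !/^(ok(ay)?|thanks?|got it|sure|yes|no)[.!]?$/i.test(trimmed);
}

// Called after each user turn and assistant reply, behind the feature flag.
export async function maybeEmbedTurn(threadId: string, turn: ChatTurn): Promise<void> {
  if (process.env.ENABLE_CHAT_MEMORY !== "true") return;
  const policy = (process.env.CHAT_EMBED_POLICY ?? "salient") as EmbedPolicy;
  if (!shouldEmbed(turn.text, policy)) return;

  const embedding = await embedText(turn.text); // 768-dim vector
  await db.insert(chatMessageEmbeddings).values({
    threadId,
    role: turn.role,
    text: turn.text,
    embedding,
  });
}
```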
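And for the search path, one way the helper could fold chat rows into the document query when includeChats is set. The document_chunks table name and the pgvector cosine-distance operator (<=>) are assumptions about the current schema; the shape of the UNION is what matters:

```ts
// Hypothetical extension of the search helper; table and column names are assumptions.
import { sql } from "drizzle-orm";
import { db } from "./db"; // assumed Drizzle client

export async function searchMemories(queryEmbedding: number[], includeChats: boolean) {
  const vec = JSON.stringify(queryEmbedding); // pgvector accepts '[...]' literals

  const docRows = sql`
    SELECT 'doc' AS src, content AS text, embedding <=> ${vec}::vector AS distance
    FROM document_chunks
  `;
  const chatRows = sql`
    SELECT 'chat' AS src, text, embedding <=> ${vec}::vector AS distance
    FROM chat_message_embeddings
  `;

  const query = includeChats
    ? sql`${docRows} UNION ALL ${chatRows} ORDER BY distance LIMIT 10`
    : sql`${docRows} ORDER BY distance LIMIT 10`;

  return db.execute(query);
}
```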

4 · Benefits

  • Users – Can ask “What did we rename tick() to?” and get a chat citation.
  • Tools – Just add "includeChats": true to /search; no extra endpoint (example request below).
  • Maintainers – Mirrors the existing documents → chunks pattern; the feature defaults to off, so zero impact on current installs.
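
For illustration, a request/response along these lines (the host, the "q" field, and the response fields beyond src are placeholders, not the current /search contract):

```bash
curl -X POST http://localhost:3000/search \
  -H "Content-Type: application/json" \
  -d '{ "q": "what did we rename tick() to?", "includeChats": true }'

# Possible response shape:
# [
#   { "src": "chat", "text": "Let's rename tick() to advanceFrame()", "threadId": "..." },
#   { "src": "doc",  "text": "...", "documentId": "..." }
# ]
```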

5 · Open Questions

  • Table name: keep chat_message_embeddings, or prefer chat_messages with an embedded vector column?
  • Default embed policy: salient vs. summary_only.
  • Would a richer filter DSL ("type": ["doc","chat"]) be better than the boolean includeChats flag?

6 · Next Steps (if accepted)

  • Drizzle migration for new table.
  • Patch /chat write path under feature flag.
  • Refactor search helper and extend POST /search.
  • Add README docs & cURL examples.
  • (Optional) unit tests for shouldEmbed(); see the sketch below.
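
If it helps review, a first test could look something like this (Vitest assumed, exercising the shouldEmbed sketch above):

```ts
// Hypothetical unit tests for the shouldEmbed() sketch; module path is illustrative.
import { describe, expect, it } from "vitest";
import { shouldEmbed } from "./chat-memory";

describe("shouldEmbed", () => {
  it("never embeds when the policy is off or summary_only", () => {
    expect(shouldEmbed("a long, decision-bearing message", "off")).toBe(false);
    expect(shouldEmbed("a long, decision-bearing message", "summary_only")).toBe(false);
  });

  it("skips filler under the salient policy", () => {
    expect(shouldEmbed("ok", "salient")).toBe(false);
    expect(shouldEmbed("thanks!", "salient")).toBe(false);
  });

  it("embeds substantive turns under the salient policy", () => {
    expect(shouldEmbed("We renamed tick() to advanceFrame() in core.ts", "salient")).toBe(true);
  });
});
```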

Happy to raise a PR implementing the above once we converge on details.

unforced · Apr 17 '25 20:04