Archon icon indicating copy to clipboard operation
Archon copied to clipboard

fix: PostgreSQL search function type mismatches and auto-detect embed…

Open IM-21-DEV opened this issue 2 months ago • 2 comments

fix: PostgreSQL search function type mismatches and auto-detect embedding dimensions

Fixes multiple critical bugs in PostgreSQL search functions:

  1. Type Mismatches:

    • VARCHAR → TEXT for url and source_id columns
    • FLOAT → DOUBLE PRECISION for similarity scores
    • Fixes 'structure of query does not match function result type' errors
  2. Hardcoded Embedding Dimensions:

    • Removed vector(1536) constraints from wrapper functions
    • Added auto-detection using vector_dims() function
    • Now supports all dimensions: 384, 768, 1024, 1536, 3072
  3. Missing Permissions (migration 011):

    • Added RLS policies for archon_page_metadata table
    • Added GRANT statements for all roles
    • Fixes 'permission denied for table' errors

Benefits:

  • Works with ANY embedding model (Google 768-dim, OpenAI 1536-dim, etc.)
  • Future-proof for new embedding dimensions
  • No backend code changes required
  • Type-safe function calls

Affected functions:

  • hybrid_search_archon_crawled_pages
  • hybrid_search_archon_crawled_pages_multi
  • hybrid_search_archon_code_examples
  • hybrid_search_archon_code_examples_multi
  • match_archon_crawled_pages
  • match_archon_crawled_pages_multi
  • match_archon_code_examples
  • match_archon_code_examples_multi

Files modified:

  • migration/complete_setup.sql
  • migration/0.1.0/002_add_hybrid_search_tsvector.sql
  • migration/0.1.0/011_add_page_metadata_table.sql

🤖 Generated with [Claude Code]

Pull Request

Summary

Changes Made

Type of Change

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] Documentation update
  • [ ] Performance improvement
  • [ ] Code refactoring

Affected Services

  • [ ] Frontend (React UI)
  • [ ] Server (FastAPI backend)
  • [ ] MCP Server (Model Context Protocol)
  • [ ] Agents (PydanticAI service)
  • [ ] Database (migrations/schema)
  • [ ] Docker/Infrastructure
  • [ ] Documentation site

Testing

  • [ ] All existing tests pass
  • [ ] Added new tests for new functionality
  • [ ] Manually tested affected user flows
  • [ ] Docker builds succeed for all services

Test Evidence

# Example: python -m pytest tests/
# Example: cd archon-ui-main && npm run test

Checklist

  • [ ] My code follows the service architecture patterns
  • [ ] If using an AI coding assistant, I used the CLAUDE.md rules
  • [ ] I have added tests that prove my fix/feature works
  • [ ] All new and existing tests pass locally
  • [ ] My changes generate no new warnings
  • [ ] I have updated relevant documentation
  • [ ] I have verified no regressions in existing features

Breaking Changes

Additional Notes

Summary by CodeRabbit

  • New Features

    • Hybrid search now accepts embeddings of any dimension (auto-detected) and returns a match_type indicator.
    • Similarity scores upgraded to double precision for improved numeric accuracy.
  • Refactor

    • Unified, dimension-aware search paths; return URLs standardized to TEXT.
  • Chores

    • Row-level security enabled for page metadata with restricted read/write policies and an updated_at trigger.
  • Tests / Documentation

    • Added verification tests and summary/README documentation for the fixes.

IM-21-DEV avatar Oct 14 '25 19:10 IM-21-DEV

Walkthrough

Replaces fixed-dimension embeddings with generic VECTOR in hybrid search wrappers, auto-detects embedding dimension via vector_dims, and delegates to dimension-aware multi-functions; standardizes return types (url -> TEXT, similarity -> DOUBLE PRECISION). Adds RLS, policies, grants, and an updated_at trigger for archon_page_metadata.

Changes

Cohort / File(s) Summary of changes
Hybrid search wrappers & multi-functions
migration/0.1.0/002_add_hybrid_search_tsvector.sql, 01.Update-Fix/ARCHON-FIX-HYBRID-SEARCH-DOUBLE-PRECISION.sql, 01.Update-Fix/ARCHON-FIX-COMPLETE-WITH-MATCH-TYPE.sql, 01.Update-Fix/ARCHON-FIX-SEARCH-FUNCTIONS-COMPLETE.sql
Wrapper functions changed query_embedding from vector(1536) to generic VECTOR, declare detected_dimension INT, compute vector_dims(query_embedding), and forward the detected dimension to multi-variant functions. Multi-functions and wrappers now return url as TEXT and similarity (and related scores) as DOUBLE PRECISION; some variants add match_type and rank_score. Dynamic SQL selects embedding column by dimension and errors on unsupported sizes.
Complete setup & helpers
migration/complete_setup.sql
Introduces helper logic and functions for dimension handling (e.g., detect/get embedding column name), dynamic routing to embedding_* columns, and updated wrappers/multi-functions to use embedding_dimension parameter.
Page metadata RLS, grants & triggers
migration/0.1.0/011_add_page_metadata_table.sql, 01.Update-Fix/ARCHON-FIX-PAGE-METADATA-PERMISSIONS.sql, 01.Update-Fix/ARCHON-FIX-SECURITY-RLS-POLICIES.sql, 01.Update-Fix/APPLY-VIA-SUPABASE-SQL-EDITOR.sql
Enables Row Level Security on archon_page_metadata; creates policies (service_role full access, authenticated SELECT), applies GRANTs (postgres/service_role ALL; authenticated SELECT; anon revoked/adjusted), adds archon_set_updated_at() trigger to auto-update updated_at, and makes migration insert idempotent (ON CONFLICT DO NOTHING).
Grants-only / verification scripts
01.Update-Fix/ARCHON-FIX-GRANTS-ONLY.sql
Adds scripts to grant table-level privileges and verification queries for archon_page_metadata.
Docs, tests, and analysis artifacts
01.Update-Fix/ARCHON-BUG-ANALYSIS.md, 01.Update-Fix/ARCHON-RAG-FIX-COMPLETE-SUMMARY.md, 01.Update-Fix/README.md, 01.Update-Fix/test_rag_search.py
Adds RCA and summary documents, README for the fix set, and a test script test_rag_search.py to validate RAG search and source filtering; no public API changes from docs/tests.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant Wrapper as Wrapper (hybrid_search_...)
  participant DBfunc as vector_dims()
  participant Multi as Multi-variant Function
  participant DB as DB (embedding_* cols)

  Client->>Wrapper: call(query_embedding VECTOR, query_text?, ...)
  Wrapper->>DBfunc: vector_dims(query_embedding)
  DBfunc-->>Wrapper: detected_dimension
  Wrapper->>Multi: invoke(query_embedding, detected_dimension, query_text?, ...)
  Multi->>DB: SELECT FROM embedding_<detected_dimension> (dynamic SQL)
  DB-->>Multi: rows with similarity (double precision)
  Multi-->>Client: merged results (url TEXT, similarity DOUBLE PRECISION, [match_type/rank_score])
  note right of Multi: dynamic selection of embedding column and full-text/vector merge

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • coleam00/Archon#718 — Overlaps migration and dimension-aware hybrid search helpers and wrappers (detect_embedding_dimension, multi-dim routing).
  • coleam00/Archon#539 — Modifies hybrid_search functions and tsvector-based hybrid search logic; related to return-type and match_type changes.
  • coleam00/Archon#681 — Adds/aligns multi-dimensional hybrid search variants and wrapper delegation by embedding dimension.

Suggested labels

bug

Suggested reviewers

  • leex279
  • tazmon95
  • coleam00

Poem

I twitched my whiskers, swapped a fixed-size thread,
Now VECTORs roam freely where dimensions led.
I sniff the dims, I pick the right column bed,
TEXTy tunnels, double-precisions spread.
Hopping secure with metadata tucked—hippity-hop ahead! 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The pull request description provides a detailed narrative of the fixes and their benefits but fails to follow the repository’s required template since key sections such as “Type of Change,” “Affected Services,” “Testing,” and the checklist are left as unfilled placeholders. Please remove the unused template placeholders and complete each section according to the repository’s description template by filling in the “Summary,” listing “Changes Made,” marking the relevant boxes under “Type of Change” and “Affected Services,” providing testing details with evidence, and updating the checklist and any additional notes.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title clearly summarizes the main changes by highlighting fixes to PostgreSQL search function type mismatches and the introduction of auto-detection for embedding dimensions, which aligns with the key modifications in the pull request.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] avatar Oct 14 '25 19:10 coderabbitai[bot]

This looks lie a solid contribution @IM-21-DEV thank you, could you please remove any unnecessary md files as they are in german, as well as remove the unnecessary SQL statement files, only keep the ones needed to migrate

Wirasm avatar Nov 24 '25 09:11 Wirasm