pgvecto.rs icon indicating copy to clipboard operation
pgvecto.rs copied to clipboard

feat(fdw): How to be compatible with new pgvector types

Open cutecutecat opened this issue 10 months ago • 1 comments

https://github.com/pgvector/pgvector/issues/508

For:

  • [x] svector -> sparsevec
    • [ ] Rename
    • [ ] text representation: https://github.com/pgvector/pgvector/blob/f3477cf28d8cc9d7d8926845a098a73950161692/test/expected/sparsevec_input.out#L1
      • [ ] input/output: compatible mode?
  • [x] vecf16 -> halfvec
    • [ ] Rename
  • [ ] bvector -> bit
    • [ ] cast
    • [ ] text representation
    • [ ] operator: <%> = jaccard_distance(bit, bvector) / <~> = hamming_distance(bit, bvector)
  • [ ] veci8 -> intvec

Possible Methods:

  • ~~Hook / Rewrite~~ --> Different name mapping will fail to push down to index
  • Symbol renaming

cutecutecat avatar Apr 08 '24 03:04 cutecutecat

FDW problems

At step CREATE SERVER:

CREATE SERVER foreign_server
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'host.docker.internal', port '28816', dbname 'vectors', sslmode 'disable', use_remote_estimate 'true', extensions 'vector');
GRANT USAGE ON FOREIGN SERVER foreign_server TO local_user;
  • If not config the extensions='vector' at CREATE SERVER step of local machine
    • Then a full table scan will be sent to remote, unable to use an index
postgres=> EXPLAIN verbose select * from foreign_test ORDER BY embedding <-> '[0.40671515, 0.24202824, 0.37059402]' LIMIT 1;
                                            QUERY PLAN                                             
---------------------------------------------------------------------------------------------------
 Limit  (cost=4881.00..4881.00 rows=1 width=44)
   Output: id, embedding, ((embedding <-> '[0.40671515,0.24202824,0.37059402]'::vector))
   ->  Sort  (cost=4881.00..5131.00 rows=100000 width=44)
         Output: id, embedding, ((embedding <-> '[0.40671515,0.24202824,0.37059402]'::vector))
         Sort Key: ((foreign_test.embedding <-> '[0.40671515,0.24202824,0.37059402]'::vector))
         ->  Foreign Scan on public.foreign_test  (cost=100.00..4381.00 rows=100000 width=44)
               Output: id, embedding, (embedding <-> '[0.40671515,0.24202824,0.37059402]'::vector)
               Remote SQL: SELECT id, embedding FROM public.test
  • If config the extensions='vector' at CREATE SERVER step of local machine
    • It requires the extension at "local" and "remote" is installed in same schema
postgres=> EXPLAIN verbose select * from foreign_test ORDER BY embedding <-> '[0.40671515, 0.24202824, 0.37059402]' LIMIT 1;
ERROR:  operator does not exist: vectors.vector public.<-> unknown
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.
CONTEXT:  remote SQL command: EXPLAIN SELECT id, embedding FROM vectors.test ORDER BY (embedding OPERATOR(public.<->) '[0.40671515,0.24202824,0.37059402]') ASC NULLS LAST

The remote operator should be vectors.<-> instead of public.<->, but the local pgvector is installed at public. https://github.com/postgres/postgres/blob/43a9cab4844b9c933f1a24e24a38311ee24deefd/contrib/postgres_fdw/deparse.c#L3418

pgvecto.rs can be only installed at schema vectors, but pgvector can be installed any schema.

For example, in supabase, pgvector is installed at schema extensions.

Alternatives

  • multiply images

cutecutecat avatar Apr 09 '24 08:04 cutecutecat