pgvecto.rs
pgvecto.rs copied to clipboard
feat(fdw): How to be compatible with new pgvector types
https://github.com/pgvector/pgvector/issues/508
For:
- [x] svector -> sparsevec
- [ ] Rename
- [ ] text representation: https://github.com/pgvector/pgvector/blob/f3477cf28d8cc9d7d8926845a098a73950161692/test/expected/sparsevec_input.out#L1
- [ ] input/output: compatible mode?
- [x] vecf16 -> halfvec
- [ ] Rename
- [ ] bvector -> bit
- [ ] cast
- [ ] text representation
- [ ] operator: <%> = jaccard_distance(bit, bvector) / <~> = hamming_distance(bit, bvector)
- [ ] veci8 -> intvec
Possible Methods:
- ~~Hook / Rewrite~~ --> Different name mapping will fail to push down to index
- Symbol renaming
FDW problems
At step CREATE SERVER
:
CREATE SERVER foreign_server
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'host.docker.internal', port '28816', dbname 'vectors', sslmode 'disable', use_remote_estimate 'true', extensions 'vector');
GRANT USAGE ON FOREIGN SERVER foreign_server TO local_user;
- If not config the
extensions='vector'
atCREATE SERVER
step of local machine- Then a full table scan will be sent to remote, unable to use an index
postgres=> EXPLAIN verbose select * from foreign_test ORDER BY embedding <-> '[0.40671515, 0.24202824, 0.37059402]' LIMIT 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Limit (cost=4881.00..4881.00 rows=1 width=44)
Output: id, embedding, ((embedding <-> '[0.40671515,0.24202824,0.37059402]'::vector))
-> Sort (cost=4881.00..5131.00 rows=100000 width=44)
Output: id, embedding, ((embedding <-> '[0.40671515,0.24202824,0.37059402]'::vector))
Sort Key: ((foreign_test.embedding <-> '[0.40671515,0.24202824,0.37059402]'::vector))
-> Foreign Scan on public.foreign_test (cost=100.00..4381.00 rows=100000 width=44)
Output: id, embedding, (embedding <-> '[0.40671515,0.24202824,0.37059402]'::vector)
Remote SQL: SELECT id, embedding FROM public.test
- If config the
extensions='vector'
atCREATE SERVER
step of local machine- It requires the extension at "local" and "remote" is installed in same schema
postgres=> EXPLAIN verbose select * from foreign_test ORDER BY embedding <-> '[0.40671515, 0.24202824, 0.37059402]' LIMIT 1;
ERROR: operator does not exist: vectors.vector public.<-> unknown
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
CONTEXT: remote SQL command: EXPLAIN SELECT id, embedding FROM vectors.test ORDER BY (embedding OPERATOR(public.<->) '[0.40671515,0.24202824,0.37059402]') ASC NULLS LAST
The remote operator should be vectors.<->
instead of public.<->
, but the local pgvector
is installed at public
.
https://github.com/postgres/postgres/blob/43a9cab4844b9c933f1a24e24a38311ee24deefd/contrib/postgres_fdw/deparse.c#L3418
pgvecto.rs
can be only installed at schema vectors
, but pgvector
can be installed any schema.
For example, in supabase, pgvector
is installed at schema extensions
.
Alternatives
- multiply images