pgvecto.rs
Question: How do we run this in Supabase and Drizzle?
I wanted to explore running this in Supabase with Drizzle ORM. Do you have any guidance on how to do that?
pgvector has https://github.com/pgvector/pgvector-node for use with Drizzle ORM.
Relevant links:
- https://github.com/drizzle-team/drizzle-orm
Unfortunately, there's currently no way to install custom extensions on Supabase's Postgres (https://github.com/supabase/supabase/issues/14235). I would suggest trying pgvecto.rs through Docker first to check that it meets your requirements, for example:
docker run --name pgvecto-rs-demo -e POSTGRES_PASSWORD=mysecretpassword -p 5432:5432 -d tensorchord/pgvecto-rs:latest
I believe we can easily support this in drizzle-orm. The only syntax that differs from pgvector is the index creation command; all other query commands are exactly the same as in pgvector. Let me give it a try, and I will submit a pull request to Drizzle soon.
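To make the "only the index command differs" point concrete, here is a small sketch that prints the HNSW index DDL for each extension. The syntax is assumed from the pgvector 0.5+ HNSW docs and the pgvecto.rs 0.2 docs (a single `vectors` access method with a TOML `options` string); older pgvecto.rs releases used a different options format, so check it against the version you install. The helper name `hnswIndexSql` and the `items`/`embedding` names are hypothetical.

```typescript
// Hypothetical helper, for illustration only: shows how the index DDL differs
// between pgvector and pgvecto.rs. Syntax assumed from pgvector 0.5+ and
// pgvecto.rs 0.2 docs; verify against your installed versions.
type Extension = "pgvector" | "pgvecto.rs";

function hnswIndexSql(ext: Extension, table: string, column: string): string {
  if (ext === "pgvector") {
    // pgvector: the access method is named after the algorithm (hnsw, ivfflat).
    return `CREATE INDEX ON ${table} USING hnsw (${column} vector_l2_ops);`;
  }
  // pgvecto.rs (assumed 0.2 syntax): a single "vectors" access method, with
  // the algorithm selected in a TOML options string.
  return `CREATE INDEX ON ${table} USING vectors (${column} vector_l2_ops) WITH (options = $$[indexing.hnsw]$$);`;
}

console.log(hnswIndexSql("pgvector", "items", "embedding"));
console.log(hnswIndexSql("pgvecto.rs", "items", "embedding"));
```

Queries such as `ORDER BY embedding <-> '[1,2,3]'` stay the same for both extensions, so only the index migration would need special handling in an ORM.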
I would also like to hear about your scenarios (what kinds of filter conditions you'd like to use). pgvecto.rs has made significant efforts to support various filter modes (prefilter, postfilter, and brute force; we are also working on bitmap pushdown so that Postgres indexes on other columns can be used) and to optimize performance. If it helps, I can provide further guidance on performance optimization. Thank you!
I need cross filtering for my use cases. Most of them filter by regular SQL columns together with vector similarity, where the similarity condition is just one more condition in the WHERE clause.
Pre- or post-filtering has a higher probability of returning no results, sparse results, or irrelevant results.
For example: searching for documents that have a given tag and are assigned to a particular team, ranked by similarity on the vector column.
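The example query above could look roughly like the sketch below, which builds a parameterized statement in the `{ text, values }` shape that node-postgres's `client.query` accepts. The table `documents` and its columns `tag`, `team_id`, and `embedding` are hypothetical, as is the helper `buildFilteredSearch`; the `'[x,y,z]'` vector literal and the `<->` (L2 distance) operator are accepted by both pgvector and pgvecto.rs.

```typescript
// Hypothetical schema: documents(id, tag, team_id, embedding vector(n)).
interface FilteredSearch {
  tag: string;
  teamId: number;
  embedding: number[];
  limit: number;
}

// Builds a parameterized query: plain SQL filters in WHERE, vector
// similarity in ORDER BY, so the index can apply the filter while searching.
function buildFilteredSearch(q: FilteredSearch): { text: string; values: unknown[] } {
  // Text form of a vector, e.g. "[0.1,0.2,0.3]".
  const vectorLiteral = `[${q.embedding.join(",")}]`;
  return {
    text:
      "SELECT id FROM documents " +
      "WHERE tag = $1 AND team_id = $2 " +
      "ORDER BY embedding <-> $3::vector " +
      "LIMIT $4",
    values: [q.tag, q.teamId, vectorLiteral, q.limit],
  };
}

const query = buildFilteredSearch({ tag: "invoice", teamId: 7, embedding: [0.1, 0.2, 0.3], limit: 10 });
console.log(query.text, query.values);
```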
We are also building an example (https://github.com/kemingy/ragen/blob/main/ragen/client.py#L77-L82) similar to your scenario, using vector search with a tag filter, and it has worked well in our testing.
In pgvecto.rs, the default prefilter mode ensures that the vector index returns the number of results set by vectors.k, all of which satisfy the specified filter condition.
May I ask what your typical filter selectivity is (what percentage of the data satisfies your filter condition)? Also, what is the "cross filtering" method you mentioned?
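The difference between post-filtering and pre-filtering can be seen on a tiny in-memory dataset. This is only a sketch of the result semantics: exact brute-force search stands in for the vector index, and it does not reflect how pgvecto.rs applies the filter inside its index. All names are hypothetical.

```typescript
interface Doc {
  id: number;
  tag: string;
  embedding: number[];
}

// Euclidean (L2) distance between two vectors of equal length.
function l2(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Exact k-nearest neighbours by brute force (stand-in for an index).
function nearest(docs: Doc[], query: number[], k: number): Doc[] {
  return [...docs]
    .sort((x, y) => l2(x.embedding, query) - l2(y.embedding, query))
    .slice(0, k);
}

// Post-filter: take the k nearest rows first, then drop non-matching ones.
// Can return fewer than k rows, or none at all.
function postFilter(docs: Doc[], query: number[], k: number, pred: (d: Doc) => boolean): Doc[] {
  return nearest(docs, query, k).filter(pred);
}

// Pre-filter: keep only matching rows, then take the k nearest.
// Returns k rows whenever at least k rows match.
function preFilter(docs: Doc[], query: number[], k: number, pred: (d: Doc) => boolean): Doc[] {
  return nearest(docs.filter(pred), query, k);
}

const docs: Doc[] = [
  { id: 1, tag: "a", embedding: [0, 0] },
  { id: 2, tag: "a", embedding: [0.1, 0] },
  { id: 3, tag: "b", embedding: [5, 5] },
  { id: 4, tag: "b", embedding: [6, 6] },
];
const isB = (d: Doc) => d.tag === "b";

console.log(postFilter(docs, [0, 0], 2, isB).map((d) => d.id)); // []
console.log(preFilter(docs, [0, 0], 2, isB).map((d) => d.id)); // [ 3, 4 ]
```

With a selective filter, the two nearest rows overall both fail the predicate, so post-filtering returns nothing, while pre-filtering still returns two matching rows.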
For security reasons, Supabase only supports custom extensions written as Trusted Language Extensions (TLE): https://github.com/supabase/supabase/pull/14600#issuecomment-1563399607
Many extensions provide functions whose implementation is written in C, and creating them in a database means that the compiled C code is “dynamically linked” into your running Postgres process. These dynamically-loaded libraries can now access every aspect of your running database process, right down to raw memory. They are essentially database superusers on steroids. Because of this, C is an “untrusted language” and installing extensions written in C requires filesystem access.
There is a Rust TLE implementation, PL/Rust: https://github.com/tcdi/plrust. However, pgvecto.rs is hard to convert to a TLE because it uses IPC/mmap, which is strictly forbidden in TLEs.