pgvecto.rs

Questions: How do we run this in supabase and drizzle?

Open ShravanSunder opened this issue 2 years ago • 5 comments

I wanted to explore running this in supabase with drizzle orm. Do you have any guidance on how to do that?

pgvector has https://github.com/pgvector/pgvector-node, which connects it with Drizzle ORM.

relevant links:

  • https://github.com/drizzle-team/drizzle-orm

ShravanSunder avatar Oct 30 '23 23:10 ShravanSunder

Unfortunately, there is currently no way to install a custom extension on Supabase's hosted Postgres: https://github.com/supabase/supabase/issues/14235. I would suggest trying pgvecto.rs through Docker first to check it against your requirements, for example:

docker run --name pgvecto-rs-demo -e POSTGRES_PASSWORD=mysecretpassword -p 5432:5432 -d tensorchord/pgvecto-rs:latest

I believe we can easily support this in drizzle-orm. The only syntax that differs from pgvector is the index creation command; all other query commands are exactly the same as in pgvector. Let me give it a try, and I will submit a pull request to drizzle soon.
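To make the "only the index creation differs" point concrete, here is a minimal sketch. It assumes pgvector's `hnsw` access method and pgvecto.rs's `vectors` access method, both with the `vector_l2_ops` operator class; operator class names and index options have differed across pgvecto.rs releases, so check the documentation for your version. The helper name `createIndexSql` and the `items`/`embedding` identifiers are hypothetical, not part of either project or of drizzle-orm.

```typescript
// Hypothetical helper, not part of drizzle-orm, pgvector, or pgvecto.rs.
type VectorBackend = "pgvector" | "pgvecto.rs";

function createIndexSql(backend: VectorBackend, table: string, column: string): string {
  // Assumption: pgvector uses the `hnsw` access method, pgvecto.rs uses `vectors`.
  const method = backend === "pgvector" ? "hnsw" : "vectors";
  // Assumption: both accept the `vector_l2_ops` operator class (verify per version).
  // Identifiers are interpolated as-is, so only pass trusted table/column names.
  return `CREATE INDEX ON ${table} USING ${method} (${column} vector_l2_ops);`;
}

// The similarity query itself is written the same way for both extensions.
const nearestNeighbourSql = "SELECT * FROM items ORDER BY embedding <-> '[3,2,1]' LIMIT 5;";

console.log(createIndexSql("pgvector", "items", "embedding"));
console.log(createIndexSql("pgvecto.rs", "items", "embedding"));
console.log(nearestNeighbourSql);
```

The two generated statements differ only in the access method after `USING`, which is why an ORM integration written for pgvector should mostly carry over.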

VoVAllen avatar Oct 31 '23 03:10 VoVAllen

I would also like to hear about your scenarios (what kind of filter conditions you'd like to use). pgvecto.rs has put significant effort into supporting various filter modes (prefilter, postfilter, and brute force; we are also working on bitmap pushdown to use Postgres indexes on other columns) and into optimizing performance. If it helps, I can provide further guidance on performance optimization. Thank you!

VoVAllen avatar Oct 31 '23 03:10 VoVAllen

I would need cross filtering for my use cases. Mostly they filter on normal SQL columns, with vector similarity as one more condition in the same WHERE clause.

Pre/post filtering has a higher probability of returning no results, sparse results, or irrelevant results.

For example, searching for documents that have a given tag, are assigned to a particular team, and are similar to a query (vector column).
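A query of that shape might look like the sketch below. The `documents` table and its `tag`, `team_id`, `title`, and `embedding` columns are hypothetical, and the `<->` distance operator and `vector` type are assumed to be available from either pgvector or pgvecto.rs. The function only builds a parameterized query object in the `{ text, values }` shape that node-postgres accepts; it does not open a connection.

```typescript
interface FilteredSearch {
  tag: string;
  teamId: number;
  embedding: number[];
  limit: number;
}

interface ParamQuery {
  text: string;
  values: (string | number)[];
}

// Hypothetical helper: combines ordinary column filters with vector ordering.
function buildFilteredSearch(search: FilteredSearch): ParamQuery {
  // Vectors are sent as a text literal such as "[0.1,0.2]" and cast server-side.
  const vectorLiteral = `[${search.embedding.join(",")}]`;
  return {
    text:
      "SELECT id, title FROM documents " +
      "WHERE tag = $1 AND team_id = $2 " +
      "ORDER BY embedding <-> $3::vector " +
      "LIMIT $4",
    values: [search.tag, search.teamId, vectorLiteral, search.limit],
  };
}

const example = buildFilteredSearch({
  tag: "design",
  teamId: 7,
  embedding: [0.1, 0.2, 0.3],
  limit: 5,
});
console.log(example.text, example.values);
```

Keeping the filters as bound parameters leaves it to the database to decide how to combine the column predicates with the vector index.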

ShravanSunder avatar Oct 31 '23 12:10 ShravanSunder

We are also building an example (https://github.com/kemingy/ragen/blob/main/ragen/client.py#L77-L82) similar to your scenario, using vector search with a tag filter, and it has worked well so far.

For pgvecto.rs, the default prefilter ensures that the vector index returns a number of results equal to vectors.k, all of which meet the specified filter condition.
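The difference between filtering after the top-k lookup and filtering while scanning can be illustrated with a small in-memory sketch. This is a brute-force toy with hypothetical names and data, not how pgvecto.rs implements its index or its filter modes; it only shows why applying the filter during the scan can still return k matching rows when a post-filter returns fewer.

```typescript
type Doc = { id: number; tag: string; embedding: number[] };

// Squared Euclidean distance between two vectors of equal length.
function squaredL2(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return sum;
}

function byDistance(docs: Doc[], query: number[]): Doc[] {
  return [...docs].sort((x, y) => squaredL2(x.embedding, query) - squaredL2(y.embedding, query));
}

// Take the k nearest rows first, then drop those that fail the filter.
function filterAfterTopK(docs: Doc[], query: number[], tag: string, k: number): Doc[] {
  return byDistance(docs, query).slice(0, k).filter((d) => d.tag === tag);
}

// Walk rows in distance order and keep going until k rows pass the filter.
function filterDuringScan(docs: Doc[], query: number[], tag: string, k: number): Doc[] {
  const out: Doc[] = [];
  for (const d of byDistance(docs, query)) {
    if (out.length >= k) break;
    if (d.tag === tag) out.push(d);
  }
  return out;
}

const sampleDocs: Doc[] = [
  { id: 1, tag: "a", embedding: [0, 0] },
  { id: 2, tag: "b", embedding: [0.1, 0] },
  { id: 3, tag: "b", embedding: [0.2, 0] },
  { id: 4, tag: "a", embedding: [5, 5] },
];

console.log(filterAfterTopK(sampleDocs, [0, 0], "a", 2).map((d) => d.id));
console.log(filterDuringScan(sampleDocs, [0, 0], "a", 2).map((d) => d.id));
```

With this data, filtering after the top-2 lookup keeps only document 1, while filtering during the scan continues past the non-matching neighbours and also returns document 4.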

May I ask what your typical filter selectivity is (what percentage of the data satisfies your filter condition)? Also, what is the "cross filtering" method you mentioned?

VoVAllen avatar Oct 31 '23 16:10 VoVAllen

For security reasons, Supabase only supports Trusted Language Extensions (TLE) for custom extensions: https://github.com/supabase/supabase/pull/14600#issuecomment-1563399607

Many extensions provide functions whose implementation is written in C, and creating them in a database means that the compiled C code is “dynamically linked” into your running Postgres process. These dynamically-loaded libraries can now access every aspect of your running database process, right down to raw memory. They are essentially database superusers on steroids. Because of this, C is an “untrusted language” and installing extensions written in C requires filesystem access.

There is a Rust TLE implementation: https://github.com/tcdi/plrust. However, pgvecto.rs is hard to convert to a TLE because it uses IPC/mmap, which is absolutely forbidden there.

cutecutecat avatar Nov 01 '23 07:11 cutecutecat