Query always errors because of [SQL: set local ivfflat.probes = %s::INTEGER] on my database
Bug report
- [x] I confirm this is a bug with Supabase, not with my own application.
- [x] I confirm I have searched the Docs, GitHub Discussions, and Discord.
Describe the bug
I have created a table on my GCP hosted Postgres with pgvector enabled.
I have upserted some documents with an adapter and also indexed my table with HNSW.
Then I try to run my service which searches these records using the same adapter and ASSUMING the indexing I did in the DB population persists with vecs.
However the query does not work I get this error back from my DB:
sqlalchemy.exc.DatabaseError: (pg8000.exceptions.DatabaseError) {'S': 'ERROR', 'V': 'ERROR', 'C': '42601', 'M': 'syntax error at or near "$1"', 'P': '28', 'F': 'scan.l', 'L': '1240', 'R': 'scanner_yyerror'}
[SQL: set local ivfflat.probes = %s::INTEGER]
[parameters: (10,)]
I looked in the query method and it seems to always execute this code:
with self.client.Session() as sess:
with sess.begin():
# index ignored if greater than n_lists
sess.execute(
text("set local ivfflat.probes = :probes").bindparams(probes=probes)
)
if self.client._supports_hnsw():
sess.execute(
text("set local hnsw.ef_search = :ef_search").bindparams(
ef_search=ef_search
)
)
which seems to be the problem cause I dont have that indexing setup why does it always execute this? Can this be avoided somehow or is this a bug? Shouldnt it only use the indexing method that we have previously set up, or specify which one we are using instead of running this blindly?
Update
I think the error comes actually from the syntax in the SQL commands here:
with self.client.Session() as sess:
with sess.begin():
# index ignored if greater than n_lists
sess.execute(
text("set local ivfflat.probes = :probes").bindparams(probes=probes)
)
if self.client._supports_hnsw():
sess.execute(
text("set local hnsw.ef_search = :ef_search").bindparams(
ef_search=ef_search
)
)
if len(cols) == 1:
return [str(x) for x in sess.scalars(stmt).fetchall()]
return sess.execute(stmt).fetchall() or []
inside the qyery method of the Collection class.
This syntax is not accepted by my db at least this version Postgres v8
To Reproduce
Create a collection and upserted some documents in a Postgres DB with an adapter and also index the tale with HNSW like this:
docs = vx.get_or_create_collection(
name="my_embeddings",
adapter=embedding_adapter,
)
docs.upsert(records=records)
docs.create_index(
method=IndexMethod.hnsw,
measure=IndexMeasure.cosine_distance,
index_arguments=IndexArgsHNSW(m=16, ef_construction=64),
)
Then in a separate script re-access the collection and query it:
collection = vx.get_or_create_collection(
name="my_embeddings",
adapter=embedding_adapter,
)
results = collection.query(
data=query,
limit=top_k,
ef_search=200, # HNSW parameter
skip_adapter=False, # use adapter to convert text -> vector
include_metadata=True,
include_value=True,
)
using the same adapter.
You should get this error back from my DB if the ivfflat has not be set as indexing method :
sqlalchemy.exc.DatabaseError: (pg8000.exceptions.DatabaseError) {'S': 'ERROR', 'V': 'ERROR', 'C': '42601', 'M': 'syntax error at or near "$1"', 'P': '28', 'F': 'scan.l', 'L': '1240', 'R': 'scanner_yyerror'}
[SQL: set local ivfflat.probes = %s::INTEGER]
[parameters: (10,)]
Update When I monkey patched the method like this:
with self.client.Session() as sess:
with sess.begin():
# index ignored if greater than n_lists
sess.execute(text(f"set local ivfflat.probes = {probes}"))
if self.client._supports_hnsw():
sess.execute(text(f"set local hnsw.ef_search = {ef_search}"))
if len(cols) == 1:
return [str(x) for x in sess.scalars(stmt).fetchall()]
return sess.execute(stmt).fetchall() or []
it worked it should be an easy fix
Expected behavior
Just for the query to work with my indexing and return the results. My modified code fixes the issue.
Screenshots
If applicable, add screenshots to help explain your problem.
System information
- OS: WSL
- Version of supabase-js: vecs=0.4.5
Additional context
Add any other context about the problem here.
Hi!
I've played around with it for a bit locally and interestingly enough I've discovered that your bug 100% reproduces with pg8000, but what's more interesting is that psycopg2 apparently automatically works around that Postgres limitation regarding SET parameterized queries, thus masking the issue. pg8000 seems to not have that mechanism
in setup.py i see psycopg which kiiinda confirms my assumptions, it makes sense none of the tests caught it
REQUIRES = [
"pgvector==0.3.*",
"sqlalchemy==2.*",
"psycopg2-binary==2.9.*", <--this specifically
"flupy==1.*",
"deprecated==1.2.*",
]
I'm curious if the solution to your initial issue could be to just switch the driver without any code modifications. Great catch nonetheless! P.S. this all probably belongs in the vecs repo
Cheers!
Hi @peekknuf thanks for checking it!
I ll try changing the dependency for now.
But yeah it would be great if this could be resolved. We plan to use vecs in a new project and we run onto two bugs and some much needed improvement two that I have opened tickets for. Unless this is fixed we cannot really proceed with it so thanks!
Can I help with this can I contribute my fix in a PR maybe?
Move from the supabase repo.