haystack-core-integrations
Bug/Workaround/Feature proposal: DB String Parsing in PgvectorDocumentStore
Issue with Connection String Parsing in PgvectorDocumentStore
Description:
Hi,
I ran into a problem connecting to an Azure Cosmos DB for PostgreSQL database using PgvectorDocumentStore from the haystack_integrations.document_stores.pgvector module. The issue seems to stem from psycopg2 (and the underlying C library, libpq) failing to parse my connection string correctly.
Steps to Reproduce:
- Using the following code:

import os

from haystack_integrations.document_stores.pgvector import PgvectorDocumentStore

os.environ['PG_CONN_STR'] = "postgresql://USR:PWD@c-cosmos-datarelfrdev01.jiivzschmkljcv.postgres.cosmos.azure.com:5432/data?sslmode=require"

document_store = PgvectorDocumentStore(
    table_name="docs_embeding",
    embedding_dimension=1024,
    vector_function="cosine_similarity",
    search_strategy="hnsw",
)
document_store.count_documents()

I receive a "Name or service not known" error (which I verified is not caused by a network issue).
- I identified the issue as a problem with how psycopg2 parses the connection string. I tested it with the following code:

import psycopg2


def test_connection_str():
    conn_url = "postgresql://USR:PWD@c-cosmos-datarelfrdev01.jiivzschmkljcv.postgres.cosmos.azure.com:5432/data?sslmode=require"
    try:
        # URI form: libpq has to parse everything out of a single string
        connection = psycopg2.connect(conn_url)
        connection.close()
        print("Connection string successful!")
    except psycopg2.OperationalError:
        print("Connection string failed!")


def test_connection():
    try:
        # Same settings, given as individual keyword arguments instead of a URI
        connection = psycopg2.connect(
            host="c-cosmos-datarelfrdev01.jiivzschmkljcv.postgres.cosmos.azure.com",
            port="5432",
            dbname="data",
            user="USR",
            password="PWD",
            sslmode="require",
        )
        connection.close()
        print("Connection successful!")
    except psycopg2.OperationalError:
        print("Connection failed!")


if __name__ == "__main__":
    test_connection()
    test_connection_str()

The keyword-argument connection succeeds ("Connection successful!"), while the connection string fails ("Connection string failed!").
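The issue does not show the real credentials, so this is only a hypothesis: if the user name or password contains characters that are reserved in URIs (such as `@`, `/`, `:` or `#`), libpq may split the URI in the wrong place and treat part of the password as the host name. libpq accepts percent-encoded characters in URI components, so one way to keep the URI format is to encode each component. A minimal sketch using only the standard library; `build_pg_uri` and the host name below are hypothetical:

```python
from urllib.parse import quote, urlencode


def build_pg_uri(user, password, host, port, dbname, **params):
    """Build a postgresql:// URI with user, password and dbname percent-encoded.

    Hypothetical helper: reserved characters such as '@', '/', ':' or '#'
    would otherwise change how the URI is split into its parts.
    """
    # safe="" makes quote() encode '/' too (its default keeps '/' as-is)
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    uri = f"postgresql://{userinfo}@{host}:{port}/{quote(dbname, safe='')}"
    if params:
        # Extra libpq parameters such as sslmode go into the query string
        uri += "?" + urlencode(params)
    return uri
```

For example, `build_pg_uri("USR", "p@ss/w:rd#1", "db.example.com", 5432, "data", sslmode="require")` returns `postgresql://USR:p%40ss%2Fw%3Ard%231@db.example.com:5432/data?sslmode=require`, which could then be stored in `PG_CONN_STR`.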
Workaround:
I was able to work around the issue by switching to libpq's keyword/value connection string format:
conn_url = "host=c-cosmos-datarelfrdev01.jiivzschmkljcv.postgres.cosmos.azure.com port=5432 dbname=data user=USR password=PWD sslmode=require"
This code works as expected:
import os

from haystack_integrations.document_stores.pgvector import PgvectorDocumentStore

os.environ['PG_CONN_STR'] = "host=c-cosmos-datarelfrdev01.jiivzschmkljcv.postgres.cosmos.azure.com port=5432 dbname=data user=USR password=PWD sslmode=require"
document_store = PgvectorDocumentStore(
    table_name="docs_embeding",
    embedding_dimension=1024,
    vector_function="cosine_similarity",
    search_strategy="hnsw",
)
document_store.count_documents()
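The keyword/value format used in this workaround has its own quoting rules: per the libpq documentation, an empty value or a value containing spaces must be wrapped in single quotes, and single quotes and backslashes inside a value must be escaped with a backslash. A password containing any of these would break a hand-written string. Below is a sketch of a helper that applies those rules; `build_conninfo` is a hypothetical name, not part of Haystack or psycopg2 (psycopg2 itself provides `psycopg2.extensions.make_dsn` for a similar purpose).

```python
def quote_conninfo_value(value) -> str:
    """Quote one value following libpq's keyword/value rules."""
    text = str(value)
    needs_quotes = text == "" or any(ch.isspace() or ch in "'\\" for ch in text)
    if not needs_quotes:
        return text
    # Escape backslashes first, so the backslashes added before quotes stay single
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_conninfo(**params) -> str:
    """Join keyword arguments into 'key=value key=value' form, skipping None values."""
    return " ".join(
        f"{key}={quote_conninfo_value(value)}"
        for key, value in params.items()
        if value is not None
    )
```

For example, `build_conninfo(host="db.example.com", port=5432, dbname="data", user="USR", password="PWD", sslmode="require")` returns `host=db.example.com port=5432 dbname=data user=USR password=PWD sslmode=require`, the same shape as the string above, and the result could be assigned to `os.environ['PG_CONN_STR']`.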
Suggestion:
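I suggest letting PgvectorDocumentStore accept individual connection arguments (host, port, database name, user, password, SSL mode) instead of relying solely on a connection string. This would improve compatibility and flexibility, since users would no longer have to encode all of these values into a single string.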
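One possible shape for this suggestion, sketched as a hypothetical `PgConnectionParams` container (not an existing Haystack class) whose field names match libpq keywords. Because psycopg2's `connect()` accepts these as keyword arguments and handles their quoting itself, characters such as `@` or spaces in a password would need no manual escaping.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class PgConnectionParams:
    """Hypothetical container for individual connection settings.

    Field names match libpq keywords, so connect_kwargs() can be passed
    directly as keyword arguments to psycopg2.connect().
    """

    host: str
    dbname: str
    user: str
    password: str
    port: int = 5432
    sslmode: Optional[str] = None

    def connect_kwargs(self) -> dict:
        # Drop unset optional fields so libpq falls back to its own defaults
        return {key: value for key, value in asdict(self).items() if value is not None}
```

A store designed this way could call `psycopg2.connect(**params.connect_kwargs())` internally instead of parsing a user-supplied connection string.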
Environment:
- Haystack Version: 2.3.1
- Integration Version: 0.5.1