
Bug/Workaround/Feature proposal: DB string parsing in PgvectorDocumentStore

Open RieraLea opened this issue 1 year ago • 0 comments

Issue with Connection String Parsing in PgvectorDocumentStore

Description:

Hi,

I ran into a problem connecting to an Azure Cosmos DB for PostgreSQL cluster using PgvectorDocumentStore from the haystack_integrations.document_stores.pgvector module. The failure appears to come from psycopg2 (and the underlying C library, libpq) not parsing my URI-style connection string correctly.

Steps to Reproduce:

  1. Using the following code:

    from haystack_integrations.document_stores.pgvector import PgvectorDocumentStore
    import os
    
    # PgvectorDocumentStore reads its connection string from PG_CONN_STR by default
    os.environ['PG_CONN_STR'] = "postgresql://USR:PWD@c-cosmos-datarelfrdev01.jiivzschmkljcv.postgres.cosmos.azure.com:5432/data?sslmode=require"
    
    document_store = PgvectorDocumentStore(
        table_name="docs_embeding",
        embedding_dimension=1024,
        vector_function="cosine_similarity",
        search_strategy="hnsw",
    )
    
    document_store.count_documents()
    

    I receive a "Name or service not known" error, which I verified is not caused by a network issue.

  2. I traced the issue to connection string parsing in psycopg2 by testing with the following code:

    import psycopg2
    
    def test_connection_str():
        # Connect with the URI-style connection string
        try:
            conn_url = "postgresql://USR:PWD@c-cosmos-datarelfrdev01.jiivzschmkljcv.postgres.cosmos.azure.com:5432/data?sslmode=require"
            connection = psycopg2.connect(conn_url)
            connection.close()
            print("Connection string successful!")
        except psycopg2.Error:
            print("Connection string failed!")
    
    def test_connection():
        # Connect with the same settings passed as individual keyword arguments
        try:
            connection = psycopg2.connect(
                host="c-cosmos-datarelfrdev01.jiivzschmkljcv.postgres.cosmos.azure.com",
                port="5432",
                dbname="data",
                user="USR",
                password="PWD",
                sslmode="require"
            )
            connection.close()
            print("Connection successful!")
        except psycopg2.Error:
            print("Connection failed!")
    
    if __name__ == "__main__":
        test_connection()
        test_connection_str()
    

    The results are:

    Connection successful!
    Connection string failed!
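
The report masks the real credentials (USR, PWD), so the following is only one possibility worth ruling out, not a confirmed cause: in the URI format, reserved characters such as @, / or # inside the user name or password have to be percent-encoded, otherwise the host can be read from the wrong part of the string. The sketch below uses Python's urllib.parse purely as a stand-in to illustrate the effect (libpq's own parser may differ in details). build_pg_uri, the sample password, and db.example.com are hypothetical.

```python
from urllib.parse import quote, urlencode, urlsplit


def build_pg_uri(user, password, host, port, dbname, **options):
    """Build a postgresql:// URI with percent-encoded credentials.

    Hypothetical helper for illustration; not part of psycopg2 or Haystack.
    """
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    query = f"?{urlencode(options)}" if options else ""
    return f"postgresql://{userinfo}@{host}:{port}/{quote(dbname, safe='')}{query}"


# Hypothetical password containing reserved URI characters.
raw_password = "p@ss/word#1"

# Naive string formatting leaves the reserved characters unescaped.
naive_uri = f"postgresql://USR:{raw_password}@db.example.com:5432/data?sslmode=require"

# Percent-encoding keeps the credentials from leaking into the host part.
safe_uri = build_pg_uri(
    "USR", raw_password, "db.example.com", 5432, "data", sslmode="require"
)

print("naive host:", urlsplit(naive_uri).hostname)
print("safe host: ", urlsplit(safe_uri).hostname)
print("safe URI:  ", safe_uri)
```

If the host printed for the naive URI is not the real server name, a DNS lookup on it would be expected to fail in a way that looks like a network problem.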
    

Workaround:

I was able to work around the issue by switching to the keyword/value connection string format:

    conn_url = "host=c-cosmos-datarelfrdev01.jiivzschmkljcv.postgres.cosmos.azure.com port=5432 dbname=data user=USR password=PWD sslmode=require"

This code works as expected:

    from haystack_integrations.document_stores.pgvector import PgvectorDocumentStore
    import os

    os.environ['PG_CONN_STR'] = "host=c-cosmos-datarelfrdev01.jiivzschmkljcv.postgres.cosmos.azure.com port=5432 dbname=data user=USR password=PWD sslmode=require"

    document_store = PgvectorDocumentStore(
        table_name="docs_embeding",
        embedding_dimension=1024,
        vector_function="cosine_similarity",
        search_strategy="hnsw",
    )

    document_store.count_documents()
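
The switch to the keyword/value format can also be automated. This is a minimal sketch, assuming the URI only carries a single host, a port, a database name, credentials, and query parameters such as sslmode. uri_to_keyword_dsn is a hypothetical helper built on the standard library, not part of psycopg2 or Haystack, and it does not cover every URI feature libpq accepts (for example multiple hosts).

```python
from urllib.parse import parse_qsl, unquote, urlsplit


def _quote_value(value):
    """Quote a value for a keyword/value connection string when needed.

    Empty values and values containing whitespace, single quotes, or
    backslashes are wrapped in single quotes, with ' and \\ backslash-escaped.
    """
    if value and not any(ch.isspace() or ch in "'\\" for ch in value):
        return value
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def uri_to_keyword_dsn(uri):
    """Convert a postgresql:// URI into 'key=value key=value ...' form."""
    parts = urlsplit(uri)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"not a PostgreSQL URI: {parts.scheme!r}")

    params = {}
    if parts.hostname:
        params["host"] = parts.hostname
    if parts.port:
        params["port"] = str(parts.port)
    dbname = parts.path.lstrip("/")
    if dbname:
        params["dbname"] = unquote(dbname)
    if parts.username:
        params["user"] = unquote(parts.username)
    if parts.password:
        params["password"] = unquote(parts.password)
    # Query parameters (for example sslmode) are passed through as-is.
    params.update(parse_qsl(parts.query))

    return " ".join(f"{key}={_quote_value(val)}" for key, val in params.items())


print(uri_to_keyword_dsn(
    "postgresql://USR:PWD@c-cosmos-datarelfrdev01.jiivzschmkljcv"
    ".postgres.cosmos.azure.com:5432/data?sslmode=require"
))
```

The resulting string could then be assigned to PG_CONN_STR in place of the URI.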

Suggestion:

I suggest supporting individual connection arguments (host, port, dbname, user, password, sslmode) rather than relying solely on a single connection string. This would improve compatibility and flexibility.
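
To make the proposal concrete, here is a rough sketch of what passing individual arguments might look like on the caller's side today. PgConnectionParams is a hypothetical class, not an existing Haystack or psycopg2 API; it simply renders its fields as a keyword/value string, quoting every value, and hands the result to the existing PG_CONN_STR mechanism.

```python
import os
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PgConnectionParams:
    """Hypothetical container for individual connection arguments."""

    host: str
    dbname: str
    user: str
    password: str
    port: int = 5432
    sslmode: str = "prefer"  # libpq's default; overridden below

    def to_conninfo(self):
        """Render the fields as a keyword/value connection string.

        Every value is single-quoted, with ' and \\ backslash-escaped,
        so special characters in the password need no URI encoding.
        """
        def quoted(value):
            text = str(value).replace("\\", "\\\\").replace("'", "\\'")
            return f"'{text}'"

        return " ".join(
            f"{key}={quoted(value)}" for key, value in asdict(self).items()
        )


params = PgConnectionParams(
    host="c-cosmos-datarelfrdev01.jiivzschmkljcv.postgres.cosmos.azure.com",
    dbname="data",
    user="USR",
    password="PWD",
    sslmode="require",
)

# Feed the generated string to the environment variable used in this report.
os.environ["PG_CONN_STR"] = params.to_conninfo()
print(os.environ["PG_CONN_STR"])
```

A built-in version of this would presumably live in the document store's constructor; the sketch only shows that the individual values can be combined without going through the URI format.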

Environment:

  • Haystack Version: 2.3.1
  • Integration Version: 0.5.1

RieraLea, Jul 30 '24 12:07