haystack-core-integrations
MongoDBAtlasDocumentStore doesn't recognize/use my connection string when creating MongoClient
Describe the bug
When I try to create a MongoDBAtlasDocumentStore, specifying my personal Mongo connection string via an environment variable (`os.environ["MONGO_CONNECTION_STRING"] = "mongodb+srv://scooter4j:[email protected]/?retryWrites=true&w=majority&appName=cluster0"`), I'm unable to establish a connection to my MongoDB instance. Instead, I get the following error:
pymongo.errors.ServerSelectionTimeoutError: ac-qoeetyn-shard-00-01.3oecvqa.mongodb.net:27017: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006) (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms),ac-qoeetyn-shard-00-02.3oecvqa.mongodb.net:27017: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006) (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms),ac-qoeetyn-shard-00-00.3oecvqa.mongodb.net:27017: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006) (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 30s, Topology Description: <TopologyDescription id: 66a6d4661a19d4beecec5a30, topology_type: ReplicaSetNoPrimary, servers: [<ServerDescription ('ac-qoeetyn-shard-00-00.3oecvqa.mongodb.net', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('ac-qoeetyn-shard-00-00.3oecvqa.mongodb.net:27017: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006) (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>, <ServerDescription ('ac-qoeetyn-shard-00-01.3oecvqa.mongodb.net', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('ac-qoeetyn-shard-00-01.3oecvqa.mongodb.net:27017: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006) (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>, <ServerDescription ('ac-qoeetyn-shard-00-02.3oecvqa.mongodb.net', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('ac-qoeetyn-shard-00-02.3oecvqa.mongodb.net:27017: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006) (configured timeouts: socketTimeoutMS: 20000.0ms, 
connectTimeoutMS: 20000.0ms)')>]> python-BaseException
The code sees my personal connection string and sets resolved_connection_string to my connection string, but this string isn't used when creating the MongoClient connection. See image below:
To Reproduce
My code:

```python
import os

from InstructorEmbedding import INSTRUCTOR
from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore
from haystack_integrations.components.retrievers.mongodb_atlas import MongoDBAtlasEmbeddingRetriever

os.environ["MONGO_CONNECTION_STRING"] = "mongodb+srv://scooter4j:[email protected]/?retryWrites=true&w=majority&appName=cluster0"

model = INSTRUCTOR('hkunlp/instructor-base')

instruction = "Represent the physical fitness paragraph for retrieval:"
query = "What days did my right leg have odd sensations?"

query_embedding = model.encode([[instruction, query]])

# Initialize the document store
document_store = MongoDBAtlasDocumentStore(
    database_name="sq_rag_sandbox",
    collection_name="training_notes",
    vector_search_index="vector_index",
)

print(f"Document store contains {document_store.count_documents()} documents")

retriever = MongoDBAtlasEmbeddingRetriever(document_store=document_store)

# example run query
blah = retriever.run(query_embedding=query_embedding[0].tolist())

print("placeholder....")
```
Note that I use Instructor for my embeddings, but that's immaterial, really, as the problem comes when trying to connect to the MongoDB Atlas cluster independently of getting the embeddings.
Describe your environment (please complete the following information):
- OS: macOS 12.6.7
- Haystack version (output of `pip freeze | grep haystack`):
  - chroma-haystack==0.18.0
  - haystack-ai @ git+https://github.com/deepset-ai/haystack.git@1c53aae8f09acb9866de2af503331865710857eb
  - haystack-bm25==1.0.2
  - haystack-experimental==0.1.0
  - mongodb-atlas-haystack==0.4.1
- Integration version:
I've investigated the issue on my end and was able to set up the connection without any errors. The problem you're encountering might be due to a missing SSL certificate required for the connection. Installing the certifi package could resolve this. You can refer to this post for more details: PyMongo SSL Certificate Verify Failed.
Additionally, the official documentation offers resources for troubleshooting TLS errors: PyMongo TLS Troubleshooting.
I recommend trying these solutions and letting us know if the issue persists.
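To check whether a missing CA bundle is a plausible cause of the `unable to get local issuer certificate` error, it can help to look at where the local Python build expects to find trusted certificates. The following is a minimal diagnostic sketch using only the standard library; the function name `describe_default_ca_paths` is hypothetical and not part of Haystack or PyMongo.

```python
import os
import ssl


def describe_default_ca_paths() -> dict:
    """Report where this Python/OpenSSL build looks for trusted CA certificates.

    Hypothetical diagnostic helper, not part of Haystack or PyMongo.
    """
    paths = ssl.get_default_verify_paths()
    return {
        # Resolved CA bundle file, or None if the default file does not exist
        "cafile": paths.cafile,
        # Resolved CA directory, or None if the default directory does not exist
        "capath": paths.capath,
        # Name of the environment variable OpenSSL consults for a CA file
        "openssl_cafile_env": paths.openssl_cafile_env,
        "cafile_exists": bool(paths.cafile and os.path.exists(paths.cafile)),
    }


if __name__ == "__main__":
    for key, value in describe_default_ca_paths().items():
        print(f"{key}: {value}")
```

If both `cafile` and `capath` come back as `None`, the interpreter has no default trust store to verify the Atlas certificates against. In that case, pointing the environment variable named by `openssl_cafile_env` at the certifi bundle (the path returned by `certifi.where()`) before the document store is created may help. This assumes the client falls back to Python's default certificates when no CA file is given explicitly, which has not been confirmed in this thread.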
I've been down that path. In my own code I use certifi when setting up the MongoClient, as shown below, and I'm able to connect to MongoDB Atlas without a problem using this code. However, there is no way to specify the tlsCAFile parameter to pass to the MongoClient when creating a MongoDBAtlasDocumentStore object.
```python
import certifi
from pymongo import MongoClient


def connect_to_db(mongodb_uri, database, use_tls=True):
    if use_tls:
        # Verify the server certificate against certifi's CA bundle
        mongodb_client = MongoClient(mongodb_uri, tlsCAFile=certifi.where())
    else:
        mongodb_client = MongoClient(mongodb_uri)
    database = mongodb_client[database]
    print("Connected to the MongoDB database!")
    return mongodb_client, database
```
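One possible workaround, offered as a sketch rather than a confirmed fix for this issue: PyMongo accepts `tlsCAFile` as a connection string option as well as a keyword argument, so the CA bundle path could be carried inside the URI that the document store reads from `MONGO_CONNECTION_STRING`. The helper name `with_tls_ca_file` and the example host below are hypothetical, and the sketch assumes the integration passes the connection string to `MongoClient` unchanged.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_tls_ca_file(uri: str, ca_file: str) -> str:
    """Return `uri` with a tlsCAFile query option pointing at `ca_file`.

    Hypothetical helper, not part of Haystack or PyMongo.
    """
    parts = urlsplit(uri)
    # Keep the existing options, dropping any earlier tlsCAFile entry
    options = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() != "tlscafile"
    ]
    options.append(("tlsCAFile", ca_file))
    # safe="/" leaves path separators readable instead of percent-encoding them
    return urlunsplit(parts._replace(query=urlencode(options, safe="/")))


if __name__ == "__main__":
    # Placeholder credentials and host, for illustration only
    example = "mongodb+srv://user:pw@cluster0.example.mongodb.net/?retryWrites=true&w=majority"
    print(with_tls_ca_file(example, "/tmp/cacert.pem"))
```

In practice the second argument would be `certifi.where()`, and the returned string would be assigned to `os.environ["MONGO_CONNECTION_STRING"]` before constructing the MongoDBAtlasDocumentStore.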
@scooter4j, could you provide any updates on whether this issue has been resolved?
Closing this issue as stale.