Problem with trying to use AWS OpenSearch

Open sanjayc2 opened this issue 8 months ago • 2 comments

Hi,

I am trying to create an OpenSearchDocumentStore. I created an AWS OpenSearch domain using my AWS account (using root access to AWS).

I set the hosts argument to OpenSearchDocumentStore as hosts=[{'host': "blah.aos.us-east-1.on.aws", 'port': 443}]. The host is my OpenSearch domain endpoint (I've used blah in place of what's in the actual domain name) in AWS.

My issue is that I don't know how to set up the http_auth argument for creating the OpenSearchDocumentStore. From the haystack code, it looks like one can pass either an OpenSearch username-password tuple or AWS authorization.

I decided to go with the AWS authorization, since I do not know which username and password is required. I created an IAM user after logging into my AWS account as root and set up access credentials per AWS instructions. I now have an IAM username, an ARN (which comprised my :user/<IAMusername>), an access key and secret access key. I also changed the security config for my AWS OpenSearch domain to use fine-grained access, set "IAM ARN as master user", and provided my IAM ARN as the value.

Then, per the haystack instructions for OpenSearchDocumentStore, I started Docker on my Windows computer and ran the following command (I added OPENSEARCH_INITIAL_ADMIN_PASSWORD because of the error message I got when I had not included it):

docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -e "ES_JAVA_OPTS=-Xms1024m -Xmx1024m" -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=<myIAMpassword>" opensearchproject/opensearch:2.17.0

Then I created the OpenSearchDocumentStore by setting http_auth in a few different ways (with AWS4Auth, AWSV4SignerAuth, AWSV4SignerAsyncAuth) using my credentials (access key and secret access key). But whenever I tried to call the file converter in my pipeline, I got an authorization exception, like the below:

Traceback (most recent call last):
  File "C:\Users\chawl\anaconda3\envs\ragenv\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\chawl\anaconda3\envs\ragenv\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\chawl\rag\ragenv\haystack_rag_docloader.py", line 281, in <module>
    p.run({"text_file_converter": {"sources": files}})
  File "C:\Users\chawl\anaconda3\envs\ragenv\lib\site-packages\haystack\core\pipeline\pipeline.py", line 247, in run
    component_outputs = self._run_component(component, inputs, component_visits, parent_span=span)
  File "C:\Users\chawl\anaconda3\envs\ragenv\lib\site-packages\haystack\core\pipeline\pipeline.py", line 79, in _run_component
    component_output = instance.run(**component_inputs)
  File "C:\Users\chawl\anaconda3\envs\ragenv\lib\site-packages\haystack\components\writers\document_writer.py", line 102, in run
    documents_written = self.document_store.write_documents(documents=documents, policy=policy)
  File "C:\Users\chawl\anaconda3\envs\ragenv\lib\site-packages\haystack_integrations\document_stores\opensearch\document_store.py", line 435, in write_documents
    self._ensure_initialized()
  File "C:\Users\chawl\anaconda3\envs\ragenv\lib\site-packages\haystack_integrations\document_stores\opensearch\document_store.py", line 264, in _ensure_initialized
    self._ensure_index_exists()
  File "C:\Users\chawl\anaconda3\envs\ragenv\lib\site-packages\haystack_integrations\document_stores\opensearch\document_store.py", line 278, in _ensure_index_exists
    self._client.indices.create(index=self._index, body=body) # type:ignore
  File "C:\Users\chawl\anaconda3\envs\ragenv\lib\site-packages\opensearchpy\client\utils.py", line 176, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "C:\Users\chawl\anaconda3\envs\ragenv\lib\site-packages\opensearchpy\client\indices.py", line 244, in create
    return self.transport.perform_request(
  File "C:\Users\chawl\anaconda3\envs\ragenv\lib\site-packages\opensearchpy\transport.py", line 457, in perform_request
    raise e
  File "C:\Users\chawl\anaconda3\envs\ragenv\lib\site-packages\opensearchpy\transport.py", line 418, in perform_request
    status, headers_response, data = connection.perform_request(
  File "C:\Users\chawl\anaconda3\envs\ragenv\lib\site-packages\opensearchpy\connection\http_urllib3.py", line 308, in perform_request
    self._raise_error(
  File "C:\Users\chawl\anaconda3\envs\ragenv\lib\site-packages\opensearchpy\connection\base.py", line 315, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
opensearchpy.exceptions.AuthorizationException: AuthorizationException(403, '{"message":"The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.\n\nThe Canonical String for this request should have been\n'PUT\n/default\n\nhost:blah.aos.us-east-1.on.aws\nx-amz-date:20250403T124428Z\n\nhost;x-amz-date\n30465717048e6c230725a50d0e269e0e472cdbf7a89dee8e15993ec85b5c9bd7'\n\nThe String-to-Sign should have been\n'AWS4-HMAC-SHA256\n20250403T124428Z\n20250403/us-east-1/es/aws4_request\nf074151089b8e0dd196c4ee09972e1bee05e826d850c3e78cc69580a9a3ad983'\n"}')

I am not sure how to fix the issue. I have no idea where it gets the expected String-to-Sign, or why the AWS Secret Access Key I set (which I got when I set up myself as the IAM user with access credentials) is not correct.
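
For readers wondering where the "Canonical String" and "String-to-Sign" in a 403 like this come from: they follow the layout of AWS Signature Version 4, which the service recomputes from the request it received. Below is a minimal stdlib-only sketch of that derivation, not the code opensearch-py or AWS actually runs. The function names, the placeholder secret key, and the placeholder request body are all assumptions for illustration, and the sketch skips URI encoding and query-string sorting. Because the real body is unknown, the hashes it prints will not match the ones in the error message.

```python
import hashlib
import hmac


def canonical_request(method, path, query, headers, payload):
    """Build a SigV4 canonical request (simplified: no URI or query encoding)."""
    # Header names are lowercased and sorted; each canonical header line ends in "\n".
    items = sorted((name.lower(), value.strip()) for name, value in headers.items())
    canonical_headers = "".join(f"{name}:{value}\n" for name, value in items)
    signed_headers = ";".join(name for name, _ in items)
    payload_hash = hashlib.sha256(payload).hexdigest()
    return "\n".join([method, path, query, canonical_headers, signed_headers, payload_hash])


def string_to_sign(amz_date, region, service, canonical):
    """Combine algorithm, timestamp, credential scope and the canonical request hash."""
    scope = f"{amz_date[:8]}/{region}/{service}/aws4_request"
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "\n".join(["AWS4-HMAC-SHA256", amz_date, scope, digest])


def _hmac_sha256(key, message):
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def signature(secret_key, amz_date, region, service, to_sign):
    """Derive the signing key from the secret key, then sign the string-to-sign."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), amz_date[:8])
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    k_signing = _hmac_sha256(k_service, "aws4_request")
    return hmac.new(k_signing, to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


# Values taken from the error message above; body and secret key are placeholders.
AMZ_DATE = "20250403T124428Z"
creq = canonical_request(
    "PUT",
    "/default",
    "",
    {"Host": "blah.aos.us-east-1.on.aws", "X-Amz-Date": AMZ_DATE},
    b'{"placeholder": "index body"}',
)
sts = string_to_sign(AMZ_DATE, "us-east-1", "es", creq)
sig = signature("EXAMPLE-SECRET-KEY", AMZ_DATE, "us-east-1", "es", sts)
print(creq)
print(sts)
print(sig)
```

The client and the service each compute this signature independently. A mismatch therefore means one of the inputs differed on the two sides: the secret key, the region, the service name in the scope, the signed headers, or the body.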

I also tried setting the http_auth to an AWSAuth() instance, after setting os.environ['AWS_ACCESS_KEY_ID'], os.environ['AWS_SECRET_KEY_ID'], and os.environ['AWS_DEFAULT_REGION']. Again, I did not have an error creating the OpenSearchDocumentStore, but when I called write_documents, I got the error below:

Traceback (most recent call last):
  File "C:\Users\chawl\anaconda3\envs\ragenv\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\chawl\anaconda3\envs\ragenv\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\chawl\rag\ragenv\haystack_rag_docloader.py", line 200, in <module>
    document_store.write_documents([
  File "C:\Users\chawl\anaconda3\envs\ragenv\lib\site-packages\haystack_integrations\document_stores\opensearch\document_store.py", line 435, in write_documents
    self._ensure_initialized()
  File "C:\Users\chawl\anaconda3\envs\ragenv\lib\site-packages\haystack_integrations\document_stores\opensearch\document_store.py", line 264, in _ensure_initialized
    self._ensure_index_exists()
  File "C:\Users\chawl\anaconda3\envs\ragenv\lib\site-packages\haystack_integrations\document_stores\opensearch\document_store.py", line 269, in _ensure_index_exists
    if self._client.indices.exists(index=self._index):
  File "C:\Users\chawl\anaconda3\envs\ragenv\lib\site-packages\opensearchpy\client\utils.py", line 176, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "C:\Users\chawl\anaconda3\envs\ragenv\lib\site-packages\opensearchpy\client\indices.py", line 671, in exists
    return self.transport.perform_request(
  File "C:\Users\chawl\anaconda3\envs\ragenv\lib\site-packages\opensearchpy\transport.py", line 457, in perform_request
    raise e
  File "C:\Users\chawl\anaconda3\envs\ragenv\lib\site-packages\opensearchpy\transport.py", line 418, in perform_request
    status, headers_response, data = connection.perform_request(
  File "C:\Users\chawl\anaconda3\envs\ragenv\lib\site-packages\opensearchpy\connection\http_urllib3.py", line 308, in perform_request
    self._raise_error(
  File "C:\Users\chawl\anaconda3\envs\ragenv\lib\site-packages\opensearchpy\connection\base.py", line 315, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
opensearchpy.exceptions.AuthorizationException: AuthorizationException(403, '')
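
One thing that may be worth checking with environment-based credentials is the variable names themselves. boto3's default credential chain reads AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION. Whether Haystack's AWSAuth reads exactly the same set is an assumption here and should be confirmed against its documentation. The helper below is a hypothetical stdlib-only sketch that reports which of those names are unset. The example dictionary mirrors the names mentioned in this post, with placeholder values.

```python
import os

# Variable names read by boto3's default credential chain.
EXPECTED_AWS_ENV_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_env_vars(environ=None):
    """Return the expected AWS variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in EXPECTED_AWS_ENV_VARS if not env.get(name)]


# Placeholder values; the keys mirror the names set in the post above.
example_env = {
    "AWS_ACCESS_KEY_ID": "AKIAEXAMPLE",
    "AWS_SECRET_KEY_ID": "example-secret",
    "AWS_DEFAULT_REGION": "us-east-1",
}
print(missing_aws_env_vars(example_env))  # → ['AWS_SECRET_ACCESS_KEY']
```

Calling missing_aws_env_vars() with no argument checks the real process environment instead of the example dictionary.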

At this point, I've spent many days trying to get this to work and have read a lot about IAM, access credentials, and authorization on the AWS and OpenSearch websites, but to no avail. I could not find any documentation on Haystack's GitHub or website that helped resolve the issue.

I would really appreciate your help so I can start to use OpenSearchDocumentStore on my Windows machine, for my RAG project.

Thanks in advance, Sanjay

sanjayc2 avatar Apr 03 '25 22:04 sanjayc2

Hello @sanjayc2, if you run the docker command, you are running OpenSearch locally on your own machine. In that case you don't need an AWS account.

If you are just starting with this project, I would recommend using Docker locally. https://opensearch.org/blog/replacing-default-admin-credentials/

I can confirm that the steps listed here are still up to date: https://docs.haystack.deepset.ai/docs/opensearch-document-store#initialization

docker pull opensearchproject/opensearch:2.11.0
docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -e "ES_JAVA_OPTS=-Xms1024m -Xmx1024m" opensearchproject/opensearch:2.11.0

and in a new python environment do the following after running pip install opensearch-haystack:

from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore
from haystack import Document

# ("admin", "admin") is the default login and only works with opensearch:2.11 or older
document_store = OpenSearchDocumentStore(
    hosts="http://localhost:9200",
    use_ssl=True,
    verify_certs=False,
    http_auth=("admin", "admin"),
)
document_store.write_documents([
    Document(content="This is first"),
    Document(content="This is second"),
])
print(document_store.count_documents())

Note that starting with OpenSearch 2.12 there is no default password anymore (https://opensearch.org/blog/replacing-default-admin-credentials/), so the authentication in the example above only works that way with opensearch:2.11 or older.
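
For 2.12 or newer images, a possible approach is to start the container with -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=<your password>" and pass that same value as the admin password in http_auth. The sketch below assumes the username stays "admin" and that the password meets OpenSearch's strength requirements. The helper names local_opensearch_kwargs and make_document_store are hypothetical, and the https URL is a choice made here (the example above uses an http URL together with use_ssl=True).

```python
def local_opensearch_kwargs(password, host="localhost", port=9200):
    """Connection settings for a local single-node container with security enabled."""
    if not password:
        raise ValueError("pass the value used for OPENSEARCH_INITIAL_ADMIN_PASSWORD")
    return {
        "hosts": f"https://{host}:{port}",
        "use_ssl": True,
        "verify_certs": False,  # assumes the local image uses a self-signed certificate
        "http_auth": ("admin", password),
    }


def make_document_store(password):
    """Create the store; needs `pip install opensearch-haystack` and a running container."""
    # Imported lazily so the helper above works without the integration installed.
    from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore

    return OpenSearchDocumentStore(**local_opensearch_kwargs(password))
```

A typical call would be make_document_store(os.environ["OPENSEARCH_INITIAL_ADMIN_PASSWORD"]) after importing os and exporting the same variable in the shell that runs Python, which keeps the password out of the source file.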

julian-risch avatar Apr 04 '25 13:04 julian-risch

Thank you very much.

With regard to running a later version of OpenSearch (e.g., 2.17) locally, will the instructions in https://opensearch.org/blog/replacing-default-admin-credentials/ allow one to do that? If not, how would one need to set up the authentication? Would I have to create an OpenSearch account with a username and password? I ask because the newer version of OpenSearch is much faster and uses less memory, which is at a premium on my Windows machine.

Thank you again for your help.

sanjayc2 avatar Apr 05 '25 14:04 sanjayc2