chatgpt-retrieval-plugin
[#78] Added OpenSearch as Vector Datastore
Description
This adds vector datastore support for OpenSearch, an open-source search and analytics engine. OpenSearch provides vector database capabilities through its k-NN plugin: https://opensearch.org/docs/latest/search-plugins/knn/index/
Things tackled in the PR:
- Added support for connecting to various kinds of OpenSearch clusters, such as self-hosted clusters, unsecured clusters, and Amazon OpenSearch.
- Provided a default index creation path to make connecting to OpenSearch easy.
- Added support for the nmslib and faiss engines, which can provide low-latency k-NN search.
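For readers unfamiliar with k-NN indices, here is a minimal sketch of what a default index body with a selectable engine could look like. It is not taken from the PR: the function name, the field names `embedding` and `text`, the 1536 dimension, and the `hnsw`/`l2` method choices are all assumptions for illustration.

```python
from typing import Any, Dict

# Engines named in the PR description.
SUPPORTED_ENGINES = ("nmslib", "faiss")


def build_knn_index_body(
    dimension: int = 1536,            # assumed embedding size, not from the PR
    engine: str = "nmslib",
    vector_field: str = "embedding",  # hypothetical field name
) -> Dict[str, Any]:
    """Return an index-creation body that enables k-NN on one vector field."""
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"engine must be one of {SUPPORTED_ENGINES}, got {engine!r}")
    return {
        # Turn on the k-NN plugin's approximate search for this index.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                vector_field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",      # assumed algorithm
                        "space_type": "l2",  # assumed distance metric
                        "engine": engine,
                    },
                },
                # Plain text field so keyword (BM25) queries remain possible.
                "text": {"type": "text"},
            }
        },
    }
```

A body like this could be passed to the client's index-creation call; the actual defaults used by the PR may differ.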
Issue
Resolves #78.
@jordanparker6 can you please review this PR?
Testing
- [X] Tested against a local Docker container running OpenSearch.
- [X] Tested all APIs.
@isafulf can you please review this PR?
@navneet1v Nice! I was looking for this and came across your PR.
I would like to propose making k-NN search an optional feature that can be controlled via an environment variable. In some cases, depending on the data, native OpenSearch keyword search (BM25) works better than vector search.
I think this is a fair ask. I would add an environment variable that accepts the values k_nn, keyword_search and hybrid. It will control which type of search we run:
- k_nn means k-NN search.
- keyword_search means BM25 search.
- hybrid means both: we will use a bool query and put both the k-NN and the BM25 clauses in one query.
Please let me know your thoughts. @Venkat2811
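The three modes proposed above could be sketched roughly as follows. This is an illustration only, not the code in the PR: the environment variable name `OPENSEARCH_SEARCH_TYPE`, the function names, the field names `embedding` and `text`, and the use of `should` clauses for the hybrid case are assumptions.

```python
import os
from typing import Any, Dict, List

VALID_SEARCH_TYPES = ("k_nn", "keyword_search", "hybrid")


def search_type_from_env() -> str:
    # Hypothetical variable name; defaults to pure vector search.
    return os.environ.get("OPENSEARCH_SEARCH_TYPE", "k_nn")


def build_search_body(
    query_text: str,
    query_vector: List[float],
    k: int = 3,
    search_type: str = "k_nn",
    vector_field: str = "embedding",  # hypothetical field name
    text_field: str = "text",         # hypothetical field name
) -> Dict[str, Any]:
    """Build a search body for one of the three proposed search modes."""
    # Vector clause handled by the k-NN plugin.
    knn_clause = {"knn": {vector_field: {"vector": query_vector, "k": k}}}
    # Keyword clause scored with BM25 by default.
    match_clause = {"match": {text_field: {"query": query_text}}}

    if search_type == "k_nn":
        query = knn_clause
    elif search_type == "keyword_search":
        query = match_clause
    elif search_type == "hybrid":
        # One bool query carrying both clauses; scores of matching
        # clauses are combined by OpenSearch.
        query = {"bool": {"should": [knn_clause, match_clause]}}
    else:
        raise ValueError(
            f"search_type must be one of {VALID_SEARCH_TYPES}, got {search_type!r}"
        )
    return {"size": k, "query": query}
```

Calling `build_search_body("what is k-NN?", vector, search_type=search_type_from_env())` would then pick the mode from the environment at query time.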
@navneet1v Thanks for your quick response. Your approach sounds good 👍🏽
Updated the PR to include all 3 types of search.
@Venkat2811 @isafulf could you please help review the PR? We would love to unblock this for OpenSearch community users.
@vamshin @navneet1v I'm not from OpenAI / maintainer of this repo. I'm an OpenSearch community user as well and was trying out your change while pending review. (Approved to not be blocked from my side)
@isafulf can you please review the PR?
@navneet1v I have been trying to use the OpenSearch datastore and have been running into some issues. I have a couple of questions:
- Do I need to install the k-NN plugin?
- If the plugin is needed, do you have the steps required to build a Docker image that includes it?
The following is the error I see when the init method tries to set up and validate the data store.
  File "/home/test/retrieval-plugin/datastore/providers/opensearch_datastore.py", line 83, in init
    await data_store.__setup_and_validate_data_store()
  File "/home/test/retrieval-plugin/datastore/providers/opensearch_datastore.py", line 224, in __setup_and_validate_data_store
    await self.__check_knn_plugin_present()
  File "/home/test/retrieval-plugin/datastore/providers/opensearch_datastore.py", line 229, in __check_knn_plugin_present
    response = str(await self.async_client.cat.plugins())
  File "/home/test/retrieval-plugin/venv/lib/python3.10/site-packages/opensearchpy/_async/client/cat.py", line 546, in plugins
    return await self.transport.perform_request(
  File "/home/test/retrieval-plugin/venv/lib/python3.10/site-packages/opensearchpy/_async/transport.py", line 410, in perform_request
    raise e
  File "/home/test/retrieval-plugin/venv/lib/python3.10/site-packages/opensearchpy/_async/transport.py", line 374, in perform_request
    status, headers_response, data = await connection.perform_request(
  File "/home/test/retrieval-plugin/venv/lib/python3.10/site-packages/opensearchpy/_async/http_aiohttp.py", line 317, in perform_request
    raise ConnectionError("N/A", str(e), e)
opensearchpy.exceptions.ConnectionError: ConnectionError(Server disconnected) caused by: ServerDisconnectedError(Server disconnected)
Thanks, Marcelo
@magallardo
Yes, you need to install the k-NN plugin if you set up OpenSearch from the minimal (min) distribution. The k-NN plugin provides the vector search capability.
You can download the latest version of OpenSearch with the required plugin from here: https://opensearch.org/downloads.html That page also explains how to set up OpenSearch using Docker.
I have also added 2 example Docker files in this PR: https://github.com/openai/chatgpt-retrieval-plugin/blob/fbbd13874d50187c60a72244e72008062291dfd4/examples/docker/opensearch/docker-compose.yml
Docker hub link: https://hub.docker.com/r/opensearchproject/opensearch
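To check by hand whether a cluster has the plugin, the text returned by the cat plugins API (the same call that appears in the traceback above) can be scanned for the plugin name. The helper below is a sketch, not the PR's implementation: the function name and the sample output are made up for illustration, and the plugin name `opensearch-knn` is assumed to be how the plugin is listed.

```python
def knn_plugin_present(cat_plugins_output: str, plugin_name: str = "opensearch-knn") -> bool:
    """Return True if any line of the cat plugins output lists the plugin."""
    for line in cat_plugins_output.splitlines():
        # Each line is whitespace-separated: node name, plugin name, version.
        if plugin_name in line.split():
            return True
    return False


# Illustrative output only; real node names and versions will differ.
sample_output = (
    "node-1 opensearch-knn      2.7.0.0\n"
    "node-1 opensearch-security 2.7.0.0\n"
)
has_knn = knn_plugin_present(sample_output)
```

Note that the traceback above fails with a ConnectionError inside the request itself, so in that case no plugin list is returned for a check like this to inspect.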
@adam-openai, @jordanparker6 can your team take a look at this PR?
Hey @bchess,
Would you mind taking a look at this, please?
@bchess pinging on this thread again.
@bchess can I get a review on this PR?
@isafulf can I get a review on this PR? It has been pending for months.