[#78] Added OpenSearch as Vector Datastore

Open navneet1v opened this issue 2 years ago • 15 comments

Description

This adds vector datastore support for OpenSearch, an open-source search and analytics engine. OpenSearch provides vector database capabilities through its k-NN plugin: https://opensearch.org/docs/latest/search-plugins/knn/index/

Things tackled in the PR:

  1. Added support for connecting to various kinds of OpenSearch clusters, such as self-hosted clusters, unsecured clusters, and Amazon OpenSearch.
  2. Provided a default index creation path to make it easy to connect to OpenSearch.
  3. Added support for the nmslib and faiss engines, which can provide low-latency k-NN search.
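
For readers unfamiliar with the k-NN plugin, the following is a minimal sketch of what a default index body with a selectable engine might look like. It is not the code from this PR: the function name `build_knn_index_body`, the field names `embedding` and `text`, and the default dimension of 1536 are assumptions chosen for illustration.

```python
from typing import Any, Dict

# Engines named in the PR description.
SUPPORTED_ENGINES = ("nmslib", "faiss")


def build_knn_index_body(engine: str = "nmslib", dimension: int = 1536) -> Dict[str, Any]:
    """Build an index body with a knn_vector field (hypothetical helper).

    The field names "embedding" and "text" and the default dimension
    are illustrative assumptions, not values taken from the PR.
    """
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"engine must be one of {SUPPORTED_ENGINES}, got {engine!r}")
    return {
        # Enables k-NN search on the index.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                "embedding": {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "space_type": "l2",
                        "engine": engine,
                    },
                },
            }
        },
    }
```

A body like this would typically be passed to the index-creation call of an OpenSearch client; switching `engine` between `"nmslib"` and `"faiss"` changes only the `method.engine` entry.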

Issue

Resolves #78.

@jordanparker6 can you please review this PR?

Testing

  • [X] Tested against a local Docker container running OpenSearch.
  • [X] Tested all APIs.

navneet1v avatar Apr 04 '23 18:04 navneet1v

@isafulf can you please review this PR?

navneet1v avatar Apr 10 '23 22:04 navneet1v

@navneet1v Nice! I was looking for this and came across your PR.

I would like to propose making k-NN search an optional feature that can be controlled via an environment variable flag. In some cases, depending on the data, native OpenSearch keyword search (BM25) works better than vector search.

Venkat2811 avatar Apr 11 '23 10:04 Venkat2811

I think this is a fair ask. I would add an environment variable that accepts the values k_nn, keyword_search, and hybrid. This will control which type of search we run: k_nn means k-NN search, keyword_search means BM25 search, and hybrid means both. For hybrid, we will use a bool query and put both the k-NN and BM25 clauses in a single query.
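
A minimal sketch of how such a switch might look, under stated assumptions: the environment variable name `OPENSEARCH_SEARCH_TYPE`, the function name `build_search_body`, and the field names `embedding` and `text` are hypothetical and are not taken from the PR.

```python
import os
from typing import Any, Dict, List, Optional

SEARCH_TYPES = ("k_nn", "keyword_search", "hybrid")


def build_search_body(
    query_text: str,
    query_embedding: List[float],
    top_k: int = 3,
    search_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a search body for the selected search type (hypothetical helper)."""
    # OPENSEARCH_SEARCH_TYPE is an assumed variable name, defaulting to k-NN.
    mode = search_type or os.environ.get("OPENSEARCH_SEARCH_TYPE", "k_nn")
    if mode not in SEARCH_TYPES:
        raise ValueError(f"search type must be one of {SEARCH_TYPES}, got {mode!r}")

    # Vector clause: approximate nearest neighbours on the embedding field.
    knn_clause = {"knn": {"embedding": {"vector": query_embedding, "k": top_k}}}
    # Keyword clause: a standard match query, scored with BM25 by default.
    match_clause = {"match": {"text": query_text}}

    if mode == "k_nn":
        query = knn_clause
    elif mode == "keyword_search":
        query = match_clause
    else:
        # Hybrid: both clauses inside one bool query, so a document can
        # score from either or both.
        query = {"bool": {"should": [knn_clause, match_clause]}}

    return {"size": top_k, "query": query}
```

For example, `build_search_body("what is k-NN?", [0.1, 0.2], search_type="hybrid")` returns a body whose `query.bool.should` list holds the k-NN clause followed by the match clause.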

Please let me know your thoughts. @Venkat2811

navneet1v avatar Apr 11 '23 15:04 navneet1v

@navneet1v Thanks for your quick response. Your approach sounds good 👍🏽

Venkat2811 avatar Apr 11 '23 17:04 Venkat2811

Updated the PR to include all 3 types of search.

navneet1v avatar Apr 13 '23 01:04 navneet1v

@Venkat2811 @isafulf could you please help review the PR? We would love to unblock this for OpenSearch community users.

vamshin avatar Apr 17 '23 16:04 vamshin

@vamshin @navneet1v I'm not from OpenAI, nor am I a maintainer of this repo. I'm an OpenSearch community user as well and was trying out your change while it is pending review. (I have approved it so that it is not blocked from my side.)

Venkat2811 avatar Apr 17 '23 17:04 Venkat2811

@isafulf can you please review the PR?

navneet1v avatar Apr 17 '23 18:04 navneet1v

@navneet1v I have been trying to use the OpenSearch datastore and have been running into some issues. I have a couple of questions:

  • Do I need to install the k-NN plugin?
  • If the plugin is needed, do you have the steps required to build a Docker image that includes it?

The following is the error I see when the init method tries to set up and validate the data store.

  File "/home/test/retrieval-plugin/datastore/providers/opensearch_datastore.py", line 83, in init
    await data_store.__setup_and_validate_data_store()
  File "/home/test/retrieval-plugin/datastore/providers/opensearch_datastore.py", line 224, in __setup_and_validate_data_store
    await self.__check_knn_plugin_present()
  File "/home/test/retrieval-plugin/datastore/providers/opensearch_datastore.py", line 229, in __check_knn_plugin_present
    response = str(await self.async_client.cat.plugins())
  File "/home/test/retrieval-plugin/venv/lib/python3.10/site-packages/opensearchpy/_async/client/cat.py", line 546, in plugins
    return await self.transport.perform_request(
  File "/home/test/retrieval-plugin/venv/lib/python3.10/site-packages/opensearchpy/_async/transport.py", line 410, in perform_request
    raise e
  File "/home/test/retrieval-plugin/venv/lib/python3.10/site-packages/opensearchpy/_async/transport.py", line 374, in perform_request
    status, headers_response, data = await connection.perform_request(
  File "/home/test/retrieval-plugin/venv/lib/python3.10/site-packages/opensearchpy/_async/http_aiohttp.py", line 317, in perform_request
    raise ConnectionError("N/A", str(e), e)
opensearchpy.exceptions.ConnectionError: ConnectionError(Server disconnected) caused by: ServerDisconnectedError(Server disconnected)

Thanks, Marcelo

magallardo avatar Apr 21 '23 14:04 magallardo

@magallardo

Yes, you need to install the k-NN plugin if you set up OpenSearch using the minimal distribution. The k-NN plugin provides the capability to do vector search.
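
As a rough illustration of what a plugin check can look like, here is a small sketch that scans the text returned by the cat plugins API for the k-NN plugin. It assumes the default three-column output (node name, component, version) and that the component is listed as `opensearch-knn`; the function name `knn_plugin_present` and the sample text are illustrative only, not output captured from a real cluster or code from this PR.

```python
def knn_plugin_present(cat_plugins_output: str) -> bool:
    """Return True if any line of the cat-plugins text lists the k-NN plugin.

    Assumes the default column layout: node name, component, version.
    """
    for line in cat_plugins_output.splitlines():
        columns = line.split()
        if len(columns) >= 2 and columns[1] == "opensearch-knn":
            return True
    return False


# Illustrative sample text, not real cluster output.
sample_with_knn = (
    "node-1 opensearch-knn 2.6.0.0\n"
    "node-1 opensearch-security 2.6.0.0\n"
)
sample_without_knn = "node-1 opensearch-security 2.6.0.0\n"

print(knn_plugin_present(sample_with_knn))
print(knn_plugin_present(sample_without_knn))
```

Note that the traceback above is a connection failure raised while calling the plugins endpoint, so a check like this only becomes relevant once the client can reach the cluster.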

You can download the latest version of OpenSearch with the required plugin from here: https://opensearch.org/downloads.html. That page also explains how to set up OpenSearch using Docker.

Moreover, I have added 2 example Docker files in this PR: https://github.com/openai/chatgpt-retrieval-plugin/blob/fbbd13874d50187c60a72244e72008062291dfd4/examples/docker/opensearch/docker-compose.yml

Docker hub link: https://hub.docker.com/r/opensearchproject/opensearch

navneet1v avatar Apr 21 '23 16:04 navneet1v

@adam-openai, @jordanparker6 can your team take a look at this PR?

navneet1v avatar May 03 '23 05:05 navneet1v

Hey @bchess,

Would you mind taking a look at this, please?

chris-short avatar May 04 '23 20:05 chris-short

@bchess pinging on this thread again.

navneet1v avatar Jun 05 '23 21:06 navneet1v

@bchess can I get a review on this PR?

navneet1v avatar Jul 17 '23 23:07 navneet1v

@isafulf can I get a review on this PR? It has been pending for months.

navneet1v avatar Dec 24 '23 20:12 navneet1v