in ElasticKnnSearch added back create_index, add_texts, from_texts

Open jeffvestal opened this issue 1 year ago • 1 comments

Fixes https://github.com/hwchase17/langchain/issues/7117

Adding back create_index, add_texts, and from_texts to ElasticKnnSearch.

Quick Test

from langchain.vectorstores.elastic_vector_search import ElasticKnnSearch
from langchain.embeddings import ElasticsearchEmbeddings

Initialize ElasticsearchEmbeddings

model_id = "sentence-transformers__all-distilroberta-v1"
dims = 768
es_cloud_id = ""
es_user = ""
es_password = ""
test_index = "knn_test_index_012"

embeddings = ElasticsearchEmbeddings.from_credentials(
    model_id,
    es_cloud_id=es_cloud_id,
    es_user=es_user,
    es_password=es_password,
)

Initialize ElasticKnnSearch

knn_search = ElasticKnnSearch(
    es_cloud_id=es_cloud_id,
    es_user=es_user,
    es_password=es_password,
    index_name=test_index,
    embedding=embeddings,
)

Test adding vectors

Test add_texts method when index is not created

texts = ["Hello, world!", "Machine learning is fun.", "I love Python."]
knn_search.add_texts(texts)

Test from_texts method when index is not created

new_texts = ["This is a new text.", "Elasticsearch is powerful.", "Python is fun."]
knn_search.from_texts(new_texts, dims=768)

Correctly throw an exception when index has not been previously created.

# Test `add_texts` method
texts = ["Hello, world!", "Machine learning is fun.", "I love Python."]
knn_search.add_texts(texts)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/runner/langchain-1/langchain/vectorstores/elastic_vector_search.py", line 621, in add_texts
    raise Exception(f"The index '{self.index_name}' does not exist. If you want to create a new index while encoding texts, call 'from_texts' instead.")
Exception: The index 'knn_test_index_012' does not exist. If you want to create a new index while encoding texts, call 'from_texts' instead.
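
For readers without a cluster at hand, the guard behind this traceback can be sketched roughly as follows. This is only an illustration of the check-then-raise pattern, not the code in the PR: `FakeIndices` and `add_texts_guarded` are hypothetical names, and the in-memory class merely stands in for a real Elasticsearch indices client.

```python
class FakeIndices:
    """Hypothetical in-memory stand-in for an Elasticsearch indices client."""

    def __init__(self):
        self._names = set()

    def exists(self, index):
        # True only if the index was created earlier.
        return index in self._names

    def create(self, index):
        self._names.add(index)


def add_texts_guarded(indices, index_name, texts):
    """Refuse to index texts unless the target index already exists.

    Hypothetical helper; returns the number of texts that would be indexed.
    """
    if not indices.exists(index=index_name):
        raise Exception(
            f"The index '{index_name}' does not exist. If you want to create "
            "a new index while encoding texts, call 'from_texts' instead."
        )
    return len(texts)


indices = FakeIndices()
try:
    add_texts_guarded(indices, "knn_test_index_012", ["Hello, world!"])
except Exception as exc:
    print(exc)
```

Once the index exists (here via `indices.create(index="knn_test_index_012")`), the same call goes through, which mirrors the last step of the test sequence.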

Correctly create new index

# Test `from_texts` method
new_texts = ["This is a new text.", "Elasticsearch is powerful.", "Python is fun."]
knn_search.from_texts(new_texts, dims=768)

The mapping is as follows:

{
  "knn_test_index_012": {
    "mappings": {
      "properties": {
        "text": {
          "type": "text"
        },
        "vector": {
          "type": "dense_vector",
          "dims": 768,
          "index": true,
          "similarity": "dot_product"
        }
      }
    }
  }
}
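
A mapping body of this shape could be assembled along the following lines. This is a sketch under assumptions: `build_knn_mapping` is a hypothetical helper rather than the function used in the PR, and the field names `text` and `vector` are simply taken from the mapping shown above.

```python
import json


def build_knn_mapping(dims, similarity="dot_product"):
    """Return an index body with a text field and an indexed dense_vector field.

    Hypothetical helper for illustration only.
    """
    return {
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                "vector": {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,
                    "similarity": similarity,
                },
            }
        }
    }


mapping = build_knn_mapping(768)
print(json.dumps(mapping, indent=2))
```

The `dims` argument has to match the embedding model's output size, which is why the test passes `dims=768` to `from_texts`.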

Correctly index texts after index has been created

knn_search.add_texts(texts)

jeffvestal avatar Jul 04 '23 02:07 jeffvestal

@benwtrent do you have time to do another review? I think I addressed all the issues. I removed ElasticKnnSearch as a subclass but tried to align the methods to be standard. I also return the Document type.

I'm also not sure how I picked up 27 other files that show as changed.

jeffvestal avatar Jul 14 '23 02:07 jeffvestal

@baskaryan Somehow I picked up 27 other files to change in this PR. Are you able to take a look? It should just be the langchain/vectorstores/elastic_vector_search.py file.

jeffvestal avatar Jul 14 '23 15:07 jeffvestal

The LangChain repo underwent a large reorg. Closing this PR in favor of https://github.com/langchain-ai/langchain/pull/8180

jeffvestal avatar Jul 24 '23 14:07 jeffvestal