Using nearText or nearVector to query Weaviate

Open ZouhairElhadi opened this issue 1 year ago • 5 comments

I updated the Weaviate class in the weaviate module by:

  • Adding an attribute by_text, which defaults to True. If it is true, nearText is used to request data from Weaviate; otherwise nearVector is used.
  • Renaming similarity_search to similarity_search_by_text, which uses nearText.
  • Adding a new similarity_search method, which checks by_text and uses nearText or nearVector depending on its value.
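
The dispatch described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: `build_near_clause` and `fake_embed` are hypothetical names, and the only thing taken from Weaviate is the shape of the payloads (`{"concepts": [...]}` for nearText, `{"vector": [...]}` for nearVector).

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


def build_near_clause(
    query: str,
    by_text: bool = True,
    embed: Optional[Callable[[str], List[float]]] = None,
) -> Tuple[str, Dict[str, Any]]:
    """Return the search operator name and its argument payload (hypothetical helper)."""
    if by_text:
        # nearText: Weaviate vectorizes the query itself, so the class
        # must have a vectorizer module configured.
        return "nearText", {"concepts": [query]}
    if embed is None:
        raise ValueError("by_text=False requires an embedding function")
    # nearVector: the caller computes the query vector.
    return "nearVector", {"vector": embed(query)}


def fake_embed(text: str) -> List[float]:
    # Hypothetical stand-in for a real embedding model.
    return [float(len(text)), 0.0]


print(build_near_clause("hello"))
print(build_near_clause("hello", by_text=False, embed=fake_embed))
```

Keeping by_text=True as the default means a caller who never sets it still gets the nearText path.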

ZouhairElhadi avatar May 08 '23 22:05 ZouhairElhadi

hey @ZouhairElhadi, thanks for contribution! what's the use case you're imagining? why have an attribute on the VectorStore that determines which similarity search to use versus just letting your application call one or the other method?

dev2049 avatar May 09 '23 00:05 dev2049

I was trying to use data stored in Weaviate using Haystack, with a schema that includes "vectorizer":"none", and got this error: "ValueError: Error during query: [{'locations': [{'column': 24, 'line': 1}], 'message': 'Unknown argument "nearText" on field "Document" of type "GetObjectsObj". Did you mean "nearVector" or "nearObject"?', 'path': None}]." A class without a vectorizer does not accept nearText, so we need the "by_text" attribute to choose which method to use for the similarity search. It defaults to "true", so existing code is not affected.

ZouhairElhadi avatar May 09 '23 13:05 ZouhairElhadi

can you clarify more how you are using this? this doesn't sound like a super typical use case and we don't want to bloat the interface

hwchase17 avatar May 15 '23 02:05 hwchase17

I have a similar use case too. In order to use the Weaviate internal embedding model, you must define one in the schema like in the example.

schema = {
    "classes": [
        {
            "class": "Paragraph",
            "description": "A written paragraph",
            "vectorizer": "text2vec-openai",
            "moduleConfig": {
                "text2vec-openai": {
                    "model": "ada",
                    "modelVersion": "002",
                    "type": "text"
                }
            },
            "properties": [
                {
                    "dataType": ["text"],
                    "description": "The content of the paragraph",
                    "moduleConfig": {
                        "text2vec-openai": {
                            "skip": False,
                            "vectorizePropertyName": False
                        }
                    },
                    "name": "content",
                },
            ],
        },
    ]
}

client.schema.create(schema)

If you do not specify a vectorizer, Weaviate will not allow you to use nearText. In my case, I want to "bring my own vectorizer" instead of going through Weaviate and specifying it in the Weaviate schema, so I would love to be able to use my own embedding function in similarity_search.

I think a good behavior would be for the vectorizer in the schema to be the default. However, if one isn't set, a provided embedding function could do the job too, all within the same similarity_search function, without needing an additional variable or a new function that could be confusing.
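
As a rough sketch of that fallback idea (not what was eventually merged): `pick_near_clause` and `toy_embed` are hypothetical names, and treating a missing "vectorizer" key the same as "none" is an assumption made here for simplicity, since a real Weaviate server may apply its own default vectorizer.

```python
from typing import Any, Callable, Dict, List, Optional


def pick_near_clause(
    class_schema: Dict[str, Any],
    query: str,
    embed: Optional[Callable[[str], List[float]]] = None,
) -> Dict[str, Any]:
    """Choose nearText or nearVector from the class schema (hypothetical helper)."""
    # Assumption: a missing "vectorizer" key is treated like "none".
    vectorizer = class_schema.get("vectorizer", "none")
    if vectorizer != "none":
        # The schema names a vectorizer, so let Weaviate embed the query.
        return {"nearText": {"concepts": [query]}}
    if embed is None:
        raise ValueError(
            "class has no vectorizer; an embedding function is required"
        )
    # No vectorizer in the schema: fall back to the caller's embeddings.
    return {"nearVector": {"vector": embed(query)}}


def toy_embed(text: str) -> List[float]:
    # Hypothetical stand-in for a real embedding function.
    return [float(len(text.split()))]


with_module = {"class": "Paragraph", "vectorizer": "text2vec-openai"}
without_module = {"class": "Document", "vectorizer": "none"}

print(pick_near_clause(with_module, "what is weaviate"))
print(pick_near_clause(without_module, "what is weaviate", embed=toy_embed))
```

With this shape the caller never passes a flag; the schema alone decides which operator is used.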

khu834 avatar May 15 '23 03:05 khu834

one comment, otherwise think it looks good. @hsm207 @hwchase17 do we think this resolves #4742

@dev2049 yes, this resolves #4742

hsm207 avatar May 16 '23 12:05 hsm207

refactored slightly and landed in #4824. thanks @ZouhairElhadi!

dev2049 avatar May 17 '23 02:05 dev2049