langchain
langchain copied to clipboard
Increase flexibility of ElasticVectorSearch
Hey @rlancemartin, @eyurtsev ,
I did some minimal changes to the ElasticVectorSearch
client so that it plays better with existing ES indices.
Main changes are as follows:
- You can pass the dense vector field name into
_default_script_query
- You can pass a custom script query implementation and the respective parameters to
similarity_search_with_score
- You can pass functions for building page content and metadata for the resulting
Document
The latest updates on your projects. Learn more about Vercel for Git βοΈ
1 Ignored Deployment
Name | Status | Preview | Comments | Updated (UTC) |
---|---|---|---|---|
langchain | β¬οΈ Ignored (Inspect) | Visit Preview | Nov 12, 2023 2:26pm |
cc @jeffvestal
Thanks! Is it worth adding any examples of this functionality to the notebook? And I welcome @jeffvestal's review, too.
Also please run make format
to resolve Lint errors.
Adding any examples you have to the notebook would be very helpful! I'll try to take a look over the code this weekend
@rlancemartin @jeffvestal I've added one example and ran black over the file. Pls let me know if the example is making sense
Thanks! Please run "make format" to resolve Lint errors.
@mertkayhan can you please resolve:
poetry run ruff .
langchain/vectorstores/elastic_vector_search.py:48:89: E501 Line too long (89 > 88 characters)
I probably can't push to your branch since fsn_capital
is corporate to fix myself.
I will be able to resolve it Thursday evening (CET). Unfortunately, I donβt got access to my laptop right now, apologies for the inconvenience
See lint errors.
You can run make format
or the below locally to test before pushing:
poetry run black .
poetry run mypy .
poetry run ruff . --fix
poetry run pytest --disable-socket --allow-unix-socket tests/unit_tests/
This time it seems to run for me locally on my macbook (although since greenlet fails to install so I ended up manually installing the deps), if this attempt also fails I can move the modifications to a public repo and you can lint it
@mertkayhan Hi , could you, please, resolve the merging issues? After that ping me and I push this PR for the review. Thanks!
Thanks @leo-gan , I just updated the branch. Also added a similar functionality to the new ElasticsearchStore
.
cc @joemcelroy
thanks for the contribution!
Suggestion: rather than having two functions, have one function that a developer can optionally pass in that transforms the elasticsearch document and returns a langchain document? This makes it a little simpler to use and code maintain.
Could we also have tests here please? Could we also only update ElasticsearchStore only, this is the preferred way to use Elasticsearch with langchain.
@joemcelroy Good idea! Okay, will only update ElasticsearchStore and add tests as well
Hey @mertkayhan ! Any chance you're still working on this? Happy to close for now and reopen when the comments are addressed :)
Hey @efriis, I just pushed the requested changes. I will add the test tomorrow :)
I noticed one thing while running the linter:
langchain/vectorstores/elasticsearch.py:783: error: Argument "source" to "search" of "Elasticsearch" has incompatible type "List[str]"; expected "Union[bool, Mapping[str, Any], None]" [arg-type]
Looking at the client's search imlementation
def search(
...
source: t.Optional[t.Union[bool, t.Mapping[str, t.Any]]] = None,
...
source_includes: t.Optional[t.Union[str, t.Sequence[str]]] = None,
...
) -> ObjectApiResponse[t.Any]:
wouldn't it make sense to use the source_includes
parameter to select fields to include in the response? Plus, I think it would also make mypy happy.
And actually let's do that in a followup PR if it seems right. We're merging in some linting/formatting changes for notebooks this week, so want to get this in before then!
This looks great BTW and sorry missed this. Thank you for your contribution!