langchain icon indicating copy to clipboard operation
langchain copied to clipboard

Increase flexibility of ElasticVectorSearch

Open mertkayhan opened this issue 1 year ago β€’ 8 comments

Hey @rlancemartin, @eyurtsev ,

I did some minimal changes to the ElasticVectorSearch client so that it plays better with existing ES indices.

Main changes are as follows:

  1. You can pass the dense vector field name into _default_script_query
  2. You can pass a custom script query implementation and the respective parameters to similarity_search_with_score
  3. You can pass functions for building page content and metadata for the resulting Document

mertkayhan avatar Jun 28 '23 09:06 mertkayhan

The latest updates on your projects. Learn more about Vercel for Git β†—οΈŽ

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
langchain ⬜️ Ignored (Inspect) Visit Preview Nov 12, 2023 2:26pm

vercel[bot] avatar Jun 28 '23 09:06 vercel[bot]

cc @jeffvestal

dev2049 avatar Jun 29 '23 19:06 dev2049

Thanks! Is it worth adding any examples of this functionality to the notebook? And I welcome @jeffvestal's review, too.

Also please run make format to resolve Lint errors.

rlancemartin avatar Jun 29 '23 22:06 rlancemartin

Adding any examples you have to the notebook would be very helpful! I'll try to take a look over the code this weekend

jeffvestal avatar Jun 30 '23 13:06 jeffvestal

@rlancemartin @jeffvestal I've added one example and ran black over the file. Pls let me know if the example is making sense

mertkayhan avatar Jul 03 '23 14:07 mertkayhan

Thanks! Please run "make format" to resolve Lint errors.

rlancemartin avatar Jul 05 '23 00:07 rlancemartin

@mertkayhan can you please resolve:

poetry run ruff .
langchain/vectorstores/elastic_vector_search.py:48:89: E501 Line too long (89 > 88 characters)

I probably can't push to your branch since fsn_capital is corporate to fix myself.

rlancemartin avatar Jul 10 '23 15:07 rlancemartin

I will be able to resolve it Thursday evening (CET). Unfortunately, I don’t got access to my laptop right now, apologies for the inconvenience

mertkayhan avatar Jul 10 '23 16:07 mertkayhan

See lint errors.

You can run make format or the below locally to test before pushing:

poetry run black .
poetry run mypy . 
poetry run ruff . --fix
poetry run pytest --disable-socket --allow-unix-socket tests/unit_tests/

rlancemartin avatar Jul 14 '23 05:07 rlancemartin

This time it seems to run for me locally on my macbook (although since greenlet fails to install so I ended up manually installing the deps), if this attempt also fails I can move the modifications to a public repo and you can lint it

mertkayhan avatar Jul 14 '23 07:07 mertkayhan

@mertkayhan Hi , could you, please, resolve the merging issues? After that ping me and I push this PR for the review. Thanks!

leo-gan avatar Sep 18 '23 23:09 leo-gan

Thanks @leo-gan , I just updated the branch. Also added a similar functionality to the new ElasticsearchStore.

mertkayhan avatar Sep 19 '23 11:09 mertkayhan

cc @joemcelroy

baskaryan avatar Oct 13 '23 20:10 baskaryan

thanks for the contribution!

Suggestion: rather than having two functions, have one function that a developer can optionally pass in that transforms the elasticsearch document and returns a langchain document? This makes it a little simpler to use and code maintain.

Could we also have tests here please? Could we also only update ElasticsearchStore only, this is the preferred way to use Elasticsearch with langchain.

joemcelroy avatar Oct 16 '23 12:10 joemcelroy

@joemcelroy Good idea! Okay, will only update ElasticsearchStore and add tests as well

mertkayhan avatar Oct 17 '23 11:10 mertkayhan

Hey @mertkayhan ! Any chance you're still working on this? Happy to close for now and reopen when the comments are addressed :)

efriis avatar Nov 10 '23 23:11 efriis

Hey @efriis, I just pushed the requested changes. I will add the test tomorrow :)

mertkayhan avatar Nov 11 '23 08:11 mertkayhan

I noticed one thing while running the linter: langchain/vectorstores/elasticsearch.py:783: error: Argument "source" to "search" of "Elasticsearch" has incompatible type "List[str]"; expected "Union[bool, Mapping[str, Any], None]" [arg-type]

Looking at the client's search imlementation

def search(
      ...
      source: t.Optional[t.Union[bool, t.Mapping[str, t.Any]]] = None,
      ...
      source_includes: t.Optional[t.Union[str, t.Sequence[str]]] = None,
      ...
) -> ObjectApiResponse[t.Any]:

wouldn't it make sense to use the source_includes parameter to select fields to include in the response? Plus, I think it would also make mypy happy.

mertkayhan avatar Nov 12 '23 14:11 mertkayhan

And actually let's do that in a followup PR if it seems right. We're merging in some linting/formatting changes for notebooks this week, so want to get this in before then!

efriis avatar Nov 13 '23 22:11 efriis

This looks great BTW and sorry missed this. Thank you for your contribution!

joemcelroy avatar Nov 17 '23 12:11 joemcelroy