[bugfix] fix elasticsearch db

Open deven298 opened this issue 1 year ago • 1 comments

Description

This PR includes the following changes:

  • It skips embedding data when the content has not changed for the same source.
  • Makes queries work with metadata filtering. For example:
from embedchain import App

config = {
    "vectordb": {
        "provider": "elasticsearch",
        "config": {
            "collection_name": "ec-test",
            "cloud_id": "xxx",
            "basic_auth": ("elastic", "password"),
            "verify_certs": True,
        },
    }
}

app = App.from_config(config=config)

app.add("https://www.forbes.com/profile/elon-musk", metadata={"source": "forbes"})
app.add("https://en.wikipedia.org/wiki/Elon_Musk", metadata={"source": "wikipedia"})

response, context = app.query("which companies does elon own?", where={"source": "forbes"}, citations=True)
print(response)
print(list(map(lambda x: x[1]["url"], context)))
# The response is drawn only from documents where `source == "forbes"`.
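
The two changes described above can be sketched in plain Python. This is only an illustration and not the code in this PR: the helper names `should_embed` and `build_where_query`, the in-memory `seen` map, and the assumption that metadata is indexed under `metadata.<key>` as keyword fields are all hypothetical.

```python
from __future__ import annotations

import hashlib


def content_hash(text: str) -> str:
    """Return a stable SHA-256 fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def should_embed(source: str, text: str, seen: dict[str, str]) -> bool:
    """Return True only if `text` differs from what was last stored for `source`.

    `seen` is a hypothetical in-memory map of source -> last content hash;
    a real vector store would look this up in the index instead.
    """
    new_hash = content_hash(text)
    if seen.get(source) == new_hash:
        return False  # same source, unchanged content: skip re-embedding
    seen[source] = new_hash
    return True


def build_where_query(where: dict[str, str]) -> dict:
    """Translate a `where` dict into an Elasticsearch bool/filter clause.

    Assumes (hypothetically) that each metadata key is indexed as a
    keyword field named `metadata.<key>`.
    """
    return {
        "bool": {
            "filter": [
                {"term": {f"metadata.{key}": value}}
                for key, value in where.items()
            ]
        }
    }


if __name__ == "__main__":
    seen: dict[str, str] = {}
    print(should_embed("forbes", "page text", seen))  # first time: embed
    print(should_embed("forbes", "page text", seen))  # unchanged: skip
    print(build_where_query({"source": "forbes"}))
```

Filter clauses do not affect scoring, so under these assumptions `where={"source": "forbes"}` would only narrow the candidate documents without changing how the remaining ones are ranked.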

Type of change

  • [x] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] Refactor (does not change functionality, e.g. code style improvements, linting)
  • [ ] Documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce them. Please also list any relevant details of your test configuration.

  • [x] Unit Test
  • [x] Test Script

Checklist:

  • [x] My code follows the style guidelines of this project
  • [x] I have performed a self-review of my own code
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [x] I have made corresponding changes to the documentation
  • [x] My changes generate no new warnings
  • [x] I have added tests that prove my fix is effective or that my feature works
  • [x] New and existing unit tests pass locally with my changes
  • [x] Any dependent changes have been merged and published in downstream modules
  • [x] I have checked my code and corrected any misspellings

Maintainer Checklist

  • [ ] closes #xxxx (Replace xxxx with the GitHub issue number)
  • [x] Made sure Checks passed

deven298 • Jan 17 '24 09:01

Codecov Report

Attention: 16 lines in your changes are missing coverage. Please review.

Comparison is base (2784bae) 0.00% compared to head (6ee7719) 57.06%. Report is 1 commits behind head on main.

Files                                   Patch %   Lines
embedchain/vectordb/elasticsearch.py    11.11%    16 Missing :warning:

Additional details and impacted files
@@            Coverage Diff            @@
##           main    #1183       +/-   ##
=========================================
+ Coverage      0   57.06%   +57.06%     
=========================================
  Files         0      143      +143     
  Lines         0     5722     +5722     
=========================================
+ Hits          0     3265     +3265     
- Misses        0     2457     +2457     


codecov[bot] • Jan 17 '24 09:01