dify
feat: support tencent vector db
Description
Support Tencent Vector DB
dependencies
tcvectordb==1.3.2
Type of Change
Please delete options that are not relevant.
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update, included: Dify Document
- [ ] Improvement, including but not limited to code refactoring, performance optimization, and UI/UX improvement
- [ ] Dependency upgrade
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
- [ ] TODO
Suggested Checklist:
- [ ] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] My changes generate no new warnings
- [ ] I ran `dev/reformat` (backend) and `cd web && npx lint-staged` (frontend) to appease the lint gods
- [ ] `optional` I have made corresponding changes to the documentation
- [ ] `optional` I have added tests that prove my fix is effective or that my feature works
- [ ] `optional` New and existing unit tests pass locally with my changes
-1 for supporting tencent vdb
- tencent vdb, or Tencent Cloud VectorDB (https://cloud.tencent.com/product/vdb), is not an open-source vector db, which makes it hard to test against a target vdb instance. The code will easily go stale.
- missing required `tcvectordb` python package in requirements.txt.
- the package `tcvectordb` provides no information on usage or requirements on the public PyPI repo, according to https://pypi.org/project/tcvectordb/
- never put a `.env` file in the PR
how is it going?
waiting for review ~
- please resolve the style violations in the Python code by running `dev/reformat`.
- move the tests to `api/tests/integration_tests/vdb/tcvectordb`
done~
ok, thx.
@quicksandznzn In the method `def search_by_vector()`, the results are returned without score_threshold filtering, and tcvectordb won't return the score, so score_threshold won't work as expected. This matters when, for example, you want the agent to ask for more information instead of giving an irrelevant reply. It seems to be a problem with the SDK.
The SDK is fine; it does return the score:
score_threshold = kwargs.get("score_threshold", .0) if kwargs.get('score_threshold', .0) else 0.0
return self._get_search_res(res, score_threshold)

def _get_search_res(self, res, score_threshold):
    docs = []
    if res is None or len(res) == 0:
        return docs
    for result in res[0]:
        meta = result.get(self.field_metadata)
        if meta is not None:
            meta = json.loads(meta)
        score = 1 - result.get("score")
        if score > score_threshold:
            meta['score'] = score
            doc = Document(page_content=result.get(self.field_text), metadata=meta)
            docs.append(doc)
    return docs
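For readers who want to try the filtering idea outside of dify, here is a minimal standalone sketch. It assumes, as the snippet above does, that each raw hit carries a `score` that is turned into a similarity via `1 - score`. The `Document` dataclass, the `filter_by_score` name, and the `text` / `metadata` keys are hypothetical stand-ins, not the actual dify or tcvectordb API. Unlike the snippet, it falls back to an empty dict when a hit has no metadata, so assigning the score does not fail.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Document:
    """Hypothetical stand-in for the Document class used in the thread."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def filter_by_score(hits, score_threshold: Optional[float] = None,
                    text_key: str = "text", metadata_key: str = "metadata"):
    """Keep only hits whose similarity (1 - raw score) exceeds the threshold."""
    # A missing or None threshold is treated as 0.0.
    threshold = score_threshold or 0.0
    docs = []
    for hit in hits or []:
        raw_meta = hit.get(metadata_key)
        # Metadata is assumed to be stored as a JSON string; use {} if absent.
        meta = json.loads(raw_meta) if raw_meta is not None else {}
        score = 1 - hit["score"]
        if score > threshold:
            meta["score"] = score
            docs.append(Document(page_content=hit.get(text_key), metadata=meta))
    return docs


hits = [
    {"text": "a", "metadata": '{"doc_id": "1"}', "score": 0.1},
    {"text": "b", "metadata": None, "score": 0.8},
]
relevant = filter_by_score(hits, score_threshold=0.5)
```

With a threshold of 0.5 only the first hit survives; with no threshold both are returned, each with its computed score stored in the metadata.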
thanks , optimized
Optimized, following your suggestions.
@quicksandznzn Hello, please add my WeChat, crazyphage, and I will invite you to our contributors' group.
yep
This branch has conflicts that must be resolved. Please fix them, thanks @quicksandznzn
@quicksandznzn I found some problems: https://github.com/langgenius/dify/blob/a591366f727da7d6ae3612aaec678fe081e84e11/api/core/rag/datasource/vdb/tencent/tencent_vector.py#L158-L161
- param `limit` is needed when querying with a filter; why not use delete with a filter instead?
- fields in metadata should be indexed if we need to filter by them.

`self.collection` won't be initialized in a multithreaded run because of the redis lock (for example, when creating a dataset). Why not use `self._db.collection(self._collection_name)`?
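The last point can be illustrated with a small sketch: rather than relying on an attribute that is only set by whichever worker created the collection, the store looks the collection up by name on each call. `FakeDatabase`, `FakeCollection`, and `VectorStore` are hypothetical stand-ins written for this example. Only the `collection(name)` lookup comes from the comment above, and the real tcvectordb signatures may differ.

```python
class FakeCollection:
    """Hypothetical in-memory collection that records deleted ids."""

    def __init__(self, name):
        self.name = name
        self.deleted = []

    def delete(self, document_ids):
        self.deleted.extend(document_ids)


class FakeDatabase:
    """Hypothetical database handle exposing a collection(name) lookup."""

    def __init__(self):
        self._collections = {}

    def create_collection(self, name):
        self._collections[name] = FakeCollection(name)

    def collection(self, name):
        return self._collections[name]


class VectorStore:
    def __init__(self, db, collection_name):
        self._db = db
        self._collection_name = collection_name
        # No self.collection attribute is cached here on purpose.

    def _collection(self):
        # Resolve by name every time, so an instance that skipped creation
        # (e.g. another thread held the lock) still gets a valid handle.
        return self._db.collection(self._collection_name)

    def delete_by_ids(self, ids):
        self._collection().delete(document_ids=ids)


db = FakeDatabase()
db.create_collection("docs")      # done elsewhere, e.g. by another worker
store = VectorStore(db, "docs")   # this instance never created the collection
store.delete_by_ids(["a", "b"])
```

The trade-off is one extra lookup per operation in exchange for not depending on which thread ran the creation step.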
it takes so long~~ 😭