Description

Support Tencent Vector DB

Dependencies

tcvectordb==1.3.2

Type of Change

Please delete options that are not relevant.

[ ] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
[ ] This change requires a documentation update, included: Dify Document
[ ] Improvement, including but not limited to code refactoring, performance optimization, and UI/UX improvement
[ ] Dependency upgrade

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

[ ] TODO

Suggested Checklist:

[ ] I have performed a self-review of my own code
[ ] I have commented my code, particularly in hard-to-understand areas
[ ] My changes generate no new warnings
[ ] I ran dev/reformat(backend) and cd web && npx lint-staged(frontend) to appease the lint gods
[ ] optional I have made corresponding changes to the documentation
[ ] optional I have added tests that prove my fix is effective or that my feature works
[ ] optional New and existing unit tests pass locally with my changes

Apr 17 '24 11:04 quicksandznzn

-1 for supporting Tencent VDB

Tencent VDB, or Tencent Cloud VectorDB (https://cloud.tencent.com/product/vdb), is not an open-source vector DB, which makes it hard to test against a real target instance. The code is likely to go stale.
The required tcvectordb Python package is missing from requirements.txt.
The tcvectordb package provides no usage or requirements information on the public PyPI repo; see https://pypi.org/project/tcvectordb/
Never include a .env file in a PR.

Apr 17 '24 13:04 bowenliang123

how is it going?

Apr 25 '24 07:04 wade30822

Waiting for review ~

Apr 25 '24 07:04 quicksandznzn

Please resolve the style violations in the Python code by running dev/reformat.
Move the tests to api/tests/integration_tests/vdb/tcvectordb

Apr 25 '24 08:04 bowenliang123

done~

Apr 25 '24 08:04 quicksandznzn

ok, thx.

Apr 25 '24 08:04 bowenliang123

@quicksandznzn The method search_by_vector() returns the results without score_threshold filtering, and tcvectordb doesn't return the score, so score_threshold won't work as expected. This matters when, for example, you want the agent to ask for more information instead of giving an irrelevant reply. It seems to be a problem with the SDK.

Apr 25 '24 10:04 zeroameli

The SDK is fine; it does return the score.

Apr 25 '24 10:04 JohnJyong

        # Fall back to 0.0 when score_threshold is missing or None.
        score_threshold = kwargs.get("score_threshold") or 0.0
        return self._get_search_res(res, score_threshold)

    def _get_search_res(self, res, score_threshold):
        docs = []
        if res is None or len(res) == 0:
            return docs

        for result in res[0]:
            meta = result.get(self.field_metadata)
            # Default to an empty dict so meta['score'] below cannot fail on None.
            meta = json.loads(meta) if meta is not None else {}
            score = 1 - result.get("score")
            # Keep only results whose converted score exceeds the threshold.
            if score > score_threshold:
                meta['score'] = score
                doc = Document(page_content=result.get(self.field_text), metadata=meta)
                docs.append(doc)
        return docs
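
To try the filtering above outside the class, here is a minimal standalone sketch. It is not the PR's code: the Document dataclass is a stand-in for Dify's Document, filter_by_score is a hypothetical helper name, and the row dicts only imitate what a search might return (a flat list here, rather than the nested res[0] used above).

```python
import json
from dataclasses import dataclass, field


@dataclass
class Document:
    """Minimal stand-in for Dify's Document class (assumption for this sketch)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def filter_by_score(rows, score_threshold=0.0, field_text="text", field_metadata="metadata"):
    """Hypothetical helper: keep rows whose converted score exceeds the threshold."""
    docs = []
    for row in rows or []:
        raw_meta = row.get(field_metadata)
        # Metadata is assumed to be stored as a JSON string; tolerate a missing value.
        meta = json.loads(raw_meta) if raw_meta else {}
        # Same conversion as in the snippet above: 1 - raw score.
        score = 1 - row.get("score", 0.0)
        if score > score_threshold:
            meta["score"] = score
            docs.append(Document(page_content=row.get(field_text), metadata=meta))
    return docs


# Made-up rows for illustration only.
rows = [
    {"text": "close match", "metadata": '{"doc_id": "a"}', "score": 0.1},
    {"text": "weak match", "metadata": None, "score": 0.8},
]
kept = filter_by_score(rows, score_threshold=0.5)
```

With these made-up rows only the first one passes the 0.5 threshold; the second is dropped rather than raising on its missing metadata.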

Apr 25 '24 11:04 JohnJyong

Thanks, optimized.

Apr 26 '24 01:04 quicksandznzn

Optimized as you suggested.

Apr 26 '24 02:04 quicksandznzn

@quicksandznzn Hello, please add my WeChat, crazyphage. I will invite you to our contributors' group.

Apr 26 '24 12:04 crazywoola

yep

Apr 28 '24 01:04 quicksandznzn

This branch has conflicts that must be resolved. Please fix them, thanks @quicksandznzn

Apr 29 '24 07:04 JohnJyong

@quicksandznzn I found some problems: https://github.com/langgenius/dify/blob/a591366f727da7d6ae3612aaec678fe081e84e11/api/core/rag/datasource/vdb/tencent/tencent_vector.py#L158-L161

The limit param is needed when querying with a filter. Why not use delete with a filter instead?
Fields in metadata should be indexed if we need to filter by them.

self.collection won't be initialized in multithreaded runs because of the Redis lock (for example, when creating a dataset). Why not use self._db.collection(self._collection_name)?
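
A rough sketch of the last point, resolving the collection handle on demand instead of caching it in the creation branch. FakeDatabase and VectorStore are hypothetical stand-ins written for this illustration, not tcvectordb or Dify classes, and the already_created flag only imitates the effect of the Redis lock.

```python
class FakeDatabase:
    """Hypothetical stand-in for the client's database handle."""

    def __init__(self):
        self._collections = {}

    def create_collection(self, name):
        self._collections[name] = {"name": name}

    def collection(self, name):
        return self._collections[name]


class VectorStore:
    """Hypothetical store that never caches the collection handle."""

    def __init__(self, db, collection_name):
        self._db = db
        self._collection_name = collection_name

    def create_collection_once(self, already_created):
        # Stand-in for the lock-guarded creation: only the first worker
        # creates the collection; later workers skip this branch entirely.
        if not already_created:
            self._db.create_collection(self._collection_name)

    @property
    def collection(self):
        # Look the handle up on every access, so it does not depend on
        # which worker happened to run the creation branch.
        return self._db.collection(self._collection_name)


db = FakeDatabase()
first = VectorStore(db, "docs")
first.create_collection_once(already_created=False)

# A second worker skips creation but can still reach the collection.
second = VectorStore(db, "docs")
second.create_collection_once(already_created=True)
handle = second.collection
```

The trade-off is one extra lookup per access in exchange for not depending on initialization order across threads.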

May 26 '24 12:05 zeroameli

it takes so long~~ 😭

Jun 07 '24 09:06 wade30822

feat: support tencent vector db
