langchainjs

[Milvus] Store array and JSON metadata fields directly

Open: rakuzen25 opened this issue 11 months ago • 3 comments

Checked other resources

  • [X] I added a very descriptive title to this issue.
  • [X] I searched the LangChain.js documentation with the integrated search.
  • [X] I used the GitHub search to find a similar question and didn't find it.
  • [X] I am sure that this is a bug in LangChain.js rather than my code.
  • [X] The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

// Based on https://js.langchain.com/docs/integrations/vectorstores/milvus/
import { Milvus } from "@langchain/community/vectorstores/milvus";
import { OpenAIEmbeddings } from "@langchain/openai";
import { Document } from "langchain/document";

const docs: Document[] = [
    new Document({
        pageContent: "This is a test document.",
        metadata: {
            source: "test.txt",
            foo: {
                bar: "baz",
            },
            qux: [1, 2, 3],
        },
    })
]

const vectorStore = await Milvus.fromDocuments(docs, new OpenAIEmbeddings(), {
    collectionName: "foobar",
});

const response = await vectorStore.similaritySearch(
    "test",
    2,
    // This filter won't work, because qux is stored as a VarChar string:
    // "array_contains(qux, 1)",
    // Only a substring match on the stored string will:
    "qux like '%1%'",
);
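
The cost of the `like '%1%'` workaround can be seen without a Milvus server. The sketch below is only a local simulation: it assumes the array is serialized with `JSON.stringify` before being stored as VarChar (the exact serialization LangChain.js uses is not shown in this issue), and it models the `like` pattern as a plain substring test.

```typescript
// Assumption: the array metadata is stored as its JSON string form.
const quxValues: number[] = [10, 20, 30];
const storedAsVarChar: string = JSON.stringify(quxValues);

// Local stand-in for the Milvus expression `qux like '%1%'`:
// true whenever the character "1" appears anywhere in the stored string.
const likeMatches: boolean = storedAsVarChar.indexOf("1") !== -1;

// What `array_contains(qux, 1)` is meant to express:
// true only when 1 is an actual element of the array.
const containsOne: boolean = quxValues.indexOf(1) !== -1;

console.log({ storedAsVarChar, likeMatches, containsOne });
```

Here the substring test matches a document whose array holds 10, 20 and 30 even though none of its elements is 1, which is the kind of false positive a native array field would avoid.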

Error Message and Stack Trace (if applicable)

No response

Description

Milvus 2.2.9 and 2.3.2, released in June 2023 and October 2023, added support for JSON and array data types respectively. This enables access to more efficient operators such as json_contains and array_contains. However, LangChain's current implementation uses VarChar for all metadata fields:

https://github.com/langchain-ai/langchainjs/blob/45498632ce2f5d539d84d049bf5b6717f674ac46/libs/langchain-community/src/vectorstores/milvus.ts#L772-L784
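
As a rough illustration of what an opt-in mapping might look like, the sketch below picks a field kind from the runtime shape of each metadata value. It is not the code at the link above: `MilvusFieldKind` and `inferFieldKind` are hypothetical names, the string union merely stands in for the Milvus SDK's data type enum, and the choice of kind for numbers and booleans is an assumption.

```typescript
// Hypothetical stand-in for the Milvus SDK's data type enum (no SDK dependency).
type MilvusFieldKind = "VarChar" | "Float" | "Bool" | "Array" | "JSON";

// Hypothetical helper: choose a field kind from the shape of a metadata value.
function inferFieldKind(value: unknown): MilvusFieldKind {
  if (Array.isArray(value)) return "Array"; // e.g. qux: [1, 2, 3]
  if (value !== null && typeof value === "object") return "JSON"; // e.g. foo: { bar: "baz" }
  if (typeof value === "number") return "Float";
  if (typeof value === "boolean") return "Bool";
  return "VarChar"; // strings and anything else fall back to text
}

// The metadata from the example code in this issue.
const metadata: Record<string, unknown> = {
  source: "test.txt",
  foo: { bar: "baz" },
  qux: [1, 2, 3],
};

const inferredKinds: Record<string, MilvusFieldKind> = {};
for (const key of Object.keys(metadata)) {
  inferredKinds[key] = inferFieldKind(metadata[key]);
}

console.log(inferredKinds);
```

A real implementation would also need an element type and a maximum capacity for array fields, which this sketch leaves out.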

Would it be possible to offer native JSON and array storage as an option to the user, or to detect the server version automatically through MilvusClient.getVersion?
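
The version-detection idea could be sketched roughly as follows. The helper names are hypothetical, and the code takes the version as a plain string: how that string is obtained from MilvusClient.getVersion and its exact format (for example a leading "v") are assumptions here. The thresholds are the 2.2.9 and 2.3.2 releases mentioned above.

```typescript
// Hypothetical helper: turn "v2.3.2" or "2.3.2" into a comparable number.
// Returns null when no major.minor.patch triple can be found.
function versionKey(raw: string): number | null {
  const match = /(\d+)\.(\d+)\.(\d+)/.exec(raw);
  if (match === null) return null;
  return Number(match[1]) * 1000000 + Number(match[2]) * 1000 + Number(match[3]);
}

interface NativeTypeSupport {
  json: boolean;
  array: boolean;
}

// Hypothetical helper: decide which native types a server version supports.
function detectNativeTypeSupport(rawVersion: string): NativeTypeSupport {
  const key = versionKey(rawVersion);
  // Unknown version strings fall back to the VarChar-only behaviour.
  if (key === null) return { json: false, array: false };
  return {
    json: key >= 2002009, // JSON fields: Milvus 2.2.9 and later
    array: key >= 2003002, // Array fields: Milvus 2.3.2 and later
  };
}

console.log(detectNativeTypeSupport("v2.3.1"));
```

An explicit user option would still be useful alongside detection, since existing collections created with VarChar fields cannot simply be reinterpreted as JSON or array fields.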

System Info

Not sure how pnpm info langchain would be useful since it always shows the latest version, but my installed versions are:

@langchain/community 0.2.33
langchain 0.2.20

Windows, node v23.4.0, pnpm v9.15.1

rakuzen25, Dec 25 '24 11:12

Hi, @rakuzen25. I'm Dosu, and I'm helping the LangChain JS team manage their backlog. I'm marking this issue as stale.

Issue Summary:

  • LangChain.js currently uses VarChar for all metadata fields.
  • Milvus versions 2.2.9 and 2.3.2 support JSON and array data types.
  • You suggested updating LangChain to utilize these data types for more efficient operations like json_contains and array_contains.
  • There have been no comments or activity on this issue yet.

Next Steps:

  • Is this issue still relevant to the latest version of the LangChain JS repository? If so, please comment to keep the discussion open.
  • If there is no further activity, this issue will be automatically closed in 7 days.

Thank you for your understanding and contribution!

dosubot[bot], Mar 26 '25 16:03

Still relevant

rakuzen25, Mar 27 '25 01:03

@jacoblee93, the user @rakuzen25 has indicated that this issue is still relevant. Could you please assist them with the update to utilize JSON and array data types in LangChain.js?

dosubot[bot], Mar 27 '25 01:03