langchain4j
langchain4j copied to clipboard
feat(langchain4j-milvus): MilvusEmbeddingStore supports configure required index parameters
Context
Added support for configure common custom index parameters
Fix: #860
Change
The constructor of MilvusEmbeddingStore now require a parameter of type IndexParam and provides parameter models required for common type indexes.
Checklist
Before submitting this PR, please check the following points:
- [x] I have added unit and integration tests for my change
- [x] All unit and integration tests in the module I have added/changed are green
- [ ] All unit and integration tests in the core and main modules are green
- [ ] I have added/updated the documentation
- [ ] I have added an example in the examples repo (only for "big" features)
- [ ] I have added my new module in the BOM (only when a new module is added)
Checklist for adding new embedding store integration
- [ ] I have added a {NameOfIntegration}EmbeddingStoreIT that extends from either EmbeddingStoreIT or EmbeddingStoreWithFilteringIT
@langchain4j Is there a configured milvus instance available for CI unit testing? Attempting to connect using MILVUS_URI and MILVUS_API_KEY retrieved from environment variables seems to fail.
@langchain4j Is there a configured milvus instance available for CI unit testing? Attempting to connect using MILVUS_URI and MILVUS_API_KEY retrieved from environment variables seems to fail.
You can use the @EnabledIfEnvironmentVariable
annotation to conditionally enable a test class based on an environment variable. For example:
@EnabledIfEnvironmentVariable(named = "MILVUS_API_KEY", matches = ".+")
class MilvusEmbeddingStoreCloudIT extends EmbeddingStoreWithFilteringIT {
// Test class content
}
Additionally, I suggest that MilvusEmbeddingStore
should employ the 'upsert' operation instead of 'insert', similar to the approach taken by PgVector
, Pinecone
, and AstraDb
. This change would help to alleviate issues arising from id conflicts. Furthermore, @langchain4j the EmbeddingStore
interface contains a method public void add(String id, Embedding embedding)
. Would it be possible to include a public void add(String id, Embedding embedding, TextSegment textSegment)
, that enables the upserting of a record along with a text field?
@langchain4j Is there a configured milvus instance available for CI unit testing? Attempting to connect using MILVUS_URI and MILVUS_API_KEY retrieved from environment variables seems to fail.
You can use the
@EnabledIfEnvironmentVariable
annotation to conditionally enable a test class based on an environment variable. For example:@EnabledIfEnvironmentVariable(named = "MILVUS_API_KEY", matches = ".+") class MilvusEmbeddingStoreCloudIT extends EmbeddingStoreWithFilteringIT { // Test class content }
Additionally, I suggest that
MilvusEmbeddingStore
should employ the 'upsert' operation instead of 'insert', similar to the approach taken byPgVector
,Pinecone
, andAstraDb
. This change would help to alleviate issues arising from id conflicts. Furthermore, @langchain4j theEmbeddingStore
interface contains a methodpublic void add(String id, Embedding embedding)
. Would it be possible to include apublic void add(String id, Embedding embedding, TextSegment textSegment)
, that enables the upserting of a record along with a text field?
@langchain4j Is there a configured milvus instance available for CI unit testing? Attempting to connect using MILVUS_URI and MILVUS_API_KEY retrieved from environment variables seems to fail.
You can use the
@EnabledIfEnvironmentVariable
annotation to conditionally enable a test class based on an environment variable. For example:@EnabledIfEnvironmentVariable(named = "MILVUS_API_KEY", matches = ".+") class MilvusEmbeddingStoreCloudIT extends EmbeddingStoreWithFilteringIT { // Test class content }
Additionally, I suggest that
MilvusEmbeddingStore
should employ the 'upsert' operation instead of 'insert', similar to the approach taken byPgVector
,Pinecone
, andAstraDb
. This change would help to alleviate issues arising from id conflicts. Furthermore, @langchain4j theEmbeddingStore
interface contains a methodpublic void add(String id, Embedding embedding)
. Would it be possible to include apublic void add(String id, Embedding embedding, TextSegment textSegment)
, that enables the upserting of a record along with a text field?
This PR mainly aims to add parameters for Index creation. Collection and index creation are performed within the constructor of MilvusEmbeddingStore
, contingent upon the collection not existing. Without a Milvus instance, unit tests cannot reach the location of this modification. If skipping the test based on conditions using the @EnabledIfEnvironmentVariable
method without a Milvus instance, then this unit test would serve no purpose.
Additionally, regarding modifications to other methods in MilvusEmbeddingStore
, it might be worth opening a separate issue for discussion. In addition to your suggestions, I also propose adding a vector deletion interface based on ID or filters to EmbeddingStore
, enabling CRUD operations on vectors to be performed within EmbeddingStore. For updates, this can be achieved through a combination of deletion and creation, eliminating the need to create a separate client for maintaining this data functionality.
The Milvus instance in unit testing has been adjusted to be provided by @Testcontainers
, and all unit tests have passed. It's worth noting that configuring Milvus to use @Testcontainers
requires additional settings, as shown below.
MilvusContainer milvus = new MilvusContainer("milvusdb/milvus:v2.3.1")
.withCreateContainerCmdModifier(cmd -> {
cmd.withHostConfig(HostConfig.newHostConfig()
// This security opts needs to be configured;
// otherwise, the container will fail to start.
// Refer to the parameters from https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh.
.withSecurityOpts(Collections.singletonList("seccomp=unconfined"))
// By default, the Milvus Docker image doesn't expose the target port.
// Manual binding is required to expose it;
// otherwise, it will result in a failed detection during the test container inspection,
// leading to an inability to connect to the Milvus instance.
.withPortBindings(PortBinding.parse("19530:19530"), PortBinding.parse("9091:9091")
)
);
});
@langchain4j Is there a configured milvus instance available for CI unit testing? Attempting to connect using MILVUS_URI and MILVUS_API_KEY retrieved from environment variables seems to fail.
No, not for the main CI yet. Only for releases: https://github.com/langchain4j/langchain4j/blob/main/.github/workflows/release.yaml#L53