azure-sdk-for-java

[Cosmos][VectorIndex]Adding changes for vectorIndex and vectorEmbeddingPolicy

Open • aayush3011 opened this pull request • 1 comment

Description

This PR adds vector indexes and a vector embedding policy to enable Vector Similarity Search in Cosmos DB.

  • The Vector Embedding Policy is part of the container-level policy. For each embedding path, it specifies the embedding dimensions, the data type of the embeddings, and the distance function used to calculate the similarity between embedding vectors.

  • A new vectorIndexes section is added to the indexing policy, similar to composite or spatial indexes. Each vector index specifies the indexed path and the index type.
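To make the distanceFunction setting concrete, here is a plain-Java sketch of the three functions that appear in the sample container properties (cosine, dotproduct, euclidean). It is illustrative only: Cosmos DB evaluates these on the service side, and the class name VectorDistances is hypothetical, not part of the azure-cosmos SDK. For simplicity it uses float[] for every vector, even though the policy also allows int8 and uint8 data types.

```java
import java.util.Locale;

/**
 * Hypothetical illustration of the distance functions named in a
 * vectorEmbeddingPolicy. Not the Cosmos DB server-side implementation.
 */
public final class VectorDistances {

    private VectorDistances() {
    }

    /** Dot product: larger values mean more similar vectors. */
    public static double dotProduct(float[] a, float[] b) {
        checkDimensions(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    /** Cosine similarity in [-1, 1]: 1 means the vectors point the same way. */
    public static double cosine(float[] a, float[] b) {
        double normA = Math.sqrt(dotProduct(a, a));
        double normB = Math.sqrt(dotProduct(b, b));
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("cosine is undefined for a zero vector");
        }
        return dotProduct(a, b) / (normA * normB);
    }

    /** Euclidean (L2) distance: smaller values mean more similar vectors. */
    public static double euclidean(float[] a, float[] b) {
        checkDimensions(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = (double) a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Dispatches on the policy's distanceFunction string, e.g. "cosine". */
    public static double compute(String distanceFunction, float[] a, float[] b) {
        switch (distanceFunction.toLowerCase(Locale.ROOT)) {
            case "cosine":
                return cosine(a, b);
            case "dotproduct":
                return dotProduct(a, b);
            case "euclidean":
                return euclidean(a, b);
            default:
                throw new IllegalArgumentException("unknown distance function: " + distanceFunction);
        }
    }

    /** The "dimensions" setting implies both vectors must have the same length. */
    private static void checkDimensions(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch: " + a.length + " vs " + b.length);
        }
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f, 0f};
        float[] doc = {1f, 1f, 0f};
        // prints cosine=0.7071 dotproduct=1.0000 euclidean=1.0000
        System.out.printf(Locale.ROOT, "cosine=%.4f dotproduct=%.4f euclidean=%.4f%n",
                compute("cosine", query, doc),
                compute("dotproduct", query, doc),
                compute("euclidean", query, doc));
    }
}
```

Note the opposite orientation: for cosine and dot product a higher value means "more similar", while for Euclidean distance a lower value does, which matters when ranking results.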

Sample Cosmos container properties with the new indexing policy and vector embedding policy:

{
    "id": "1d14c70a-e60e-489a-afbc-bf3193fae4b9",
    "vectorEmbeddingPolicy": {
        "vectorEmbeddings": [
            {
                "path": "/vector1",
                "dataType": "float32",
                "dimensions": 3,
                "distanceFunction": "cosine"
            },
            {
                "path": "/vector2",
                "dataType": "int8",
                "dimensions": 3,
                "distanceFunction": "dotproduct"
            },
            {
                "path": "/vector3",
                "dataType": "uint8",
                "dimensions": 3,
                "distanceFunction": "euclidean"
            }
        ]
    },
    "partitionKey": {
        "kind": "Hash",
        "paths": [
            "/mypk"
        ]
    },
    "indexingPolicy": {
        "automatic": true,
        "indexingMode": "Consistent",
        "vectorIndexes": [
            {
                "type": "flat",
                "path": "/vector1"
            },
            {
                "type": "quantizedFlat",
                "path": "/vector2"
            },
            {
                "type": "diskANN",
                "path": "/vector3"
            }
        ],
        "includedPaths": [
            {
                "path": "/name/?"
            },
            {
                "path": "/description/?"
            }
        ],
        "excludedPaths": [
            {
                "path": "/*"
            }
        ]
    },
    "uniqueKeyPolicy": {}
}
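In the sample, each entry in vectorIndexes points at a path that is also declared in vectorEmbeddings. A minimal client-side consistency check can be sketched as follows, assuming (as that one-to-one pairing suggests) that every vector index path must have a matching embedding definition. The records are hypothetical stand-ins that mirror the JSON fields, not the azure-cosmos SDK classes, and the accepted value sets contain only the values that appear in the sample, not a complete list of what the service supports.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the sample's vector policies; not the SDK's types. */
public final class VectorPolicyCheck {

    /** One entry of vectorEmbeddingPolicy.vectorEmbeddings. */
    public record VectorEmbedding(String path, String dataType, int dimensions, String distanceFunction) { }

    /** One entry of indexingPolicy.vectorIndexes. */
    public record VectorIndex(String type, String path) { }

    // Assumption: only the values used in the sample JSON are accepted here.
    private static final Set<String> DATA_TYPES = Set.of("float32", "int8", "uint8");
    private static final Set<String> DISTANCE_FUNCTIONS = Set.of("cosine", "dotproduct", "euclidean");
    private static final Set<String> INDEX_TYPES = Set.of("flat", "quantizedFlat", "diskANN");

    /** Returns the problems found; an empty list means the policies look consistent. */
    public static List<String> validate(List<VectorEmbedding> embeddings, List<VectorIndex> indexes) {
        List<String> problems = new ArrayList<>();
        Set<String> embeddedPaths = new HashSet<>();
        for (VectorEmbedding e : embeddings) {
            if (!embeddedPaths.add(e.path())) {
                problems.add("duplicate embedding path " + e.path());
            }
            if (!DATA_TYPES.contains(e.dataType())) {
                problems.add("unsupported dataType " + e.dataType() + " on " + e.path());
            }
            if (e.dimensions() <= 0) {
                problems.add("dimensions must be positive on " + e.path());
            }
            if (!DISTANCE_FUNCTIONS.contains(e.distanceFunction())) {
                problems.add("unsupported distanceFunction " + e.distanceFunction() + " on " + e.path());
            }
        }
        for (VectorIndex ix : indexes) {
            if (!INDEX_TYPES.contains(ix.type())) {
                problems.add("unsupported index type " + ix.type() + " on " + ix.path());
            }
            if (!embeddedPaths.contains(ix.path())) {
                problems.add("vector index on " + ix.path() + " has no matching embedding");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Mirrors the sample container properties above.
        List<VectorEmbedding> embeddings = List.of(
                new VectorEmbedding("/vector1", "float32", 3, "cosine"),
                new VectorEmbedding("/vector2", "int8", 3, "dotproduct"),
                new VectorEmbedding("/vector3", "uint8", 3, "euclidean"));
        List<VectorIndex> indexes = List.of(
                new VectorIndex("flat", "/vector1"),
                new VectorIndex("quantizedFlat", "/vector2"),
                new VectorIndex("diskANN", "/vector3"));
        System.out.println(validate(embeddings, indexes)); // prints []
    }
}
```

Collecting every problem into a list, rather than throwing on the first one, lets a caller report all misconfigurations in a single pass before sending the container definition to the service.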

All SDK Contribution checklist:

  • [x] The pull request does not introduce breaking changes.
  • [x] CHANGELOG is updated for new features, bug fixes or other significant changes.
  • [x] I have read the contribution guidelines.

General Guidelines and Best Practices

  • [ ] Title of the pull request is clear and informative.
  • [ ] There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • [ ] Pull request includes test coverage for the included changes.

aayush3011 • Mar 22 '24

API change check

APIView has identified API-level changes in this PR and created the following API reviews.

com.azure:azure-cosmos

azure-sdk • Apr 27 '24

/azp run java - cosmos - tests

aayush3011 • May 15 '24

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] • May 15 '24

/check-enforcer override

kushagraThapar • May 15 '24