
Issue in docs related to supported vector databases in databricks section

Open rahul-anand1 opened this issue 2 months ago • 2 comments

Path: /components/vectordbs/dbs/databricks

The documentation gives the following configuration for using Databricks as a vector store:

config = {
    "vector_store": {
        "provider": "databricks",
        "config": {
            "workspace_url": "https://your-workspace.databricks.com",
            "access_token": "your-access-token",
            "endpoint_name": "your-vector-search-endpoint",
            "index_name": "catalog.schema.index_name",
            "source_table_name": "catalog.schema.source_table",
            "embedding_dimension": 1536
        }
    }
}

The latest release, mem0 version 1.0.0, supports Databricks, but because the documentation is incorrect I am getting this error:

validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for MemoryConfig
vector_store
  Value error, Extra fields not allowed: source_table_name, index_name. Please input only the following fields: index_type, pipeline_type, warehouse_name, query_type, workspace_url, endpoint_name, azure_client_id, client_secret, collection_name, embedding_model_endpoint_name, azure_client_secret, catalog, embedding_dimension, endpoint_type, client_id, table_name, schema, access_token [type=value_error, input_value={'workspace_url': 'https:...edding_dimension': 1536}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.12/v/value_error
{"pid": 17559, "job_id": "AJ_g7dLw5nNYaAa"}

On further checking mem0/vector_stores/databricks.py, I cannot find the variable source_table_name. As far as I can see, only these variables are supported:

workspace_url (str): Databricks workspace URL.
access_token (str, optional): Personal access token for authentication.
client_id (str, optional): Service principal client ID for authentication.
client_secret (str, optional): Service principal client secret for authentication.
azure_client_id (str, optional): Azure AD application client ID (for Azure Databricks).
azure_client_secret (str, optional): Azure AD application client secret (for Azure Databricks).
endpoint_name (str): Vector search endpoint name.
catalog (str): Unity Catalog catalog name.
schema (str): Unity Catalog schema name.
table_name (str): Source Delta table name.
index_name (str, optional): Vector search index name (default: "mem0").
index_type (str, optional): Index type, either "DELTA_SYNC" or "DIRECT_ACCESS" (default: "DELTA_SYNC").
embedding_model_endpoint_name (str, optional): Embedding model endpoint for Databricks-computed embeddings.
embedding_dimension (int, optional): Vector embedding dimensions (default: 1536).
endpoint_type (str, optional): Endpoint type, either "STANDARD" or "STORAGE_OPTIMIZED" (default: "STANDARD").
pipeline_type (str, optional): Sync pipeline type, either "TRIGGERED" or "CONTINUOUS" (default: "TRIGGERED").
warehouse_name (str, optional): Databricks SQL warehouse Name (if using SQL warehouse).
query_type (str, optional): Query type, either "ANN" or "HYBRID" (default: "ANN").
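
To see which keys from the documented config get rejected, here is a minimal sketch of a pre-check. It assumes the allowed field names are exactly the ones listed in the validation error above; `ALLOWED_FIELDS` and `find_extra_fields` are hypothetical names for illustration, not part of mem0.

```python
# Field names copied from the pydantic error message above
# (assumed to be the complete set for mem0 1.0.0).
ALLOWED_FIELDS = {
    "index_type", "pipeline_type", "warehouse_name", "query_type",
    "workspace_url", "endpoint_name", "azure_client_id", "client_secret",
    "collection_name", "embedding_model_endpoint_name", "azure_client_secret",
    "catalog", "embedding_dimension", "endpoint_type", "client_id",
    "table_name", "schema", "access_token",
}


def find_extra_fields(vector_store_config: dict) -> list:
    """Return the config keys that are not in ALLOWED_FIELDS, sorted by name."""
    return sorted(set(vector_store_config) - ALLOWED_FIELDS)


# The inner "config" dict exactly as shown in the documentation.
documented = {
    "workspace_url": "https://your-workspace.databricks.com",
    "access_token": "your-access-token",
    "endpoint_name": "your-vector-search-endpoint",
    "index_name": "catalog.schema.index_name",
    "source_table_name": "catalog.schema.source_table",
    "embedding_dimension": 1536,
}

print(find_extra_fields(documented))  # ['index_name', 'source_table_name']
```

The two keys it reports are the same two named in the "Extra fields not allowed" message.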

rahul-anand1 avatar Oct 21 '25 07:10 rahul-anand1

There is one more issue, in the code of mem0/vector_stores/databricks.py:

class Databricks(VectorStoreBase):
    def __init__(
        self,
        workspace_url: str,
        access_token: Optional[str] = None,
        client_id: Optional[str] = None,
        client_secret: Optional[str] = None,
        azure_client_id: Optional[str] = None,
        azure_client_secret: Optional[str] = None,
        endpoint_name: str = None,
        catalog: str = None,
        schema: str = None,
        table_name: str = None,
        collection_name: str = "mem0",
        index_type: str = "DELTA_SYNC",
        embedding_model_endpoint_name: Optional[str] = None,
        embedding_dimension: int = 1536,
        endpoint_type: str = "STANDARD",
        pipeline_type: str = "TRIGGERED",
        warehouse_name: Optional[str] = None,
        query_type: str = "ANN",
    ):
        """
        Initialize the Databricks Vector Search vector store.

        Args:
            workspace_url (str): Databricks workspace URL.
            access_token (str, optional): Personal access token for authentication.
            client_id (str, optional): Service principal client ID for authentication.
            client_secret (str, optional): Service principal client secret for authentication.
            azure_client_id (str, optional): Azure AD application client ID (for Azure Databricks).
            azure_client_secret (str, optional): Azure AD application client secret (for Azure Databricks).
            endpoint_name (str): Vector search endpoint name.
            catalog (str): Unity Catalog catalog name.
            schema (str): Unity Catalog schema name.
            table_name (str): Source Delta table name.
            index_name (str, optional): Vector search index name (default: "mem0").
            index_type (str, optional): Index type, either "DELTA_SYNC" or "DIRECT_ACCESS" (default: "DELTA_SYNC").
            embedding_model_endpoint_name (str, optional): Embedding model endpoint for Databricks-computed embeddings.
            embedding_dimension (int, optional): Vector embedding dimensions (default: 1536).
            endpoint_type (str, optional): Endpoint type, either "STANDARD" or "STORAGE_OPTIMIZED" (default: "STANDARD").
            pipeline_type (str, optional): Sync pipeline type, either "TRIGGERED" or "CONTINUOUS" (default: "TRIGGERED").
            warehouse_name (str, optional): Databricks SQL warehouse Name (if using SQL warehouse).
            query_type (str, optional): Query type, either "ANN" or "HYBRID" (default: "ANN").
        """

The expected variable is collection_name, but the docstring lists it as index_name, which in turn gives this error:

pydantic_core._pydantic_core.ValidationError: 1 validation error for MemoryConfig
vector_store
  Value error, Extra fields not allowed: index_name. Please input only the following fields: azure_client_id, access_token, schema, pipeline_type, client_id, query_type, warehouse_name, endpoint_name, client_secret, table_name, index_type, embedding_model_endpoint_name, azure_client_secret, embedding_dimension, collection_name, catalog, endpoint_type, workspace_url [type=value_error, input_value={'workspace_url': 'https:... 'query_type': 'HYBRID'}, input_type=dict]
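
Going by the `__init__` signature and the allowed fields in the error, a config along these lines should pass validation. This is only a sketch: splitting the documented `catalog.schema.source_table` into `catalog`, `schema`, and `table_name`, and passing the bare index name as `collection_name`, are my assumptions from the parameter names, not something the docs confirm. All values are placeholders.

```python
# Sketch of a config that uses only field names accepted by the validator.
# Assumption: the three-part table name from the docs maps onto the separate
# catalog / schema / table_name parameters, and collection_name takes the
# place of the documented index_name.
config = {
    "vector_store": {
        "provider": "databricks",
        "config": {
            "workspace_url": "https://your-workspace.databricks.com",
            "access_token": "your-access-token",
            "endpoint_name": "your-vector-search-endpoint",
            "catalog": "catalog",
            "schema": "schema",
            "table_name": "source_table",
            "collection_name": "index_name",
            "embedding_dimension": 1536,
        },
    }
}

# Requires mem0 installed and a reachable Databricks workspace:
# from mem0 import Memory
# memory = Memory.from_config(config)

print(sorted(config["vector_store"]["config"]))
```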

rahul-anand1 avatar Oct 21 '25 08:10 rahul-anand1

Hey @rahul-anand1, thanks for pointing it out. That's a mistake in our documentation. We'll be fixing it on our end; meanwhile, feel free to fix it and raise a PR, and we'll surely review it.

parshvadaftari avatar Oct 21 '25 20:10 parshvadaftari