
[Enhancement] Add Azure Databricks SQL Serverless as a Data API Builder data source

henkokors opened this issue 4 months ago · 0 comments

Hi team/maintainers, I love the work you're doing on Data API Builder. We've adopted it in several areas because it gives us a unified, secure, and standards-based API layer on top of multiple data sources. Because of its strengths, we'd love to see DAB support Azure Databricks SQL Serverless as a first-class data source as well; details below. I wonder what you think!

Summary

Please add a first-class provider for Azure Databricks SQL Serverless so Data API Builder (DAB) can expose REST/GraphQL endpoints over Delta tables and views hosted in Unity Catalog, using read-only operations and row-level security (RLS) comparable to the existing SQL Server/Azure SQL pattern.

Why this matters

Databricks SQL Serverless is widely used to serve governed data from Unity Catalog via JDBC/ODBC with Microsoft Entra ID (service principals and users). Many teams already front their databases with DAB to get consistent REST/GraphQL, policy, and auth. Extending DAB to Databricks lets customers standardize their API layer across Azure SQL, PostgreSQL, MySQL, and Databricks with consistent auth and policy semantics.

Scope (initial MVP)

  • Provider: databricks (read-only)
  • Operations: GET (collection by query, single item by PK/composite key) and aggregates, with limited $select/$filter/$top/$orderby parity with the other SQL providers.
  • Connectivity: JDBC over HTTPS to SQL Warehouse (Serverless) using jdbc:databricks://{host}:443;httpPath=/sql/1.0/warehouses/{id}.
  • Authentication to Databricks:
    -- Mode A (service principal / M2M): OAuth 2.0 client credentials → access token presented to the JDBC driver (Auth_AccessToken). For app-level access.
    -- Mode B (user OBO passthrough): optional On-Behalf-Of flow that exchanges the caller's Entra ID access token for a Databricks OAuth token and passes it to JDBC. This enables true per-user identity in the warehouse, so current_user() and is_account_group_member() work for RLS.
  • Security:
    -- Read-only enforcement at the provider level (drop/ignore non-GET verbs).
    -- RLS: recommend Unity Catalog row filters / dynamic views for server-side enforcement. In Mode B, per-user identity is honored; in Mode A, RLS is evaluated for the service principal identity (suitable when dynamic views/filters are group-based and the SP is mapped accordingly).
  • Hosting: parity with existing DAB hosting models (containers, SWA, Web Apps). 
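
To make the proposed shape concrete, a DAB configuration fragment for this provider might look like the sketch below. This is purely illustrative: the `databricks` database-type and the keys under `options` do not exist in DAB today and are exactly what this issue proposes adding.

```json
{
  "data-source": {
    "database-type": "databricks",
    "connection-string": "jdbc:databricks://{host}:443;httpPath=/sql/1.0/warehouses/{id}",
    "options": {
      "auth-mode": "service-principal",
      "read-only": true
    }
  }
}
```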

Row-level security parity

  • Existing DAB (SQL Server/Azure SQL): DAB can project claims into SESSION_CONTEXT and RLS policies consume that. 
  • Proposed Databricks:
    -- Preferred: OBO passthrough → UC row filters / dynamic views using current_user() / group-membership functions for per-user RLS and masking.
    -- Alternative (M2M/SP mode): keep RLS in UC based on groups assigned to the service principal, or use dynamic views that reference mapping tables keyed by app identity (fits service-to-service scenarios).
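
As a sketch of the preferred Mode B path, a Unity Catalog dynamic view along these lines (catalog, schema, table, column, and group names are made up for illustration) enforces per-user RLS inside the warehouse, so DAB only ever queries the view:

```sql
-- Illustrative dynamic view: each caller sees only the rows their
-- UC group membership (or ownership) entitles them to.
CREATE OR REPLACE VIEW main.sales.orders_rls AS
SELECT *
FROM main.sales.orders
WHERE
  is_account_group_member('sales_admins')                    -- admins see all rows
  OR (is_account_group_member('sales_emea') AND region = 'EMEA')
  OR current_user() = owner_email;                           -- owners see their own rows
```

In Mode B the functions evaluate against the end user's identity; in Mode A they evaluate against the service principal, which is why group-based policies still work there.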

Open questions for maintainers

  1. Would you accept a new databricks provider implemented via JDBC similar to existing relational providers?
  2. For OBO, is there an existing DAB abstraction we should extend (akin to session-context in SQL) or should Databricks be the first provider with identity passthrough? 
  3. Any guidance on where to plug the Databricks token acquisition so we can cache tokens per user/request to avoid re-auth on every call?
  4. Any appetite to expose a minimal set of $filter/$select/$orderby/$top translated to Databricks SQL, or should MVP keep simple GET all / GET by PK + basic filtering?
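
On question 3, the shape we have in mind is a small per-user token cache with expiry-aware reuse. A minimal sketch follows; the `fetch_token` callable and its `(token, expires_at)` return shape are assumptions standing in for the real Entra ID → Databricks OAuth exchange:

```python
import time
from threading import Lock
from typing import Callable, Dict, Tuple

class TokenCache:
    """Caches one OAuth token per user and refreshes shortly before expiry."""

    def __init__(self, fetch_token: Callable[[str], Tuple[str, float]], skew: float = 60.0):
        # fetch_token(user_id) -> (token, expires_at_epoch_seconds); hypothetical signature
        self._fetch = fetch_token
        self._skew = skew  # refresh this many seconds before actual expiry
        self._cache: Dict[str, Tuple[str, float]] = {}
        self._lock = Lock()

    def get(self, user_id: str) -> str:
        now = time.time()
        with self._lock:
            entry = self._cache.get(user_id)
            if entry and entry[1] - self._skew > now:
                return entry[0]  # cached token is still valid
            token, expires_at = self._fetch(user_id)  # re-auth only on miss/expiry
            self._cache[user_id] = (token, expires_at)
            return token
```

The point of the sketch is simply that repeated requests from the same caller should not trigger a fresh OBO exchange on every call; where in DAB's pipeline such a cache would live is exactly the question.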

Alternatives considered

  • Running DAB over Azure SQL linked servers/external tables that proxy Databricks; rejected because it adds latency, cost, and loses UC-level governance.
  • Custom lightweight API using the Databricks SQL REST API; rejected to avoid fragmenting the API surface compared to other DAB-backed sources.
  • For low-latency, high-QPS entities, Azure SQL DB or PostgreSQL (Lakebase) via DAB's native postgresql provider works wonders, but that option isn't always available in Data Platform environments.

Acceptance criteria (MVP)

  • Connect to a SQL Serverless Warehouse via JDBC with OAuth SP and (optionally) OBO.
  • Expose read-only REST/GraphQL endpoints for UC tables/views.
  • Demonstrate RLS with a UC dynamic view or row filter:
    -- Mode B: users "Alice" and "Bob" see different rows based on UC group membership.
    -- Mode A: the service principal is granted only the intended slice via UC policy.
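
For the row-filter variant of this demo, the setup could be sketched as below (function, table, and column names are illustrative). Unlike a dynamic view, the filter attaches to the base table itself:

```sql
-- Illustrative row-filter function: a caller sees a row only if they belong
-- to the group named in that row's allowed_group column.
CREATE OR REPLACE FUNCTION main.security.group_filter(allowed_group STRING)
RETURN is_account_group_member(allowed_group);

-- Attach the filter to the table; Unity Catalog evaluates it for the
-- querying identity, whether a user (Mode B) or the service principal (Mode A).
ALTER TABLE main.sales.orders
  SET ROW FILTER main.security.group_filter ON (allowed_group);
```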

henkokors · Sep 16 '25 10:09