Is there a plan for supporting FGAC?

Open nihakue opened this issue 6 months ago • 5 comments

Hi team,

Pretty exciting stuff, but I didn't hear anything in the podcast / announcement about how catalog owners are expected to handle access control, for example row-based, column-based, and other fine-grained access control (FGAC). This is a hugely complex part of the current LakeHouse ecosystem, and I wondered whether your team has a take on how it could be supported by DuckLake (the implementation or, ideally, the standard). In particular, I don't love the idea of having to manage access in two places (the catalog db and the directory) using two very different access control dialects.

Compared with what's out there in the LakeHouse ecosystem (Snowflake DAC/RBAC/UBAC, Lake Formation's ABAC and RBAC, and others), is DuckLake going to try to provide an abstraction over access control?

nihakue avatar May 28 '25 15:05 nihakue

I see that the FAQ suggests leveraging, for example, the row-level security features of the metadata catalog db. Might it be feasible to filter the rows of the COLUMNS table in the DuckLake metadata db in Postgres to limit the visibility of certain columns to a particular user?

I haven't tried this yet, but if the use case permits, it may be possible to partition the data set permissions based on the file_partition_value table and the specific partition_value. One of our use cases would certainly permit partitioning by user. It wouldn't cover all scenarios, but it's perhaps an idea to try.
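To make the partition idea concrete, here is a minimal, untested-against-DuckLake sketch that only builds the Postgres row-level security statements as strings. The table name `ducklake_file_partition_value`, the role `analyst_alice`, and the helper `rls_policy_sql` are assumptions for illustration; only the `ALTER TABLE ... ENABLE ROW LEVEL SECURITY` and `CREATE POLICY` syntax comes from Postgres itself. Whether DuckLake behaves well when metadata rows are hidden this way is not something this sketch verifies.

```python
def rls_policy_sql(table: str, column: str, role: str, allowed_value: str) -> list[str]:
    """Build Postgres statements that let `role` SELECT only rows where
    `column` equals `allowed_value`. All names are caller-supplied assumptions."""
    for name in (table, column, role):
        # Keep the sketch safe: only accept plain identifiers.
        if not name.isidentifier():
            raise ValueError(f"not a plain identifier: {name!r}")

    # Escape single quotes so the value is a valid SQL string literal.
    literal = "'" + allowed_value.replace("'", "''") + "'"
    policy = f"{role}_partition_read"  # hypothetical policy name

    return [
        f'ALTER TABLE "{table}" ENABLE ROW LEVEL SECURITY;',
        f'CREATE POLICY "{policy}" ON "{table}" FOR SELECT TO "{role}" '
        f'USING ("{column}" = {literal});',
    ]


if __name__ == "__main__":
    for statement in rls_policy_sql(
        "ducklake_file_partition_value",  # assumed metadata table name
        "partition_value",
        "analyst_alice",                  # hypothetical Postgres role
        "alice",
    ):
        print(statement)
```

Note that a policy like this only hides rows in the metadata db; the data files in the directory would still need their own access rules.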

Lewenhaupt avatar Jun 05 '25 09:06 Lewenhaupt

FYI https://ducklake.select/docs/preview/duckdb/guides/access_control. Going deeper than this is out of scope for an Open Table format (or Lakehouse format). This functionality usually resides in an external service (for example, Unity Catalog or similar).

guillesd avatar Sep 12 '25 13:09 guillesd

@guillesd I believe that the whole selling point of DuckLake is that it goes beyond classic Open Table Formats, acknowledges that you need a catalog anyways and unifies both concepts. It already has many responsibilities that usually reside in an external service/catalog. What is the point of drawing the line at access management and needing to fragment tooling?

biellls avatar Sep 24 '25 10:09 biellls

@biellls I think DuckLake has many selling points, some that come to mind:

  • Simplicity: metadata stored in a db with a simple star schema
  • Speed: avoid unnecessary metadata file IO (particularly when metadata volume grows a lot)
  • Streaming: Speed of updates in DuckLake is considerably higher thanks to the use of the db catalog and the data inlining feature.

However, the only way I can see to implement good RBAC (or ABAC) is to build a service on top of DuckLake that can assign fine-grained permissions at different levels (user, group) and has some sort of auth mechanism. This is definitely out of scope for this project. We welcome people to build this, and I'm in fact pretty sure that doing it with Arrow Flight could yield wonderful results. But it is not something that we are thinking of building right now.
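As a rough illustration of what the permission layer of such a service might look like (not something DuckLake provides, and with no auth or Arrow Flight wiring), here is a minimal sketch. `Grant`, `AccessPolicy`, and the table and column names are all hypothetical; it only shows user-level and group-level grants being combined into a column allow-list.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Grant:
    principal: str            # a user name or a group name (hypothetical model)
    table: str
    columns: frozenset[str]   # columns this principal may read


@dataclass
class AccessPolicy:
    groups: dict[str, set[str]] = field(default_factory=dict)  # user -> groups
    grants: list[Grant] = field(default_factory=list)

    def allowed_columns(self, user: str, table: str) -> set[str]:
        """Union of the grants given to the user directly or via a group."""
        principals = {user} | self.groups.get(user, set())
        allowed: set[str] = set()
        for grant in self.grants:
            if grant.table == table and grant.principal in principals:
                allowed |= grant.columns
        return allowed

    def check_projection(self, user: str, table: str, requested: list[str]) -> list[str]:
        """Return the requested columns, or raise if any of them is not granted."""
        allowed = self.allowed_columns(user, table)
        denied = [col for col in requested if col not in allowed]
        if denied:
            raise PermissionError(f"{user} may not read {denied} from {table}")
        return requested


if __name__ == "__main__":
    policy = AccessPolicy(
        groups={"alice": {"analysts"}},
        grants=[
            Grant("analysts", "orders", frozenset({"id", "amount"})),
            Grant("alice", "orders", frozenset({"customer_email"})),
        ],
    )
    print(sorted(policy.allowed_columns("alice", "orders")))
```

A real service would still have to authenticate the caller and enforce the result when reading the data files; this sketch covers only the permission lookup.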

guillesd avatar Sep 24 '25 10:09 guillesd