[Bug][AuthZ] Kyuubi has no permission to access the Iceberg metadata table after integrating Ranger
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Search before asking
- [X] I have searched in the issues and found no similar issues.
Describe the bug
Environment
- Spark version: 3.2.2
- Kyuubi version: apache-kyuubi-1.7.0-SNAPSHOT-bin (master), built with `./build/dist --tgz --spark-provided --flink-provided -Pspark-3.2`
- Iceberg version: 0.14.1, via `wget https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-3.2_2.12/0.14.1/iceberg-spark-runtime-3.2_2.12-0.14.1.jar`
Perform SQL operations
use testdb;
CREATE TABLE testdb.iceberg_tbl (id bigint, data string) USING iceberg;
INSERT INTO testdb.iceberg_tbl VALUES (1, 'a'), (2, 'b'), (3, 'c');
select * from testdb.iceberg_tbl;
+-----+-------+
| id | data |
+-----+-------+
| 1 | a |
| 2 | b |
| 3 | c |
+-----+-------+
SELECT * FROM testdb.iceberg_tbl.history;
22/12/07 17:16:37 ERROR ExecuteStatement: Error operating ExecuteStatement: org.apache.kyuubi.plugin.spark.authz.AccessControlException: Permission denied: user [test_user] does not have [select] privilege on [testdb.iceberg_tbl/history/made_current_at]
at org.apache.kyuubi.plugin.spark.authz.ranger.SparkRangerAdminPlugin$.verify(SparkRangerAdminPlugin.scala:128)
at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.$anonfun$checkPrivileges$5(RuleAuthorization.scala:94)
at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.$anonfun$checkPrivileges$5$adapted(RuleAuthorization.scala:93)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.checkPrivileges(RuleAuthorization.scala:93)
at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization.apply(RuleAuthorization.scala:36)
at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization.apply(RuleAuthorization.scala:33)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:91)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
at scala.collection.immutable.List.foreach(List.scala:431)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:125)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:183)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:183)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:121)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:117)
at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:135)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:153)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:150)
at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:201)
at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:246)
at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:215)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:98)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3704)
at org.apache.spark.sql.Dataset.toLocalIterator(Dataset.scala:3000)
at org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$2.iterator(ExecuteStatement.scala:107)
at org.apache.kyuubi.operation.IterableFetchIterator.<init>(FetchIterator.scala:78)
at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:106)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:98)
at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.org$apache$kyuubi$engine$spark$operation$ExecuteStatement$$executeStatement(ExecuteStatement.scala:90)
at org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$3.run(ExecuteStatement.scala:149)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Querying an Iceberg table's metadata tables is a normal operation, for example:
# history
0: jdbc:hive2://xx.xx.xx.xx:10011/default> SELECT * FROM shdw.iceberg_tbl.history;
+--------------------------+----------------------+------------+----------------------+
| made_current_at | snapshot_id | parent_id | is_current_ancestor |
+--------------------------+----------------------+------------+----------------------+
| 2022-05-09 10:58:35.835 | 6955843267870447517 | NULL | true |
+--------------------------+----------------------+------------+----------------------+
# snapshots
0: jdbc:hive2://xx.xx.xx.xx:10011/default> SELECT * FROM shdw.iceberg_tbl.snapshots;
+--------------------------+----------------------+------------+------------+----------------------------------------------------+----------------------------------------------------+
| committed_at | snapshot_id | parent_id | operation | manifest_list | summary |
+--------------------------+----------------------+------------+------------+----------------------------------------------------+----------------------------------------------------+
| 2022-05-09 10:58:35.835 | 6955843267870447517 | NULL | append | hdfs://cluster1/tgwarehouse/shdw.db/iceberg_tbl/metadata/snap-6955843267870447517-1-e8206624-fbc3-4cf5-b2cb-2db672393253.avro | {"added-data-files":"3","added-files-size":"1929","added-records":"3","changed-partition-count":"1","spark.app.id":"spark-application-1652065040852","total-data-files":"3","total-delete-files":"0","total-equality-deletes":"0","total-files-size":"1929","total-position-deletes":"0","total-records":"3"} |
+--------------------------+----------------------+------------+------------+----------------------------------------------------+----------------------------------------------------+
# history join snapshot
0: jdbc:hive2://xx.xx.xx.xx:10011/default> select
h.made_current_at,
s.operation,
h.snapshot_id,
h.is_current_ancestor,
s.summary['spark.app.id']
from shdw.iceberg_tbl.history h
join shdw.iceberg_tbl.snapshots s
on h.snapshot_id = s.snapshot_id
order by made_current_at
+--------------------------+------------+----------------------+----------------------+----------------------------------+
| made_current_at | operation | snapshot_id | is_current_ancestor | summary[spark.app.id] |
+--------------------------+------------+----------------------+----------------------+----------------------------------+
| 2022-05-09 10:58:35.835 | append | 6955843267870447517 | true | spark-application-1652065040852 |
+--------------------------+------------+----------------------+----------------------+----------------------------------+
Affects Version(s)
1.7.0 (master branch)
Kyuubi Server Log Output
No response
Kyuubi Engine Log Output
22/12/07 16:53:57 ERROR ExecuteStatement: Error operating ExecuteStatement: org.apache.kyuubi.plugin.spark.authz.AccessControlException: Permission denied: user [test_user] does not have [select] privilege on [testdb.foo/history/made_current_at]
at org.apache.kyuubi.plugin.spark.authz.ranger.SparkRangerAdminPlugin$.verify(SparkRangerAdminPlugin.scala:128)
at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.$anonfun$checkPrivileges$5(RuleAuthorization.scala:94)
at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.$anonfun$checkPrivileges$5$adapted(RuleAuthorization.scala:93)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.checkPrivileges(RuleAuthorization.scala:93)
at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization.apply(RuleAuthorization.scala:36)
at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization.apply(RuleAuthorization.scala:33)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:91)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
at scala.collection.immutable.List.foreach(List.scala:431)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:125)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:183)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:183)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:121)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:117)
at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:135)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:153)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:150)
at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:201)
at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:246)
at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:215)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:98)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3704)
at org.apache.spark.sql.Dataset.toLocalIterator(Dataset.scala:3000)
at org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$2.iterator(ExecuteStatement.scala:107)
at org.apache.kyuubi.operation.IterableFetchIterator.<init>(FetchIterator.scala:78)
at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:106)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:98)
at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.org$apache$kyuubi$engine$spark$operation$ExecuteStatement$$executeStatement(ExecuteStatement.scala:90)
at org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$3.run(ExecuteStatement.scala:149)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Kyuubi Server Configurations
spark.sql.extensions org.apache.kyuubi.sql.KyuubiSparkSQLExtension,org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog
spark.sql.catalog.spark_catalog.type=hive
Kyuubi Engine Configurations
No response
Additional context
No response
Are you willing to submit PR?
- [ ] Yes. I can submit a PR independently to fix.
- [ ] Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
- [ ] No. I cannot submit a PR at this time.
cc @bowenliang123 @yaooqinn

I don't have a clue how to exclude the metadata tables like `history`/`snapshots` from the table identifier. As shown above, the table identifier from `select * from iceberg_ns.owner_variable.history` is `Some(iceberg_ns.owner_variable.history)`.
Is there a possible way to check whether the table is in an Iceberg catalog, and then skip the metadata tables?
The metadata tables are enumerable; maybe we can hard-code converting the metadata tables' permission check to the data table?
> The metadata tables are enumerable, maybe we can hardcode convert the metadata tables' permission check to the data table?
Yes, but first, how do we check that the real table is an Iceberg one?
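The hard-coding idea above could be sketched roughly as follows. This is a minimal illustration, not actual Kyuubi AuthZ code; the class and method names are made up for the sketch, and the name set below is only a subset of Iceberg's enumerable metadata tables:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MetadataTableFallback {
    // Subset of Iceberg's enumerable metadata table names.
    static final Set<String> METADATA_TABLES = new HashSet<>(Arrays.asList(
        "history", "snapshots", "files", "manifests", "partitions",
        "all_data_files", "all_manifests", "entries"));

    // If the last identifier part names a metadata table, check privileges
    // against the underlying data table instead (e.g. db.tbl.history -> db.tbl).
    static List<String> privilegeTarget(List<String> parts) {
        if (parts.size() >= 3
            && METADATA_TABLES.contains(parts.get(parts.size() - 1))) {
            return parts.subList(0, parts.size() - 1);
        }
        return parts;
    }
}
```

With this remapping, a user who can `select` from `testdb.iceberg_tbl` would implicitly pass the check for `testdb.iceberg_tbl.history` as well.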
@pan3793 @bowenliang123 Thanks for your support. Different data lake technologies may have different metadata tables. It is possible to judge whether it is an Iceberg or Hudi table from the structure of the created table:
use testdb;
show create table iceberg_tbl;
+----------------------------------------------------+
| createtab_stmt |
+----------------------------------------------------+
| CREATE TABLE spark_catalog.testdb.iceberg_tbl (
`id` BIGINT,
`data` STRING)
USING iceberg
LOCATION 'hdfs://cluster1/tgwarehouse/testdb.db/iceberg_tbl'
TBLPROPERTIES(
'current-snapshot-id' = '4900628243476923676',
'format' = 'iceberg/parquet',
'format-version' = '1')
|
+----------------------------------------------------+
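As a sketch of this detection idea: rather than parsing the `SHOW CREATE TABLE` text, the catalog metadata itself carries the `USING` clause as the table's provider (in Spark, `CatalogTable.provider`), so a check could look like the following illustrative snippet (the class and method names are hypothetical):

```java
import java.util.Optional;

class ProviderCheck {
    // Spark stores the USING clause value as the table's provider in its
    // catalog metadata, so "is this an Iceberg table" can be answered
    // without parsing SHOW CREATE TABLE output.
    static boolean isIceberg(Optional<String> provider) {
        return provider.map(p -> p.equalsIgnoreCase("iceberg")).orElse(false);
    }
}
```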
Why not just grant the select privilege to the user who accesses `testdb.iceberg_tbl.history`?
Is this case equivalent to the one that you visit a hive table while you don't have permission to access the HMS table or record, which stores its metadata?
In other words, if we have the ALTER privilege on the raw table and perform an ALTER operation on it, the metadata changes accordingly. This does not mean we need the ALTER privilege on the metadata directly, which would give the ability to falsify critical information.
> why not just grant select privilege to the user who access testdb.iceberg_tbl.history?
@yaooqinn The Iceberg metadata tables, such as history or snapshots, are not stored in the Hive metastore, so they cannot be authorized by Ranger.
> why not just grant select privilege to the user who access testdb.iceberg_tbl.history?
This could be a workaround. But these tables are more like meta tables than metadata tables. For querying purposes, these derived tables could be treated as part of the source table itself, just like its columns.

With further investigation, I think we could tell that it's a `HistoryTable` derived from an Iceberg table to resolve this. `SparkTable` and `HistoryTable` are classes from the Iceberg Spark plugin.
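One way to act on this without taking a compile-time dependency on Iceberg is a class-name check. The fully qualified names below are assumptions based on the discussion (the metadata table classes live under `org.apache.iceberg`) and should be verified against the Iceberg version in use:

```java
import java.util.Set;

class IcebergClassCheck {
    // Hypothetical class-name based check, avoiding a hard compile-time
    // dependency on Iceberg. Names assumed from Iceberg 0.14; verify them
    // against the actual runtime jar before relying on this.
    static final Set<String> METADATA_TABLE_CLASSES = Set.of(
        "org.apache.iceberg.HistoryTable",
        "org.apache.iceberg.SnapshotsTable",
        "org.apache.iceberg.ManifestsTable");

    static boolean isIcebergMetadataTable(Object table) {
        return table != null
            && METADATA_TABLE_CLASSES.contains(table.getClass().getName());
    }
}
```

A real implementation would unwrap the `SparkTable` first to reach the underlying Iceberg table object before applying such a check.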
> For querying situations, these derived tables of source tables could be treated as part of table itself, just like the columns.
Yes, this happens when you query the raw table, just like the role that metadata plays when you query a Hive one, or indexes, snapshots, etc., which other databases may have.
Personally, for the Iceberg and Hudi storage formats, the permission check should be simplified when accessing a table's metadata: the permissions for the table metadata should depend on the permissions of the table itself. If a user has access to the table, they should also have access to its metadata. In addition, Ranger does not support the metadata of data lake storage technologies.
What's the behavior of Trino/Snowflake (or other popular products)?
> Personally, for the Iceberg and Hudi storage formats, the permissions should be simplified when accessing the metadata on the table, that is, the permissions to judge the table metadata depend on the permissions of the table. If the table has access permissions, the metadata should have access permissions. In addition, Ranger does not support the metadata of the data lake storage technology.
Agree, and we are facing this issue too. Maybe we can set up a configuration to decide whether to convert the metadata tables' permission check to the data table or not, i.e., introduce this as a feature instead of a bug fix. cc @yaooqinn @pan3793 @bowenliang123
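Such a feature toggle might look like the following sketch. The configuration key, class, and method names are hypothetical, not existing Kyuubi settings:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

class AuthzConfigSketch {
    // Hypothetical config key -- not an existing Kyuubi setting.
    static final String FALLBACK_KEY =
        "spark.sql.authz.iceberg.metadataTableAsDataTable";

    // Illustrative subset of metadata table names.
    static final Set<String> METADATA_TABLES = Set.of("history", "snapshots");

    // When the flag is on, db.tbl.history is authorized as db.tbl;
    // otherwise the identifier is checked as-is (current behavior).
    static List<String> privilegeTarget(List<String> parts, Map<String, String> conf) {
        boolean enabled =
            Boolean.parseBoolean(conf.getOrDefault(FALLBACK_KEY, "false"));
        if (enabled && parts.size() >= 3
            && METADATA_TABLES.contains(parts.get(parts.size() - 1))) {
            return parts.subList(0, parts.size() - 1);
        }
        return parts;
    }
}
```

Defaulting the flag to `false` would preserve today's behavior, which supports framing this as an opt-in feature rather than a bug fix.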