kyuubi icon indicating copy to clipboard operation
kyuubi copied to clipboard

[AUTHZ] count(*)/count(1) should check sub nodes' privileges

Open LennonChin opened this issue 3 months ago • 4 comments

Spark Optimizer's ColumnPruning will replace count(*)/count(1) Aggregate plan's child to a Project node with empty projection list:

object ColumnPruning extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = removeProjectBeforeFilter(
    plan.transformWithPruning(AlwaysProcess.fn, ruleId) {
    ...
    // Prunes the unused columns from child of Aggregate/Expand/Generate/ScriptTransformation
    case a @ Aggregate(_, _, child) if !child.outputSet.subsetOf(a.references) =>
      a.copy(child = prunedChild(child, a.references))
    ...
}

but AuthZ plugin's PrivilegesBuilder.buildQuery method will ignore to check child node when plan's inputSet is empty, in this scenario, Aggregate node's child plan's privileges are ignored, which cause count(*)/ count(1) will ignored all privileges that should be checked.

Why are the changes needed?

this patch add a holder node ChildOutputHolder when Aggregate node's references and it's child node's outputSet hava no intersection, it will hold the child node's outputSet used for build privilege objects, and ChildOutputHolder node will be eliminated after RuleAuthorization rule work completed.

How was this patch tested?

updated old unit tests and added new unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

LennonChin avatar Sep 12 '25 08:09 LennonChin

related issue: #7173, cc @bowenliang123

LennonChin avatar Sep 12 '25 08:09 LennonChin

Codecov Report

:x: Patch coverage is 0% with 17 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 0.00%. Comparing base (8b56295) to head (952548f). :warning: Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
.../spark/authz/rule/plan/RuleChildOutputMarker.scala 0.00% 6 Missing :warning:
...ugin/spark/authz/rule/plan/ChildOutputHolder.scala 0.00% 4 Missing :warning:
...rk/authz/rule/RuleEliminateChildOutputHolder.scala 0.00% 3 Missing :warning:
.../kyuubi/plugin/spark/authz/PrivilegesBuilder.scala 0.00% 2 Missing :warning:
...ugin/spark/authz/ranger/RangerSparkExtension.scala 0.00% 2 Missing :warning:
Additional details and impacted files
@@          Coverage Diff           @@
##           master   #7204   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files         695     698    +3     
  Lines       43505   43495   -10     
  Branches     5888    5886    -2     
======================================
+ Misses      43505   43495   -10     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov-commenter avatar Sep 12 '25 10:09 codecov-commenter

this approach lgtm, I suggest adding some comments.

cc @wForget will this affect lineage?

also cc @zhouyifan279 since you have worked closely on this part

pan3793 avatar Sep 12 '25 15:09 pan3793

cc @wForget will this affect lineage?

It seems not, ChildOutputHolder extends UnaryNode, so it will pass the child's lineage.

wForget avatar Sep 15 '25 05:09 wForget