[SPARK-52417][SQL] Simplify Table properties handling in View Schema Evolution Mode

Open szehon-ho opened this issue 6 months ago • 2 comments

What changes were proposed in this pull request?

When a view is created with user-specified columns, e.g. CREATE VIEW v (a1 INT, a2 STRING) AS select c1, c2, it needs to save both sets of columns (the aliases and the query output).

  1. User-specified columns are saved as the View Schema (a1 INT, a2 STRING).
  2. View query output is saved as Table properties with an index: (c1, 0), (c2, 1).
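As a rough illustration of the two storage locations listed above, here is a small Python model. It is not Spark code: `ViewMetadata` and `create_view` are hypothetical names, and the property keys are only modeled on Spark's `view.query.out.*` convention, which is an assumption here rather than something this description states.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ViewMetadata:
    """Toy stand-in for a catalog entry (hypothetical, not Spark's class)."""
    schema: List[Tuple[str, str]]  # user-facing columns: (name, type)
    properties: Dict[str, str] = field(default_factory=dict)


def create_view(user_columns, query_output_names):
    """Store the aliases as the schema and the query output as indexed properties."""
    # Assumed key layout: one count entry plus one entry per output column index.
    props = {"view.query.out.numCols": str(len(query_output_names))}
    for index, name in enumerate(query_output_names):
        props["view.query.out.col.%d" % index] = name
    return ViewMetadata(schema=list(user_columns), properties=props)


# CREATE VIEW v (a1 INT, a2 STRING) AS select c1, c2
view = create_view([("a1", "INT"), ("a2", "STRING")], ["c1", "c2"])
```

In this model the aliases (a1, a2) and the query output names (c1, c2) live in different places, which is why both have to be kept in sync when either changes.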

In the new Schema Evolution mode, we never allow user-specified columns, so view schema == view query output schema. Every time we detect that the view query's output schema has changed, we sync the view's schema with it, keeping the invariant.

So in Schema Evolution mode we can simplify the update: skip updating the Table Properties and rely on the View Schema all the time.
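A minimal sketch of the simplified update, again as a toy Python model rather than the actual patch. The function name `sync_evolving_view` and the property key used in the usage lines are hypothetical; the only behavior taken from the description is that the schema is replaced by the query output while the properties are left alone.

```python
from typing import Dict, List, Tuple

Column = Tuple[str, str]  # (name, type)


def sync_evolving_view(
    schema: List[Column],
    properties: Dict[str, str],
    query_output: List[Column],
) -> Tuple[List[Column], Dict[str, str], bool]:
    """Return (schema, properties, changed) for a view in Schema Evolution mode.

    Invariant assumed from the description: view schema == query output schema.
    Only the schema is rewritten; the properties are passed through untouched.
    """
    if schema == query_output:
        return schema, properties, False
    return list(query_output), properties, True


# Hypothetical example: the underlying query gains a column c3.
old_schema = [("c1", "INT"), ("c2", "STRING")]
props = {"owner.managed.key": "admin-only"}  # placeholder property
props_before = dict(props)

new_schema, new_props, changed = sync_evolving_view(
    old_schema, props, [("c1", "INT"), ("c2", "STRING"), ("c3", "DOUBLE")]
)
```

After the call, `new_schema` matches the new query output and `new_props` is the same mapping as before, so in this model the user querying the view would only need permission to change the schema.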

Why are the changes needed?

View Schema Evolution is a useful mode. However, it requires a lot of permissions for the user querying the view, because that user needs to update the View definition. Auth becomes simpler if we do not have to update the properties too (which could be admin-level properties) and the update is reduced to only the schema.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing unit test

Was this patch authored or co-authored using generative AI tooling?

No

szehon-ho avatar Jun 07 '25 01:06 szehon-ho

@cloud-fan can you take a look?

szehon-ho avatar Jun 15 '25 19:06 szehon-ho

Shall we also skip generating the view query output table properties when creating the view? @szehon-ho

cloud-fan avatar Jun 16 '25 01:06 cloud-fan

@cloud-fan @gengliangwang can you take another look? Thanks

szehon-ho avatar Jun 16 '25 23:06 szehon-ho

Also, maybe I should put this behind a flag, for compatibility with previous versions of Spark.

szehon-ho avatar Jun 17 '25 18:06 szehon-ho

@szehon-ho let's also add a test to verify that the view's properties are not changed after schema evolution.

gengliangwang avatar Jun 17 '25 21:06 gengliangwang

I think it's an unrelated error?

Documentation Generation:
/__w/_temp/0d08ee56-1341-4270-a8c6-156869cc0e28.sh: 1: python3.9: not found
Error: Process completed with exit code 127.

unless I missed another error

szehon-ho avatar Jun 18 '25 00:06 szehon-ho

Documentation Generation: /__w/_temp/0d08ee56-1341-4270-a8c6-156869cc0e28.sh: 1: python3.9: not found Error: Process completed with exit code 127.

This is not related. Rebasing the master branch of your spark fork will fix it.

gengliangwang avatar Jun 18 '25 03:06 gengliangwang

Thanks, merging to master

gengliangwang avatar Jun 18 '25 03:06 gengliangwang