spark [SPARK-48881][SQL] Some dynamic partitions can be compensated to specific partition values

What changes were proposed in this pull request?

When writing dynamic partitions, some dynamic partitions in InsertIntoHadoopFsRelationCommand can be compensated to specific partition values.

Why are the changes needed?

When executing the following SQL ： INSERT OVERWRITE TABLE A PARTITION(event_day, event_type) SELECT id, event_day, event_type from B where event_day = '20240712'

before this pr staticPartitions would be empty in InsertIntoHadoopFsRelationCommand

after this pr staticPartitions would be specific partition values

The benefit is that when executing

in InsertIntoHadoopFsRelationCommand, it can greatly reduce the pressure on hive and improve the efficiency of task execution.

Does this PR introduce any user-facing change?

No

How was this patch tested?

manually test

Was this patch authored or co-authored using generative AI tooling?

No

Jul 19 '24 08:07 fusheng9399

I haven't reviewed the code changes in the pr yet, but:

The PR title should reflect the work done in the current PR as much as possible, the current title looks more like a Jira title
Please ensure the completeness of the PR description, and the following parts are also required to be filled in:

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?

In the PR description, it mentions in InsertIntoHadoopFsRelationCommand, it can greatly reduce the pressure on hive and improve the efficiency of task execution. Is this quantifiable?

Jul 22 '24 12:07 LuciferYang

@fusheng-rd Could you add some new test cases for this?

Jul 23 '24 12:07 LuciferYang

I haven't reviewed the code changes in the pr yet, but:

The PR title should reflect the work done in the current PR as much as possible, the current title looks more like a Jira title

Please ensure the completeness of the PR description, and the following parts are also required to be filled in:
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?
In the PR description, it mentions in InsertIntoHadoopFsRelationCommand, it can greatly reduce the pressure on hive and improve the efficiency of task execution. Is this quantifiable?

When a table named A with millions of partitions executes the following sql：

INSERT OVERWRITE TABLE A PARTITION(event_day, event_type) SELECT id, event_day, event_type from B where event_day = '20240712'

It will appear in hive that all partitions of table A are fetched at once, which is very likely to cause slow queries in hive metastore and even drag down the overall performance of Hive's metadata queries. At the same time, the job execution is very slow.

After this PR, only the specified partition and its sub-partitions are fetched, which takes milliseconds to seconds.

Jul 23 '24 14:07 fusheng9399

@fusheng-rd hmm... It seems that the way you merge the code is not quite right.

Aug 07 '24 03:08 LuciferYang

@fusheng-rd hmm... It seems that the way you merge the code is not quite right. Yes, I am fixing that

Aug 07 '24 03:08 fusheng9399

@fusheng-rd Could you add some new test cases for this?

ok，done~

Aug 07 '24 08:08 fusheng9399

The idea LGTM. I think it's cleaner if the implementation is an optimizer rule that rewrites InsertIntoHadoopFsRelation command. We can look at the constraints of the input query, if we can infer an equality predicate against a partition column, we can fill the static partitions of the InsertIntoHadoopFsRelation command.

Aug 14 '24 17:08 cloud-fan

The idea LGTM. I think it's cleaner if the implementation is an optimizer rule that rewrites InsertIntoHadoopFsRelation command. We can look at the constraints of the input query, if we can infer an equality predicate against a partition column, we can fill the static partitions of the InsertIntoHadoopFsRelation command.

ok，I will implement it in the optimization rule

Aug 30 '24 09:08 fusheng9399

Please help review it when you have free time, thanks! @ulysses-you cc @cloud-fan

Oct 11 '24 02:10 fusheng9399