hive icon indicating copy to clipboard operation
hive copied to clipboard

HIVE-29116 : Create a DDL for setting hive default partition name at the table level

Open vikramahuja1001 opened this issue 5 months ago • 12 comments

  1. Expose a DDL to set default partition name at table level
  2. Restrict set hive.exec.default.partition at runtime
  3. use table property to check the default partition along with hive.exec.default.partition at cluster level

For more details refer: https://issues.apache.org/jira/browse/HIVE-29116

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

vikramahuja1001 avatar Aug 07 '25 07:08 vikramahuja1001

@deniskuzZ , @chinnaraolalam could you please take a look at this PR?

vikramahuja1001 avatar Aug 09 '25 13:08 vikramahuja1001

Thanks for reviewing this PR @ngsg @Indhumathi27 I will handle the review comments, push the changes soon.

vikramahuja1001 avatar Aug 13 '25 16:08 vikramahuja1001

@vikramahuja1001 , i think change is required in org.apache.hadoop.hive.ql.exec.DDLPlanUtils#checkIfDefaultPartition() https://github.com/apache/hive/blob/73503291cbdd0f95bc33add0769234fed6683840/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLPlanUtils.java#L307

Aggarwal-Raghav avatar Aug 14 '25 12:08 Aggarwal-Raghav

Remove hardcoding from here as well + update comments as well accordingly org.apache.hadoop.hive.common.FileUtils#escapePathName(java.lang.String, java.lang.String) https://github.com/apache/hive/blob/413069ea0faeb893fe78faa215b08a70ece80595/common/src/java/org/apache/hadoop/hive/common/FileUtils.java#L298

https://github.com/apache/hive/blob/90820ac0a241468482073beb269bcaa3928fbc6f/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java#L310

Aggarwal-Raghav avatar Aug 14 '25 12:08 Aggarwal-Raghav

Remove hardcoding from following place:

https://github.com/apache/hive/blob/733d4f09011f27bea17a5301537a9c6174f28874/hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FileOutputCommitterContainer.java#L715

https://github.com/apache/hive/blob/ba0217ff17501fb849d8999e808d37579db7b4f1/hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatFileUtil.java#L71

Aggarwal-Raghav avatar Aug 14 '25 12:08 Aggarwal-Raghav

@Aggarwal-Raghav ,very good point, makes sense. Thanks for pointing this out. There are indeed a lot of places where HIVE_DEFAULT_PARTITION is hard coded. Have put a few commits to fix that.

vikramahuja1001 avatar Aug 18 '25 14:08 vikramahuja1001

@ngsg , @Aggarwal-Raghav , @Indhumathi27 i have reworked all the comments and pushed all the changes. In the build pipeline all the tests are passing but there seems to be javadoc issue with the thrift gen code, no idea what's causing that. Looking at that atm.

Could you guys please re-review it again?

@chinnaraolalam and @deniskuzZ , could you also please take a look at this PR and maybe review it?

vikramahuja1001 avatar Aug 27 '25 12:08 vikramahuja1001

Thanks a lot for the hard work on this PR. I haven't fully reviewed all the changes yet (the diff is quite large), but let me leave a few comments for now:

  • In some cases, handling default partition name isn't necessary. I would suggest introducing new methods with Preconditions for those cases to avoid passing {tableParam, conf} or a default partition name. For example, when calling Warehouse#escapePathName in Warehouse#makePartNameUtil: e.getKey() doesn't require a default partition name, and e.getValue() is guaranteed not to be empty or null.

  • Extending the above point, I would suggest passing just a default partition name instead of both tableParam and configuration whenever possible. For example, since FileUtils#escapePathNames already takes defaultPath, it would be better to require callers to always pass a proper default partition name. This way, we propagate only the minimal required information and lower the risk of human error when dealing with a potentially mutable map instance.

  • Regarding InsertHiveLocksCommand, I would suggest considering alternatives. Do we really need to ship the entire content of tableParams over LockComponent? Could we instead just send the default partition name, or ensure that the partition name is not null or empty? (We might even assume there is no empty string or null based on the implementation of com.google.common.base.Splitter$MapSplitter#split, though this code is outside our control.)

  • I haven't fully thought this through, but since the changes now affect the Thrift interface, I would like to hear others' opinions. What do you think about extending the definition of Table and Partition in the Thrift interface instead of using tableParams? In this way, we might be able to simplify default partition determining steps.

Also, regarding the error in Javadoc, it seems the issue comes from the javadoc comment in LockComponentBuilder#setTableParams: the parameter name should be tableParams, not tableName.

ngsg avatar Aug 28 '25 04:08 ngsg

@ngsg , thanks for spending time to review this PR and providing your feedback. I will rework as per your above comment and push the changes in a few days. Thanks

vikramahuja1001 avatar Sep 02 '25 15:09 vikramahuja1001

@ngsg Apologies for the delayed response. Couldn't revert back/rework due to limited connectivity. I have reworked the complete PR wrt most of the points mentioned in your previous comment.

Please take a look at the PR once. Let me know if something else also needs to be handled.

vikramahuja1001 avatar Sep 30 '25 04:09 vikramahuja1001

Cc @chinnaraolalam , could you also take a look at this PR

vikramahuja1001 avatar Sep 30 '25 04:09 vikramahuja1001

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the [email protected] list if the patch is in need of reviews.

github-actions[bot] avatar Dec 15 '25 00:12 github-actions[bot]