[Improvement]: Build a rule of relationship between table and optimizer/resource group
Search before asking
- [X] I have searched in the issues and found no similar issues.
What would you like to be improved?
Right now, when we want a table to be optimized by a particular optimizer group, there are two clear ways:
- Set the default optimizer group of the catalog, and do not declare an optimizer group in the table properties.
- Declare the 'self-optimizing.group' property in the table properties (in a CREATE TABLE or ALTER TABLE statement).
In practice, using the default optimizer group gives a better experience, but it is not flexible when multiple groups are needed in one catalog. Using a table property provides more flexibility but sacrifices user experience and security: imagine that every table owner needs to know the resources behind AMS and has the authority to allocate them. This could be a disaster.
This doesn't seem like a big deal, because in many cases there is only one external/default optimizer group and no need for security or isolation. But it is never too late to provide better user experience, isolation, and security for self-optimizing.
How should we improve?
- Better user experience: users declare relationships in only one place and use them everywhere. It's a bad idea to define a property on the table, because that means the table owner must know the optimizer group concepts and instances. It's a better idea to declare properties in the optimizer group and use an extensible rule such as a regex.
- Better security: relationships between tables and resources should be fixed and must not be modifiable without the permission of the resource owner. Declaring properties in the optimizer group clearly fulfills this criterion.
- Better isolation: when we declare or modify relationships between tables and resources, the rules must be mutually exclusive.
In conclusion, I propose declaring regex rules in the optimizer group to define the relationships between tables and resources. For example:
catalog1.db1.* catalog2..
leads to a clear definition: this optimizer group can be used by these tables, and only by them.
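To make the proposal a bit more concrete, here is a minimal sketch of how group-level regex rules could be resolved and checked for mutual exclusivity. It is not Amoro code: the group names, the patterns, and the helper functions are all hypothetical, and the exclusivity check only looks at a list of known table identifiers rather than proving that two regexes can never overlap.

```python
import re
from typing import Dict, Iterable, List, Optional

# Hypothetical rule set: optimizer group name -> regex patterns over
# "catalog.database.table" identifiers. Names and patterns are illustrative.
GROUP_RULES: Dict[str, List[str]] = {
    "group_a": [r"catalog1\.db1\..*"],
    "group_b": [r"catalog1\.db2\..*", r"catalog2\..*"],
}


def matching_groups(table: str, rules: Dict[str, List[str]]) -> List[str]:
    """Return every group with at least one pattern fully matching the table."""
    return [
        group
        for group, patterns in rules.items()
        if any(re.fullmatch(p, table) for p in patterns)
    ]


def resolve_group(table: str, rules: Dict[str, List[str]]) -> Optional[str]:
    """Return the single owning group, or None when no rule matches."""
    groups = matching_groups(table, rules)
    if len(groups) > 1:
        raise ValueError(f"{table} is claimed by several groups: {groups}")
    return groups[0] if groups else None


def find_conflicts(
    tables: Iterable[str], rules: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """List known tables claimed by more than one group (isolation check)."""
    conflicts: Dict[str, List[str]] = {}
    for table in tables:
        groups = matching_groups(table, rules)
        if len(groups) > 1:
            conflicts[table] = groups
    return conflicts
```

A check like `find_conflicts` could run whenever a group's rules are created or modified, so that a change that breaks isolation is rejected before it takes effect.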
Are you willing to submit PR?
- [x] Yes I am willing to submit a PR!
Subtasks
No response
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'
I believe the intended effect of this feature should be as follows:
Priority of group configuration
- Table-level configuration > Regex rule configuration > Catalog default configuration.
- If a rule is manually configured on the table, it should take precedence.
Rule persistence
- Regex rule configurations should not be written to the underlying table's properties.
- The rules are stored in the catalog properties.
Rule change
- Changes to regex rules will result in group changes for all affected tables.
- After a regex rule is deleted, the catalog's default configuration should take effect.
- Changes can take effect through TableRuntimeRefresh.
Rule queries
- When displaying the optimizer group list, show the rules affecting the tables, collected from the catalog properties.
- Also display these configuration rules in the catalog's properties.
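As a rough illustration of the priority order described above, the sketch below resolves the effective group for one table. Only the 'self-optimizing.group' property name comes from this issue; the function, the pattern-to-group mapping, and the "db.table" identifier format are assumptions made for the example and not existing Amoro APIs.

```python
import re
from typing import Dict

GROUP_PROPERTY = "self-optimizing.group"  # table property named in this issue


def effective_group(
    table: str,
    table_properties: Dict[str, str],
    regex_rules: Dict[str, str],
    catalog_default: str,
) -> str:
    """Resolve the group: table property > regex rule > catalog default.

    `regex_rules` is a hypothetical catalog-level mapping of pattern -> group,
    matched against a "db.table" identifier. Nothing is written back to
    `table_properties`, so rules never leak into the table's own properties.
    """
    explicit = table_properties.get(GROUP_PROPERTY)
    if explicit:
        return explicit  # 1. manual table-level configuration wins
    for pattern, group in regex_rules.items():
        if re.fullmatch(pattern, table):
            return group  # 2. first matching regex rule
    return catalog_default  # 3. fall back to the catalog default
```

Because the result is computed on every call instead of being persisted, re-running it during a refresh picks up rule changes, and deleting a rule makes the table fall back to the catalog default.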
IMO, based on the issue description, the initial intention was to configure this rule at the group level. I agree with this, but from an implementation standpoint it would involve extensive changes in every properties call.
If we configure the regex rules in the catalog's properties, the effect of this property can be consolidated in the BasicUnkeyedTable::properties call, within the MixedCatalogUtil::mergeCatalogPropertiesToTable method, which makes it convenient to implement.
@XBaith @majin1102 @zhoujinsong @nicochen WDYT?
I don't think rules in catalog properties are necessary if we can use the optimizer group.
Will it be possible for multiple types of optimizers to exist in one OptimizerGroup in the future? For example, the same OptimizerGroup may contain both Flink and Spark optimizers. The OptimizerGroup is similar to a logical resource pool, and different types of optimizers will occupy some resources.
What scenarios would this hybrid resource model be helpful for? I believe this will introduce considerable complexity.
@majin1102 Thanks for the reply. I'm asking because of the following scenarios: when using a Flink optimizer for merging, the optimizer may stop or need to catch up on data, or there may be a sudden need for merging. However, the Flink optimizer is not particularly good at automatic scaling (at least on YARN).
In addition, if we think in terms of resources, that is, an OptimizerGroup is just a resource pool and an optimizer is an application running in the OptimizerGroup (similar to how an OptimizerGroup is a YARN queue and the optimizer is an application), would this really add much complexity, or is there something I missed here? Thanks.
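To illustrate the resource-pool view, here is a minimal sketch in which a group only tracks a thread quota and optimizers of any type occupy part of it. The class names, fields, and the thread-based quota are hypothetical and chosen for this example; they are not taken from the Amoro code base.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Optimizer:
    name: str
    kind: str     # e.g. "flink" or "spark"; the pool does not care which
    threads: int  # resources this optimizer occupies in the pool


@dataclass
class OptimizerGroup:
    """A group modeled purely as a resource pool with a thread quota."""

    name: str
    max_threads: int
    optimizers: List[Optimizer] = field(default_factory=list)

    def used_threads(self) -> int:
        return sum(o.threads for o in self.optimizers)

    def register(self, optimizer: Optimizer) -> bool:
        """Admit the optimizer only if the pool still has enough quota."""
        if self.used_threads() + optimizer.threads > self.max_threads:
            return False
        self.optimizers.append(optimizer)
        return True

    def threads_by_kind(self) -> Dict[str, int]:
        """Show how much of the pool each optimizer type occupies."""
        usage: Dict[str, int] = {}
        for o in self.optimizers:
            usage[o.kind] = usage.get(o.kind, 0) + o.threads
        return usage
```

In this model a long-running Flink optimizer and a short-lived Spark optimizer started for a burst of merging would simply be two entries in the same pool, limited by one shared quota.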