starrocks icon indicating copy to clipboard operation
starrocks copied to clipboard

[Enhancement] support schedule node by label

Open MatthewH00 opened this issue 1 year ago • 7 comments

Why I'm doing: In shared nothing mode , the previous PR #38833 achieve the data replica placement strategy of backends with different locations. When execute query, want to limit to use the resource of some backends/computeNodes with specific labels , then can get resource hard isolation at machine level.

What I'm doing: 1.set property for user add a new user property(labels.location) and support set property for user like: SET PROPERTY for 'test' 'labels.location' = 'rack:group_a,rack:group_b'; (to be noticed: support multi level location property like a:b or a:b,c:d)

2.set label for node consider deploy backend only or deploy both backend and computeNode , set label for backend and computeNode. 2.1 for backend refer to the previous PR #38833

ALTER SYSTEM MODIFY BACKEND "IP1:9050" SET ("labels.location" = "rack:group_a");
ALTER SYSTEM MODIFY BACKEND "IP2:9050" SET ("labels.location" = "rack:group_b");
ALTER SYSTEM MODIFY BACKEND "IP3:9050" SET ("labels.location" = "rack:group_c");

2.2 for computeNode modify syntax to support add location property to computeNode like:

ALTER SYSTEM ADD COMPUTE NODE "IP4:9050" PROPERTIES ("labels.location" = "rack:group_a");
ALTER SYSTEM ADD COMPUTE NODE "IP5:9050" PROPERTIES ("labels.location" = "rack:group_b");
ALTER SYSTEM ADD COMPUTE NODE "IP6:9050" PROPERTIES ("labels.location" = "rack:group_c");
(to be noticed: only support single level location property like a:b now)

3.node selection when query 3.1 when deploy backend only the backend IP1+IP2 would be seleted 3.2 when deploy both backend and computeNode the backend IP1+IP2 and computeNode IP4+IP5 would be seleted

Fixes #issue

What type of PR is this:

  • [ ] BugFix
  • [ ] Feature
  • [x] Enhancement
  • [ ] Refactor
  • [ ] UT
  • [ ] Doc
  • [ ] Tool

Does this PR entail a change in behavior?

  • [x] Yes, this PR will result in a change in behavior.
  • [ ] No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • [ ] Parameter changes: default values, similar parameters but with different default values
  • [x] Policy changes: use new policy to replace old one, functionality automatically enabled
  • [ ] Feature removed
  • [ ] Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • [x] I have added test cases for my bug fix or my new feature
  • [ ] This pr needs user documentation (for new or modified features or behaviors)
    • [ ] I have added documentation for my new feature or new function
  • [ ] This is a backport pr

Bugfix cherry-pick branch check:

  • [x] I have checked the version labels which the pr will be auto-backported to the target branch
    • [x] 3.3
    • [ ] 3.2
    • [ ] 3.1
    • [ ] 3.0
    • [ ] 2.5

MatthewH00 avatar Jul 29 '24 11:07 MatthewH00

Quality Gate Failed Quality Gate failed

Failed conditions
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarCloud

Catch issues before they fail your Quality Gate with our IDE extension SonarLint

sonarqubecloud[bot] avatar Aug 09 '24 04:08 sonarqubecloud[bot]

[FE Incremental Coverage Report]

:x: fail : 95 / 125 (76.00%)

file detail

path covered_line new_line coverage not_covered_line_detail
:large_blue_circle: com/starrocks/sql/ast/ShowUserPropertyStmt.java 0 1 00.00% [82]
:large_blue_circle: com/starrocks/alter/SystemHandler.java 0 5 00.00% [297, 298, 299, 301, 302]
:large_blue_circle: com/starrocks/system/SystemInfoService.java 20 38 52.63% [955, 956, 958, 959, 960, 961, 963, 964, 965, 985, 986, 988, 989, 990, 991, 993, 994, 995]
:large_blue_circle: com/starrocks/authentication/AuthenticationMgr.java 3 4 75.00% [196]
:large_blue_circle: com/starrocks/authentication/UserProperty.java 16 19 84.21% [156, 157, 336]
:large_blue_circle: com/starrocks/qe/scheduler/DefaultWorkerProvider.java 13 14 92.86% [403]
:large_blue_circle: com/starrocks/sql/analyzer/AlterSystemStmtAnalyzer.java 18 19 94.74% [99]
:large_blue_circle: com/starrocks/sql/ast/AddComputeNodeClause.java 1 1 100.00% []
:large_blue_circle: com/starrocks/sql/ast/ComputeNodeClause.java 6 6 100.00% []
:large_blue_circle: com/starrocks/system/ComputeNode.java 11 11 100.00% []
:large_blue_circle: com/starrocks/common/proc/ComputeNodeProcDir.java 2 2 100.00% []
:large_blue_circle: com/starrocks/qe/ConnectContext.java 5 5 100.00% []

github-actions[bot] avatar Aug 09 '24 05:08 github-actions[bot]

[BE Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot] avatar Aug 09 '24 05:08 github-actions[bot]

@kevincai Hi Could you please review the pr when you have free time ? related UT and SQL-Tester have passed.

MatthewH00 avatar Aug 09 '24 05:08 MatthewH00

@alvin-celerdata Hi Could you please help review the PR when have free time? The PR achieve the calculate resource limitation of backends/computeNodes with specific labels at machine node level when run query, and associate with user property. I think it would be a better hard resource isolation in multi-tenancy environment in shared-nothing mode.

MatthewH00 avatar Jun 18 '25 05:06 MatthewH00

@alvin-celerdata Hi Could you please help review the PR when have free time? The PR achieve the calculate resource limitation of backends/computeNodes with specific labels at machine node level when run query, and associate with user property. I think it would be a better hard resource isolation in multi-tenancy environment in shared-nothing mode.

Before I deep dive into the code changes, I want to know what you are doing in this change. Could you create a GitHub feature request Issue ticket to describe it?

alvin-celerdata avatar Jun 18 '25 06:06 alvin-celerdata

@alvin-celerdata Hi Could you please help review the PR when have free time? The PR achieve the calculate resource limitation of backends/computeNodes with specific labels at machine node level when run query, and associate with user property. I think it would be a better hard resource isolation in multi-tenancy environment in shared-nothing mode.

Before I deep dive into the code changes, I want to know what you are doing in this change. Could you create a GitHub feature request Issue ticket to describe it?

@alvin-celerdata Please look the PR description, it is clear. To be brief, in one cluster of shard-nothing mode, diffrent user could use exclusive resource(storage/computation) by label of node. Achieving a better hard resource isolation in multi-tenancy environment. like multi-warehourse in shard-data mode.

MatthewH00 avatar Jun 18 '25 06:06 MatthewH00