[Fix #12000] Cannot remove the WorkerGroup from the master service
Purpose of the pull request
fix #12000
Brief change log
Add registryWorkerGroupNodes and dbWorkerGroupNodes to hold the worker groups coming from the registry center and from the db, respectively.
A registry remove event removes the corresponding records from registryWorkerGroupNodes; update/insert events behave the same as before.
dbWorkerGroupNodes is cleared and repopulated (addAll) every 10 seconds during the sync.
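A rough sketch of the bookkeeping described above. The field and method names here are assumptions for illustration, not the exact ones in ServerNodeManager:

```java
// Illustrative sketch of the two-map bookkeeping; names are assumptions,
// not the actual fields/methods in ServerNodeManager.
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class WorkerGroupNodeBookkeeping {

    // worker groups discovered through the registry center
    private final Map<String, Set<String>> registryWorkerGroupNodes = new ConcurrentHashMap<>();

    // worker groups configured in the database, rebuilt on every 10s sync round
    private final Map<String, Set<String>> dbWorkerGroupNodes = new ConcurrentHashMap<>();

    /** Registry REMOVE event: drop the node (and the group, if it becomes empty) from the registry view. */
    public void onRegistryNodeRemoved(String workerGroup, String workerAddress) {
        Set<String> nodes = registryWorkerGroupNodes.get(workerGroup);
        if (nodes != null) {
            nodes.remove(workerAddress);
            if (nodes.isEmpty()) {
                registryWorkerGroupNodes.remove(workerGroup);
            }
        }
    }

    /** Registry ADD/UPDATE event: same behaviour as before, recorded in the registry view. */
    public void onRegistryNodeUpdated(String workerGroup, String workerAddress) {
        registryWorkerGroupNodes
                .computeIfAbsent(workerGroup, k -> ConcurrentHashMap.newKeySet())
                .add(workerAddress);
    }
}
```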
Verify this pull request
Has tests on TR.
Could you help to re-run this CodeQL check?
The GitHub Actions can be triggered by closing/reopening this PR or by a git push.
Codecov Report
Merging #12050 (630625b) into dev (8cddb10) will decrease coverage by 0.01%. The diff coverage is 0.00%.
```diff
@@             Coverage Diff              @@
##                dev   #12050      +/-   ##
============================================
- Coverage     38.65%   38.64%   -0.02%
+ Complexity     4005     4004       -1
============================================
  Files          1002     1001       -1
  Lines         37213    37431     +218
  Branches       4249     4257       +8
============================================
+ Hits          14386    14465      +79
- Misses        21195    21321     +126
- Partials       1632     1645      +13
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| ...uler/server/master/registry/ServerNodeManager.java | 0.95% <0.00%> (-0.24%) | :arrow_down: |
| ...scheduler/plugin/task/mlflow/MlflowParameters.java | 18.18% <0.00%> (-56.05%) | :arrow_down: |
| ...heduler/plugin/task/jupyter/JupyterParameters.java | 0.00% <0.00%> (-33.34%) | :arrow_down: |
| ...hinscheduler/plugin/alert/script/ScriptSender.java | 53.33% <0.00%> (-7.28%) | :arrow_down: |
| ...er/master/dispatch/host/assign/RandomSelector.java | 77.77% <0.00%> (-5.56%) | :arrow_down: |
| ...org/apache/dolphinscheduler/remote/utils/Host.java | 42.55% <0.00%> (-2.13%) | :arrow_down: |
| .../dolphinscheduler/plugin/task/spark/SparkTask.java | 72.56% <0.00%> (-2.00%) | :arrow_down: |
| ...e/dolphinscheduler/remote/NettyRemotingClient.java | 51.42% <0.00%> (-1.43%) | :arrow_down: |
| ...phinscheduler/plugin/task/jupyter/JupyterTask.java | 62.85% <0.00%> (-0.61%) | :arrow_down: |
| ...cheduler/api/service/impl/ExecutorServiceImpl.java | 43.33% <0.00%> (-0.58%) | :arrow_down: |
| ... and 45 more | | |
When we delete a workerGroup, the TaskInstance should fail to dispatch; I think this is expected. Can we only clear the workerGroupNodes map?
@ruanwenjun but if we just clear the workerGroupNodes, it will affect other normal taskInstances. I think it would be better to keep the memory consistent with the db and the registry center. WDYT
Yes, we are talking about how to make the data in memory the same as the db and the registry. Currently the deleted worker group is never removed; only the worker groups that still exist get replaced. I think we can replace the workerGroup data in memory when we sync.
The WorkerGroup data currently in memory is composed of two parts: one is synchronized from the database every 10 seconds (only added, never deleted), and the other is obtained from the registry center (also only added, never deleted).
But we cannot tell whether a WorkerGroup in memory came from the DB or from the registry center, so we cannot tell which record has been deleted from the db or from the registry center, unless only Default is kept in the registry center. So my PR replaces all worker groups from the DB on each sync, removes a worker group when its record is deleted from the registry center, and then combines these two parts to refresh workerGroupNodes.
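Roughly, the sync/merge step would look like the sketch below. This is illustrative only; the DAO, record types, and names are assumptions, not the actual ServerNodeManager code:

```java
// Illustrative sketch of the 10s sync: rebuild the DB view wholesale, then
// combine it with the registry view to refresh the map the dispatcher reads.
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WorkerGroupSyncTask implements Runnable {

    private final WorkerGroupDao workerGroupDao;               // hypothetical DAO
    private final Map<String, Set<String>> registryWorkerGroupNodes;
    private final Map<String, Set<String>> dbWorkerGroupNodes;
    private final Map<String, Set<String>> workerGroupNodes;   // map used for dispatching

    public WorkerGroupSyncTask(WorkerGroupDao workerGroupDao,
                               Map<String, Set<String>> registryWorkerGroupNodes,
                               Map<String, Set<String>> dbWorkerGroupNodes,
                               Map<String, Set<String>> workerGroupNodes) {
        this.workerGroupDao = workerGroupDao;
        this.registryWorkerGroupNodes = registryWorkerGroupNodes;
        this.dbWorkerGroupNodes = dbWorkerGroupNodes;
        this.workerGroupNodes = workerGroupNodes;
    }

    @Override
    public void run() {
        // 1. replace the DB view wholesale, so groups deleted from the DB disappear
        dbWorkerGroupNodes.clear();
        for (WorkerGroupRecord record : workerGroupDao.queryAllWorkerGroups()) {
            dbWorkerGroupNodes.put(record.getName(), new HashSet<>(record.getAddresses()));
        }

        // 2. combine the registry view (kept up to date by add/remove events)
        //    with the DB view, and refresh the dispatch map in one shot
        Map<String, Set<String>> merged = new HashMap<>();
        registryWorkerGroupNodes.forEach((group, nodes) -> merged.put(group, new HashSet<>(nodes)));
        dbWorkerGroupNodes.forEach((group, nodes) ->
                merged.computeIfAbsent(group, k -> new HashSet<>()).addAll(nodes));

        workerGroupNodes.clear();
        workerGroupNodes.putAll(merged);
    }

    // hypothetical DAO and record types, only to keep the sketch self-contained
    interface WorkerGroupDao {
        List<WorkerGroupRecord> queryAllWorkerGroups();
    }

    interface WorkerGroupRecord {
        String getName();
        List<String> getAddresses();
    }
}
```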
We don't need to care about the worker groups in the registry center; we only need to make sure the worker groups in memory are the same as in the db. The worker groups in the registry center are not displayed in the UI, and they are not reliable; in some cases this will cause new problems. E.g. we only configure workerGroupA:10.10.10.10 in the UI, but 11.11.11.11 registers itself as workerGroupA in the registry, so a task belonging to workerGroupA can be dispatched to 11.11.11.11, which is not what we expect.
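A minimal sketch of that DB-only alternative, assuming the db is the single source of truth for group membership (illustrative names, not the project's actual API):

```java
// Rebuild the in-memory map entirely from the DB on each sync, so groups
// deleted in the UI drop out and registry-only members are never dispatched to.
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class DbOnlyWorkerGroupSync {

    private final Map<String, Set<String>> workerGroupNodes = new ConcurrentHashMap<>();

    /** dbGroups: group name -> addresses configured in the UI / database. */
    public void sync(Map<String, Set<String>> dbGroups) {
        Map<String, Set<String>> rebuilt = new ConcurrentHashMap<>();
        dbGroups.forEach((group, addresses) -> rebuilt.put(group, new HashSet<>(addresses)));

        // replace, don't merge: an address registered under workerGroupA in the
        // registry but not configured in the DB (e.g. 11.11.11.11) is ignored,
        // so tasks cannot be dispatched to it by accident
        workerGroupNodes.clear();
        workerGroupNodes.putAll(rebuilt);
    }
}
```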