dolphinscheduler icon indicating copy to clipboard operation
dolphinscheduler copied to clipboard

[fix#12000]Cannot remove the WorkerGroup from the master service

Open DarkAssassinator opened this issue 3 years ago • 4 comments

Purpose of the pull request

fix #12000

Brief change log

Add registryWorkerGroupNodes and dbWorkerGroupNodes to save worker group from registry center and db. registry remove event will remove the records from registryWorkerGroupNodes, update/insert will keep same. dbWorkerGroupNodes will clear and then addAll each 10s.

Verify this pull request

has test on TR.

DarkAssassinator avatar Sep 19 '22 12:09 DarkAssassinator

could help to ru-run this CodeQL check?

DarkAssassinator avatar Sep 20 '22 14:09 DarkAssassinator

could help to ru-run this CodeQL check?

The Github actions can be triggered by close/reopen this PR or git push.

SbloodyS avatar Sep 21 '22 01:09 SbloodyS

Codecov Report

Merging #12050 (630625b) into dev (8cddb10) will decrease coverage by 0.01%. The diff coverage is 0.00%.

@@             Coverage Diff              @@
##                dev   #12050      +/-   ##
============================================
- Coverage     38.65%   38.64%   -0.02%     
+ Complexity     4005     4004       -1     
============================================
  Files          1002     1001       -1     
  Lines         37213    37431     +218     
  Branches       4249     4257       +8     
============================================
+ Hits          14386    14465      +79     
- Misses        21195    21321     +126     
- Partials       1632     1645      +13     
Impacted Files Coverage Δ
...uler/server/master/registry/ServerNodeManager.java 0.95% <0.00%> (-0.24%) :arrow_down:
...scheduler/plugin/task/mlflow/MlflowParameters.java 18.18% <0.00%> (-56.05%) :arrow_down:
...heduler/plugin/task/jupyter/JupyterParameters.java 0.00% <0.00%> (-33.34%) :arrow_down:
...hinscheduler/plugin/alert/script/ScriptSender.java 53.33% <0.00%> (-7.28%) :arrow_down:
...er/master/dispatch/host/assign/RandomSelector.java 77.77% <0.00%> (-5.56%) :arrow_down:
...org/apache/dolphinscheduler/remote/utils/Host.java 42.55% <0.00%> (-2.13%) :arrow_down:
.../dolphinscheduler/plugin/task/spark/SparkTask.java 72.56% <0.00%> (-2.00%) :arrow_down:
...e/dolphinscheduler/remote/NettyRemotingClient.java 51.42% <0.00%> (-1.43%) :arrow_down:
...phinscheduler/plugin/task/jupyter/JupyterTask.java 62.85% <0.00%> (-0.61%) :arrow_down:
...cheduler/api/service/impl/ExecutorServiceImpl.java 43.33% <0.00%> (-0.58%) :arrow_down:
... and 45 more

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov-commenter avatar Sep 22 '22 15:09 codecov-commenter

When we delete a workerGroup, the TaskInstance should dispatch failed, I think this is our expected. Can we only clear the workerGroupNodes map?

ruanwenjun avatar Sep 23 '22 01:09 ruanwenjun

When we delete a workerGroup, the TaskInstance should dispatch failed, I think this is our expected. Can we only clear the workerGroupNodes map?

@ruanwenjun but if we just clear the workerGroupNodes, it will affect other normal taskInstance. and i think it will better that keep the memory same as the db and registry center. WDYT

DarkAssassinator avatar Sep 23 '22 02:09 DarkAssassinator

When we delete a workerGroup, the TaskInstance should dispatch failed, I think this is our expected. Can we only clear the workerGroupNodes map?

@ruanwenjun but if we just clear the workerGroupNodes, it will affect other normal taskInstance. and i think it will better that keep the memory same as the db and registry center. WDYT

Yes, we are talking about how to make the data in memory is the same with db and registry, currently remove the worker which is deleted, just replace the worker group exist. I think we can replace the workerGroup data in memory when sync.

ruanwenjun avatar Sep 23 '22 09:09 ruanwenjun

When we delete a workerGroup, the TaskInstance should dispatch failed, I think this is our expected. Can we only clear the workerGroupNodes map?

@ruanwenjun but if we just clear the workerGroupNodes, it will affect other normal taskInstance. and i think it will better that keep the memory same as the db and registry center. WDYT

Yes, we are talking about how to make the data in memory is the same with db and registry, currently remove the worker which is deleted, just replace the worker group exist. I think we can replace the workerGroup data in memory when sync.

WorkerGroup in the current memory is composed of two parts of data, one of which is synchronized from the database every 10 seconds(just add but never delete), and the other is obtained from the registry center( also just add and never delete). But we cannot make sure this WorkerGroup from DB or Registry center, so cannot make sure which record has been deleted from db or registry center, unless only Default keep on registry center. Then my PR just replace all worker group from DB when sync, and remove this worker group when this record was delete from registry center, then combing these two part and refresh workerGroupNodes.

DarkAssassinator avatar Sep 23 '22 12:09 DarkAssassinator

When we delete a workerGroup, the TaskInstance should dispatch failed, I think this is our expected. Can we only clear the workerGroupNodes map?

@ruanwenjun but if we just clear the workerGroupNodes, it will affect other normal taskInstance. and i think it will better that keep the memory same as the db and registry center. WDYT

Yes, we are talking about how to make the data in memory is the same with db and registry, currently remove the worker which is deleted, just replace the worker group exist. I think we can replace the workerGroup data in memory when sync.

WorkerGroup in the current memory is composed of two parts of data, one of which is synchronized from the database every 10 seconds(just add but never delete), and the other is obtained from the registry center( also just add and never delete). But we cannot make sure this WorkerGroup from DB or Registry center, so cannot make sure which record has been deleted from db or registry center, unless only Default keep on registry center. Then my PR just replace all worker group from DB when sync, and remove this worker group when this record was delete from registry center, then combing these two part and refresh workerGroupNodes.

We don't need to care about the workgroup in registry center, we only need to make sure the workgroup in memory is same with db, the worker group in registry center will not display in ui, and it is not correcttly, in same case will cause new problems, e.g. we only config the workerGroupA:10.10.10.10 in ui, but we register 11.11.11.11 as workerGroupA in registry, so the task belongs to workerGroupA can be dispatched 11.11.11.11, this is not our expected.

ruanwenjun avatar Sep 26 '22 01:09 ruanwenjun