egg-cluster icon indicating copy to clipboard operation
egg-cluster copied to clipboard

feat: kill worker with SIGTERM first

Open zweifisch opened this issue 5 years ago • 2 comments

Checklist
  • [x] npm test passes
  • [x] commit message follows commit guidelines
Affected core subsystem(s)
Description of change

current workflow for killing a worker:

worker.kill('SIGTERM') -> wait -> worker.process.kill('SIGKILL')

when worker.kill() timed out, worker.process.kill('SIGKILL') is invoked, causing the process to quit immediately, make packages like signal-exit unusable.

to fix this, worker.process.kill('SIGTERM') is added in between:

worker.kill('SIGTERM') -> wait -> worker.process.kill('SIGTERM') -> wait -> worker.process.kill('SIGKILL')

zweifisch avatar Oct 22 '19 12:10 zweifisch

Codecov Report

Merging #95 into master will decrease coverage by 0.18%. The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #95      +/-   ##
==========================================
- Coverage   98.81%   98.63%   -0.19%     
==========================================
  Files           8        8              
  Lines         507      511       +4     
==========================================
+ Hits          501      504       +3     
- Misses          6        7       +1
Impacted Files Coverage Δ
lib/utils/terminate.js 100% <100%> (ø) :arrow_up:
lib/utils/messenger.js 98.07% <0%> (-1.93%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update d6b6102...3334198. Read the comment docs.

codecov[bot] avatar Oct 22 '19 12:10 codecov[bot]

@zweifisch 这个 PR 是想解决什么问题?

killagu avatar Oct 28 '19 01:10 killagu

@killagu 这个 RP 我理解是为了解决 egg gruaceful stop 的问题,现在的行为是不等 worker 退出 app 就强制 kill 了。我现在也遇到了这个问题,这个 PR 可以 Merge 吗?

Fov6363 avatar Dec 15 '22 07:12 Fov6363

@killagu 这个 RP 我理解是为了解决 egg gruaceful stop 的问题,现在的行为是不等 worker 退出 app 就强制 kill 了。我现在也遇到了这个问题,这个 PR 可以 Merge 吗?

https://github.com/eggjs/egg-cluster/blob/2b662a5d5a86cebf1096648afe473a96ce09b498/lib/utils/terminate.js#L18-L22

这里有一个超时等待的,是不是没有在超时时间内退出?

killagu avatar Dec 16 '22 02:12 killagu

是的,一个常见场景就是某个 API 需要超过5秒以上,比如下载任务,这个时候没有完成就强制 kill 掉了。

需求期望是有一个最长超时时间,超过了再kill 掉 worker,如果 worker 能提前自己 exit 也就立刻结束超时时间的等待。

Fov6363 avatar Dec 29 '22 04:12 Fov6363

是的,一个常见场景就是某个 API 需要超过5秒以上,比如下载任务,这个时候没有完成就强制 kill 掉了。

需求期望是有一个最长超时时间,超过了再kill 掉 worker,如果 worker 能提前自己 exit 也就立刻结束超时时间的等待。

补充一下你说的这个场景的测试用例,应该是一个配置。

fengmk2 avatar Dec 30 '22 04:12 fengmk2

https://github.com/eggjs/egg-cluster/pull/111 应该是这个 bug

fengmk2 avatar Dec 08 '23 12:12 fengmk2