backend.ai
feat: Automate force-termination of hanging sessions
This PR resolves #603. To automate force-termination of hanging sessions, the work is processed in the following order:
- Query kernels
  - where `status` is `PREPARING` or `TERMINATING`
  - and more time than a given `datetime.timedelta` has passed since `status_history[status]`
- Force termination and cleanup.
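
The query step above can be sketched roughly as follows. This is only an illustration under assumptions: the `Kernel` dataclass, `HANGING_STATUSES`, and `find_hanging_kernels` are hypothetical names, not the actual backend.ai models or functions, and `status_history` is assumed to map a status name to the time the kernel entered that status.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Optional

# Statuses in which a kernel may be considered "hanging" (taken from the PR description).
HANGING_STATUSES = ("PREPARING", "TERMINATING")


@dataclass
class Kernel:
    # Hypothetical stand-in for a kernel row; not the real backend.ai model.
    id: str
    status: str
    # Assumed shape: status name -> time the kernel entered that status.
    status_history: Dict[str, datetime] = field(default_factory=dict)


def find_hanging_kernels(
    kernels: Iterable[Kernel],
    threshold: timedelta,
    now: Optional[datetime] = None,
) -> List[Kernel]:
    """Return kernels stuck in a hanging status for longer than `threshold`."""
    now = now or datetime.now(timezone.utc)
    hanging = []
    for kernel in kernels:
        if kernel.status not in HANGING_STATUSES:
            continue
        entered_at = kernel.status_history.get(kernel.status)
        if entered_at is None:
            # No timestamp recorded for the current status; skip rather than guess.
            continue
        if now - entered_at > threshold:
            hanging.append(kernel)
    return hanging
```

The kernels returned here would then be passed to whatever force-termination and cleanup routine the manager already provides.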
Deploy Preview for backendai-docs-preview failed.
Name | Link |
---|---|
Latest commit | 4fd5b24f067aec2ca2cf8400ba47be2d0061196e |
Latest deploy log | https://app.netlify.com/sites/backendai-docs-preview/deploys/630310aea581160009130006 |
@adrysn I also placed an additional config flag `force-terminate-hanging-sessions` in `manager.toml`. The default value is `true`, and a user can set it to `false` to disable the feature.
https://github.com/lablup/backend.ai/blob/ccea5c63de2a9696fb378b13dda003075a800dfe/configs/manager/halfstack.toml#L16-L24
https://github.com/lablup/backend.ai/blob/ccea5c63de2a9696fb378b13dda003075a800dfe/src/ai/backend/manager/server.py#L688-L689
Maybe we can add a config flag for setting an interval to sleep?
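
For illustration, a configurable interval could look something like the sketch below. The names `run_periodically`, `interval`, and `enabled` are hypothetical and do not refer to existing backend.ai config keys or functions; `enabled` merely stands in for a flag such as `force-terminate-hanging-sessions`.

```python
import asyncio
from typing import Awaitable, Callable


async def run_periodically(
    task: Callable[[], Awaitable[None]],
    interval: float,
    enabled: bool = True,
) -> None:
    """Run `task` repeatedly, sleeping `interval` seconds between runs.

    Returns immediately when `enabled` is False, so the feature can be
    switched off from configuration. Stops when the surrounding task is
    cancelled.
    """
    if not enabled:
        return
    while True:
        await task()
        await asyncio.sleep(interval)
```

The loop runs until it is cancelled, so the caller would be expected to cancel it during server shutdown.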
I'd like to add this feature! Please merge the latest main branch.