
feat: Add OOM status info

Open Yaminyam opened this issue 3 years ago • 1 comments

I took over the work from https://github.com/lablup/backend.ai-agent/pull/304 and continued it here.

#265 The work done before the monorepo integration has been transferred to the monorepo. The logic is unchanged; only the parts that raised errors have been fixed.

Yaminyam avatar Aug 05 '22 16:08 Yaminyam

Hmm... Pausing when an OOM occurs does not seem to be a valid strategy for letting users resolve the problem themselves, because the OOM state won't change; it will just be frozen. (The review and discussion of this point with the original author were missed due to other urgent issues.)

Could we split the addition of the PAUSE/UNPAUSE lifecycle events into a separate PR? In this PR, let's just make the agent recognize OOM events and report them to the manager, and make the manager store them as status_info and status_data.
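
To make the proposed split concrete, here is a rough sketch of the reduced flow: the agent recognizes an OOM event and reports it, and the manager records it in status_info and status_data without pausing anything. Only the names status_info and status_data come from this discussion. `OOMEvent`, `KernelRecord`, `detect_oom`, `Manager`, and the event payload shape (modeled loosely on a Docker container event with an `"oom"` action) are hypothetical and do not reflect the actual Backend.AI code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class OOMEvent:
    """Hypothetical event the agent sends to the manager on OOM."""
    kernel_id: str
    container_id: str


@dataclass
class KernelRecord:
    """Hypothetical manager-side row; only the two status_* names are from the thread."""
    kernel_id: str
    status: str = "RUNNING"
    status_info: Optional[str] = None
    status_data: Dict[str, Any] = field(default_factory=dict)


def detect_oom(kernel_id: str, raw_event: Dict[str, Any]) -> Optional[OOMEvent]:
    """Agent side: turn a container event into an OOMEvent, or None.

    Assumes a payload shaped like {"Type": "container", "Action": "oom",
    "Actor": {"ID": "<container id>"}}; this shape is an assumption.
    """
    if raw_event.get("Type") != "container" or raw_event.get("Action") != "oom":
        return None
    container_id = raw_event.get("Actor", {}).get("ID", "")
    return OOMEvent(kernel_id=kernel_id, container_id=container_id)


class Manager:
    """Manager side: store the reported OOM; no PAUSE/UNPAUSE handling here."""

    def __init__(self) -> None:
        self.kernels: Dict[str, KernelRecord] = {}

    def handle_oom(self, event: OOMEvent) -> None:
        record = self.kernels[event.kernel_id]
        # The lifecycle status is deliberately left untouched.
        record.status_info = "oom"
        record.status_data = {"oom": {"container_id": event.container_id}}


# Usage: the agent detects the event, then the manager records it.
manager = Manager()
manager.kernels["k1"] = KernelRecord(kernel_id="k1")
event = detect_oom("k1", {"Type": "container", "Action": "oom", "Actor": {"ID": "c1"}})
if event is not None:
    manager.handle_oom(event)
```

Keeping the handler limited to recording the event means the PAUSE/UNPAUSE lifecycle change can be reviewed on its own later.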

achimnol avatar Aug 16 '22 06:08 achimnol

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Mar 26 '23 03:03 CLAassistant

Replaced with #1373.

achimnol avatar Jul 05 '23 07:07 achimnol