aibrix
Redis is not stable and exits after receiving SIGTERM
🐛 Describe the bug
ubuntu@158-101-17-114:~$ kubectl logs -f aibrix-redis-master-84769768cb-j5rfb -p -n aibrix-system
1:C 16 Feb 2025 18:46:20.187 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 16 Feb 2025 18:46:20.187 * Redis version=7.4.2, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 16 Feb 2025 18:46:20.187 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
1:M 16 Feb 2025 18:46:20.187 * monotonic clock: POSIX clock_gettime
1:M 16 Feb 2025 18:46:20.189 * Running mode=standalone, port=6379.
1:M 16 Feb 2025 18:46:20.189 * Server initialized
1:M 16 Feb 2025 18:46:20.189 * Ready to accept connections tcp
1:signal-handler (1739731666) Received SIGTERM scheduling shutdown...
1:M 16 Feb 2025 18:47:46.562 * User requested shutdown...
1:M 16 Feb 2025 18:47:46.562 * Saving the final RDB snapshot before exiting.
1:M 16 Feb 2025 18:47:46.564 * DB saved on disk
1:M 16 Feb 2025 18:47:46.564 # Redis is now ready to exit, bye bye...
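The timestamps in this log can be cross-checked to see how long Redis ran before it was asked to stop. The snippet below is only a rough sketch: it copies the "Ready to accept connections" and "User requested shutdown" timestamps and the epoch value from the SIGTERM line, and it assumes the container clock is UTC. The variable names are illustrative.

```python
from datetime import datetime, timezone

LOG_TIME_FORMAT = "%d %b %Y %H:%M:%S.%f"

# Timestamps copied from the log above.
ready_at = datetime.strptime("16 Feb 2025 18:46:20.189", LOG_TIME_FORMAT)
shutdown_at = datetime.strptime("16 Feb 2025 18:47:46.562", LOG_TIME_FORMAT)

# Seconds between "Ready to accept connections" and "User requested shutdown".
uptime_seconds = (shutdown_at - ready_at).total_seconds()

# Epoch value printed by the Redis signal handler on the SIGTERM line.
# Assumption: the container clock is UTC.
sigterm_at = datetime.fromtimestamp(1739731666, tz=timezone.utc)

print(f"uptime before SIGTERM: {uptime_seconds:.3f}s")
print(f"SIGTERM received at:   {sigterm_at:%Y-%m-%d %H:%M:%S} UTC")
```

Under these assumptions Redis was up for roughly 86 seconds, and the signal-handler epoch lines up with the shutdown line. This suggests the SIGTERM came from outside the process rather than from a Redis failure.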
Steps to Reproduce
Deploy on Lambda Cloud.
Expected behavior
Redis should be very stable. I have never seen this kind of issue before.
Environment
nightly
Same here. It only happens on a Lambda instance with nvkind.
The problem still exists.
Actually, most of the containers crashed:
metadata-service
gpu-optimizer
gateway-plugin
redis-master
controller-manager
The affected pods fall into three categories:
- Solid software like redis, the controller, and gateway-plugin: exitCode is 0. They all have error handling.
- Components we wrote ourselves, like gpu-optimizer and metadata-service: these show other error codes.
- The KubeRay pod is not affected, which is weird.
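To make the exit-code categories above easier to compare, here is a minimal sketch of how a container exit code is commonly interpreted. It relies on the usual convention that a process killed by signal N is reported with exit code 128 + N, and that a process which handles SIGTERM and shuts down cleanly exits with 0. The function name `classify_exit` is hypothetical, and the codes used in the usage lines are generic examples, not values observed in this cluster.

```python
import signal


def classify_exit(exit_code: int) -> str:
    """Roughly classify a container exit code (hypothetical helper)."""
    if exit_code == 0:
        # The process returned normally, e.g. it caught SIGTERM and shut down.
        return "clean exit (for example, a handled SIGTERM)"
    if exit_code > 128:
        # Convention: 128 + N means the process was killed by signal N.
        signum = exit_code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"terminated by {name}"
    # Anything else is an error status chosen by the application itself.
    return "application error"


# Generic example codes, not taken from the affected pods.
for code in (0, 1, 137, 143):
    print(code, "->", classify_exit(code))
```

If the self-written components report codes such as 143 or 137, they are most likely being stopped by the same external signal as redis but without a graceful handler. A small code such as 1 would point to an error inside the component itself.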
We are pretty sure it's due to the kind setup.
It looks like the worker node has enough resources.
I cannot easily figure this out. It is hard to debug a kind problem here.