Add a debug mode configuration to the current k8s deployment stack

Open dengliu opened this issue 2 years ago • 8 comments

Feature request

Is your feature request related to a problem? Please describe.

Can we add a debug mode configuration to the current k8s deployment stack? In debug mode, a BE process crash would not fail the health/readiness check, which is what triggers deleting the current container and recreating/restarting a new one. That way, we could log into the container to troubleshoot further and restart the BE process manually. We found this debug mode to be very necessary during today's large PCU crash debugging and would like it to be better supported in the future.

Describe the solution you'd like

One option is to have starrocks_wrapper as the entry point instead of starrocks_be. starrocks_wrapper is a program that monitors BE, restarts it if necessary, and kills itself if it wants the pod to restart.

Describe alternatives you've considered

Additional context
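
For illustration only, the starrocks_wrapper idea could be sketched in shell roughly as follows. This is not an existing StarRocks component: the `supervise` function, the `RESTART_DELAY` variable, and the "give up after N restarts" policy are all assumptions made for the example. It also assumes start_be.sh stays in the foreground when `--daemon` is omitted.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper sketch; names and restart policy are assumptions.

# supervise MAX_RESTARTS CMD [ARGS...]
# Runs CMD in the foreground and restarts it when it exits non-zero.
# Returns 0 if CMD exits cleanly, or CMD's last exit code once the restart
# budget is used up, so the caller can exit and let Kubernetes restart the pod.
supervise() {
  local max_restarts=$1
  shift
  local restarts=0 status
  while :; do
    status=0
    "$@" || status=$?
    if [ "$status" -eq 0 ]; then
      return 0
    fi
    if [ "$restarts" -ge "$max_restarts" ]; then
      echo "giving up after $restarts restarts (exit code $status)" >&2
      return "$status"
    fi
    restarts=$((restarts + 1))
    echo "child exited with $status; restart $restarts of $max_restarts" >&2
    sleep "${RESTART_DELAY:-5}"
  done
}

# Example entrypoint usage (assumes start_be.sh stays in the foreground
# when --daemon is omitted):
#   supervise 3 ./start_be.sh || exit $?
```

A real debug mode would probably replace the final "give up" branch with an idle loop, so the container stays up for inspection instead of exiting.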

dengliu avatar Oct 20 '23 20:10 dengliu

@yandongxiao

kevincai avatar Oct 21 '23 02:10 kevincai

A great suggestion. In debug mode, we should first remove the liveness probe, and second, ensure that a BE crash does not cause the entire Pod to restart. For the second part, we might need to modify the entrypoint.

yandongxiao avatar Oct 21 '23 02:10 yandongxiao

The BE entrypoint still needs to handle the signal properly.

yandongxiao avatar Oct 21 '23 02:10 yandongxiao

Before the operator was launched, we deployed our own BE StatefulSet with a while loop at the end and no liveness/readiness probes. When the BE process crashed, the pod/container stayed alive, and we could log into the pod to troubleshoot and restart the BE process. I think a better way is to have the entrypoint process trap SIGTERM (SIGKILL cannot be trapped) and forward it to the BE process. Here is an example that you can follow: https://unix.stackexchange.com/questions/146756/forward-sigterm-to-child-in-bash. Below is the pseudocode for the while-loop setup:

# Start BE in the background, then keep the container's main process alive.
./start_be.sh --daemon
while sleep 10; do
...
done
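
As a rough sketch of the trap-and-forward idea, following the general pattern in the linked answer: the function names `run_with_forwarding` and `forward_signal` are made up for this example, and it assumes the BE start script runs in the foreground when `--daemon` is omitted. Only catchable signals such as SIGTERM and SIGINT can be forwarded this way.

```shell
#!/usr/bin/env bash
# Hypothetical entrypoint sketch; function names are illustrative only.

child_pid=
got_signal=0

# Trap handler: remember that a signal arrived and pass SIGTERM to the child.
forward_signal() {
  got_signal=1
  kill -TERM "$child_pid" 2>/dev/null
}

# run_with_forwarding CMD [ARGS...]
# Starts CMD in the background, forwards SIGTERM/SIGINT to it as SIGTERM,
# and returns CMD's exit status.
run_with_forwarding() {
  local status
  "$@" &
  child_pid=$!
  trap forward_signal TERM INT
  while :; do
    got_signal=0
    status=0
    # wait returns early (status > 128) when a trapped signal arrives,
    # so loop and wait again to collect the child's real exit status.
    wait "$child_pid" || status=$?
    [ "$got_signal" -eq 1 ] || break
  done
  trap - TERM INT
  return "$status"
}

# Example entrypoint usage:
#   run_with_forwarding ./start_be.sh
#   exit $?
```

Running the child in the background and blocking in `wait` is what lets the shell react to the signal promptly; with a foreground child, the trap would only run after the child had already exited.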

dengliu avatar Oct 21 '23 04:10 dengliu

We have marked this issue as stale because it has been inactive for 6 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to StarRocks!

github-actions[bot] avatar Apr 22 '24 11:04 github-actions[bot]

Reactivating the issue.

kevincai avatar Apr 23 '24 10:04 kevincai

Once COREDUMP_ENABLED=true is set, it automatically enters debug mode.

dengliu avatar Apr 24 '24 01:04 dengliu

We have marked this issue as stale because it has been inactive for 6 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to StarRocks!

github-actions[bot] avatar Oct 21 '24 11:10 github-actions[bot]