[Bug] Docker signal handling not working

Open fschulze-dtm opened this issue 1 year ago • 4 comments

Search before asking

  • [X] I searched in the issues and found nothing similar.

Version

iotdb 1.3.1-standalone

Describe the bug and provide the minimal reproduce step

When stopping a Docker container running the apache/iotdb:1.3.1-standalone image, the SIGTERM trap is not executed, leading to a non-graceful shutdown. This is because the entrypoint.sh script uses exec, which replaces the shell process and with it the signal handlers registered with trap.

Furthermore, the on_stop function defined in entrypoint.sh, which should be executed on SIGTERM, contains the condition "$start_what" != "all". Therefore, in standalone mode the corresponding graceful shutdown is not executed.

To reproduce, run the Docker container and then stop it.
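One common way to keep a trap effective is to start the service in the background and wait on it, rather than exec'ing it. The following is only a minimal sketch of that pattern, not the project's entrypoint.sh: `entrypoint_demo` is a hypothetical name, and `sleep 30` is an assumed stand-in for the real server process. The last few lines simulate `docker stop` by sending SIGTERM to the running demo.

```shell
#!/usr/bin/env bash
# Sketch only: `sleep 30` stands in for the long-running server process.

entrypoint_demo() {
    child=""
    on_stop() {
        echo "on_stop: shutting down gracefully"
        # Forward the signal to the background service, if it was started.
        [ -n "$child" ] && kill -TERM "$child" 2>/dev/null
    }
    trap on_stop TERM

    sleep 30 &                   # run the service in the background instead of exec'ing it
    child=$!
    wait "$child"                # returns early when a trapped signal arrives
    wait "$child" 2>/dev/null    # reap the service after the handler has run
    echo "entrypoint exiting"
}

# Simulate `docker stop`: run the demo in a background subshell, then send SIGTERM.
out_file=$(mktemp)
( entrypoint_demo ) > "$out_file" &
pid=$!
sleep 1
kill -TERM "$pid"
wait "$pid"
demo_output=$(cat "$out_file")
rm -f "$out_file"
echo "$demo_output"
```

Because the shell stays alive as the parent process, the handler runs before the service is stopped. With exec, the shell no longer exists when the signal arrives, so there is nothing left to run on_stop.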

What did you expect to see?

The on_stop function defined in entrypoint.sh is executed when the Docker container is stopped, providing a graceful shutdown with FLUSH.

What did you see instead?

Rapid shutdown without proper signal handling and without execution of the on_stop function.

Anything else?

No response

Are you willing to submit a PR?

  • [ ] I'm willing to submit a PR!

fschulze-dtm avatar Sep 12 '24 12:09 fschulze-dtm

Hi, this is your first issue in the IoTDB project. Thanks for your report. Welcome to the community!

github-actions[bot] avatar Sep 12 '24 12:09 github-actions[bot]

Is there no problem with the logic here?

if [[ "$start_what" != "confignode" ]]; then
    echo "###### manually flush ######";
    start-cli.sh -e "flush;" || true
    stop-datanode.sh
    echo "##### done ######";
else
    stop-confignode.sh;
fi

CritasWang avatar Sep 19 '24 10:09 CritasWang

The snippet above is from the apache/iotdb:1.3.2-standalone image. In apache/iotdb:1.3.1-standalone it is:

if [[ "$start_what" == "datanode" ]]; then
    echo "###### manually flush ######";
    start-cli.sh -e "flush;" || true
    echo "stopping datanode service";
    stop-datanode.sh ;
    echo "##### done ######";
elif [[ "$start_what" != "all" ]]; then
    echo "###### manually flush ######";
    start-cli.sh -e "flush;" || true
    echo "stopping confignode and datanode service";
    stop-standalone.sh ;
    echo "##### done ######";
elif [[ "$start_what" == "confignode" ]]; then
    echo "stopping confignode service";
    stop-confignode.sh;
    echo "##### done ######";
fi

Also, the main problem remains: entrypoint.sh uses exec, which discards the trap.
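In the 1.3.1 snippet, the `!= "all"` branch matches "confignode" before the later confignode branch can, and "all" itself matches no branch. As an illustration only, one unambiguous way to map each mode to a stop script is a case statement. `select_stop_script` is a hypothetical helper that prints the script name instead of running it, and treating "all" as the standalone mode is inferred from the report above rather than confirmed.

```shell
#!/usr/bin/env bash
# Sketch: pick the stop script for a given mode; prints the name instead of running it.
# Assumption: "all" is the value of start_what in standalone mode.

select_stop_script() {
    case "$1" in
        datanode)   echo "stop-datanode.sh" ;;
        confignode) echo "stop-confignode.sh" ;;
        all)        echo "stop-standalone.sh" ;;
        *)          echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

for mode in datanode confignode all; do
    echo "$mode -> $(select_stop_script "$mode")"
done
```

Each mode has exactly one branch, so no condition can shadow another.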

fschulze-dtm avatar Sep 30 '24 09:09 fschulze-dtm

Actually, a graceful shutdown only requires calling the stop script.

start-cli.sh -e "flush;"

This operation is only an extra safeguard. After the stop script is called, the program also performs the corresponding graceful shutdown processing internally.
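The point about the flush being optional can be sketched as follows. Both `flush_cli` and `stop_service` are hypothetical stand-ins (for `start-cli.sh -e "flush;"` and the stop script respectively), and the flush is made to fail on purpose to show that `|| true` keeps the handler going.

```shell
#!/usr/bin/env bash
# Sketch with stand-in functions; neither calls the real IoTDB scripts.

flush_cli() { return 1; }                       # stand-in for the flush, simulating a failure
stop_service() { echo "stop script called"; }   # stand-in for the stop script

on_stop() {
    flush_cli || true    # best-effort: a failed flush must not block the stop
    stop_service
}

result=$(on_stop)
echo "$result"
```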

CritasWang avatar Oct 11 '24 02:10 CritasWang