launch launch not killing processes started with shell=True

Bug report

Required Info:

Operating System:
- Ubuntu 20.04
Installation type:
- binaries
Version or commit hash:
- Foxy 0.10.6-1focal.20210901.033555
DDS implementation:
- Fast

Steps to reproduce issue

import launch_testing
import unittest
import os

from launch import LaunchDescription
from launch.actions import DeclareLaunchArgument, ExecuteProcess
from launch.substitutions import PathJoinSubstitution, LaunchConfiguration
from launch_ros.actions import Node
from launch.actions import IncludeLaunchDescription
from launch.substitutions import PathJoinSubstitution
from launch_ros.substitutions import FindPackageShare
from launch.launch_description_sources import PythonLaunchDescriptionSource



def generate_test_description():
    # This is necessary to get unbuffered output from the process under test
    proc_env = os.environ.copy()
    proc_env['PYTHONUNBUFFERED'] = '1'

    dut_process = ExecuteProcess(cmd=['/usr/bin/sleep', '5']) 
    pkg_dir = FindPackageShare('gazebo_ros')
    full_path = PathJoinSubstitution([pkg_dir] + ['launch', 'gazebo.launch.py'])

    gazebo_launch = IncludeLaunchDescription(
        PythonLaunchDescriptionSource(
            full_path))

    return LaunchDescription([
        gazebo_launch,
        dut_process,

        launch_testing.actions.ReadyToTest(),
    ]), {'dut_process': dut_process}


class TestWait(unittest.TestCase):

    def test_wait_for_end(self, proc_output, proc_info, dut_process):
        # Wait until process ends
        proc_info.assertWaitForShutdown(process=dut_process, timeout=10)


@ launch_testing.post_shutdown_test()
class TestSuccessfulExit(unittest.TestCase):

    def test_exit_code(self, proc_info, dut_process):
        # Check that dut_process finishes with code 0
        launch_testing.asserts.assertExitCodes(proc_info, process=dut_process)

Execute it with launch_test or as part of add_launch_test in CMake.

Expected behavior

When it ends, gazebo should be killed.

Actual behavior

Gazebo is not killed.

Additional information

Happens with other nodes besides gazebo. But gazebo is particularly troublesome because I cannot have 2 tests running sequentially using gazebo, because the first test didn't kill the server, and only one server may be running in a machine.

Might be related to https://github.com/ros2/launch/issues/495 because if instead of launch_testing I use ros2 launch gazebo_ros gazebo.launch.py, I can kill it properly with Ctrl+C, but not with kill -SIGINT. Even if I use ros2 launch gazebo_ros gazebo.launch.py --noninteractive

Oct 27 '21 09:10 v-lopez

Hmm, @v-lopez this looks like Gazebo not forwarding signals properly. Would you mind trying to force launch_testing's LaunchService instance to be non-interactive? If that fixes the issue, we can submit it as a patch.

Oct 27 '21 15:10 hidmic

I tried that (and verified that it was not being overwritten to False), and the behavior was the same.

Oct 29 '21 07:10 v-lopez

I have the same problem with a standalone unity executable not being killed, but I can kill all the processes correctly if I Ctrl+C before the test is complete

Nov 29 '21 15:11 benjinne

My current workaround is to add a post_shutdown_test to kill the remaining processes

@launch_testing.post_shutdown_test()
class TestShutdown(unittest.TestCase):
    def test_kill_sim(self):
        unity_pid = check_output(["pidof", "-z", "unity_executable.x86_64"])
        print('got unity pid')
        unity_pid = int(unity_pid)
        print('unity pid is: ' + str(unity_pid))
        os.kill(unity_pid, signal.SIGKILL)
        print('Killed unity')

I think the reason this is happening is because the executable is ran from a series of launch files, then the last launch file calls a bash script that uses ros2 run unity_executable.x86_64. I can provide more details if needed

Jan 11 '22 15:01 benjinne

I'm experiencing this issue with gzserver, gzclient, and rqt. Here's a minimal reproducible example with Gazebo:

import sys
import time
import unittest

import launch
from launch.actions import IncludeLaunchDescription
from launch.launch_description_sources import PythonLaunchDescriptionSource
from launch_ros.substitutions import FindPackageShare
from launch_testing.actions import ReadyToTest
import launch_testing.markers
import pytest


@pytest.mark.launch_test
def generate_test_description():
    return launch.LaunchDescription(
        [
            IncludeLaunchDescription(
                PythonLaunchDescriptionSource([FindPackageShare('gazebo_ros'), '/launch/gazebo.launch.py'])
            ),
            ReadyToTest(),
        ])


class MyTestFixture(unittest.TestCase):

    def test_something(self):
        time.sleep(10.0)
        print('END TEST', file=sys.stderr)

Apr 26 '22 23:04 jacobperron

We were facing a similar issue in ignition fortress, where launch was not able to kill the gazebo process. As @hidmic mentioned, it was related to Gazebo not forwarding signals. That was fixed using this PR, not sure if that is related : https://github.com/gazebosim/gz-launch/pull/151

Apr 28 '22 17:04 adityapande-1995

I'm experiencing this issue with gzserver, gzclient, and rqt. Here's a minimal reproducible example with Gazebo:

import sys
import time
import unittest

import launch
from launch.actions import IncludeLaunchDescription
from launch.launch_description_sources import PythonLaunchDescriptionSource
from launch_ros.substitutions import FindPackageShare
from launch_testing.actions import ReadyToTest
import launch_testing.markers
import pytest


@pytest.mark.launch_test
def generate_test_description():
    return launch.LaunchDescription(
        [
            IncludeLaunchDescription(
                PythonLaunchDescriptionSource([FindPackageShare('gazebo_ros'), '/launch/gazebo.launch.py'])
            ),
            ReadyToTest(),
        ])


class MyTestFixture(unittest.TestCase):

    def test_something(self):
        time.sleep(10.0)
        print('END TEST', file=sys.stderr)

Fix here : https://github.com/ros-simulation/gazebo_ros_pkgs/pull/1376

Apr 28 '22 20:04 adityapande-1995

Another minimal example :

import sys
import time
import unittest

import launch
from launch.actions import IncludeLaunchDescription, ExecuteProcess
from launch.launch_description_sources import PythonLaunchDescriptionSource
from launch_ros.substitutions import FindPackageShare
from launch_testing.actions import ReadyToTest
import launch_testing.markers
import pytest


@pytest.mark.launch_test
def generate_test_description():
    return launch.LaunchDescription(
        [
            ExecuteProcess(
                cmd=['python3','test1.py'],
                shell=True,
                output='both'
            ),
            ExecuteProcess(
                cmd=['python3','test2.py'],
                shell=False,
                output='both'
            ),
            ReadyToTest(),
        ])


class MyTestFixture(unittest.TestCase):

    def test_something(self):
        time.sleep(10.0)
        print('END TEST', file=sys.stderr)

Where both test files are just :

import time

while 1 :
    time.sleep(0.5)
    print("Hello 1")

test1.py does not terminate, yet test2.py does.

May 03 '22 21:05 adityapande-1995

This is not a problem IMO. When you use shell=True, you're running a shell process, which in this case will run python3 test1.py as a subprocess. Launch is correctly sending a signal to the shell process, but that one is not forwarding it to its subprocess.

You probably can fix that be trapping SIGINT/SIGTERM and killing subprocesses before exiting the shell (e.g. with kill -$SIGNAL $(jobs -p)).

I don't think we can do anything by default in launch that's reasonable, people should be careful when using shell=True.

Jul 15 '22 15:07 ivanpauno

I have updated the issue title, as I don't think the original one was correct.

Jul 15 '22 15:07 ivanpauno

This is easy to reproduce from a pure launch file:

ros2 launch ./test.launch.py  &
kill -SIGINT $!

# ./test.launch.py
import launch
from launch.actions import ExecuteProcess

PYTHON_SCRIPT="""\
import time

while 1:
    time.sleep(0.5)
"""

def generate_launch_description():
    return launch.LaunchDescription(
        [
            ExecuteProcess(
                cmd=['python3', '-c', f'"{PYTHON_SCRIPT}"'],
                shell=True,
                output='both'
            ),
            ExecuteProcess(
                cmd=['python3', '-c', f'{PYTHON_SCRIPT}'],
                shell=False,
                output='both'
            ),
        ])

after waiting the launch process to finish (it will take some seconds and log some stuff), it's easy to check that the second process was killed and the first wasn't (e.g. using ps -elf).

Note:

The issue isn't noticed if instead of using kill -SIGINT $! the user presses ctrl+c. That's because pressing ctrl+c will send a SIGINT signal to all the processes of the group (all the processes launch from the launch file).
Currently if SIGTERM is sent, instead of SIGINT, no process is being killed. That seems like a separate issue that should be handled as well.
We could maybe try to send a signal to the whole group (in the platforms that support this), or to find all the children of the launch process recursively (e.g. with psutil) to solve the problem.

I will give it a try to the last option, but I'm not sure how multiplatform that's going to be.

cc @jacobperron

Jul 21 '22 20:07 ivanpauno

I have opened https://github.com/ros2/launch/pull/632, which will fix the issue.

Jul 25 '22 19:07 ivanpauno

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/sequentially-starting-launch-files-using-ros-2-python-when-a-specific-log-message-is-detected/30633/3

Mar 31 '23 17:03 ros-discourse

launch launch copied to clipboard

launch not killing processes started with shell=True

Bug report

Steps to reproduce issue

Expected behavior

Actual behavior

Additional information

launch
launch copied to clipboard