ros2cli icon indicating copy to clipboard operation
ros2cli copied to clipboard

ros2 topic list takes 10 minutes in a container (when run first time)

Open marekcygan opened this issue 1 year ago • 18 comments

Bug report

Listing topics takes several minutes (10-20) when run the first time from the command line. During this time one core is used 100%.

Required Info:

  • Operating System:
    • Host operating system: Manjaro (also checked on Arch, surprisingly it works on Ubuntu which is probably why the issue is not wide-spread).
    • docker version: Docker version 25.0.3, build 4debf411d1
  • Installation type:
    • Ros docker image (rolling-desktop, but also checked on iron and humble).

Steps to reproduce issue

docker run -it osrf/ros:rolling-desktop

(inside docker)

ros2 topic list

Expected behavior

Topics listed after a second.

Actual behavior

Command takes 10 minutes.

marekcygan avatar Apr 24 '24 16:04 marekcygan

Hi @marekcygan, We did a brainstorming session on our weekly waffle triage meeting about this issue and here is our outcome.

  1. We need more information about the storage driver and about the host system itself.
  2. Could you please try to narrow down the problem by running with --no-daemon? If the issue is gone - likely relates to the issue when the node graph collecting info about other nodes.
  3. This issue might be related to the fix (patch) for the "Meltdown" vulnerability. It was a patches in the stdlib and linux kernel for that and we have seen significant performance degradation on some other platforms.

MichaelOrlov avatar May 02 '24 17:05 MichaelOrlov

@MichaelOrlov thanks for your attention!

  1. Storage driver: overlay 2 (putting full docker info in a separate comment). Host system:
❯ uname -r
6.8.5-1-MANJARO
  1. I got an error when adding --no-daemon to ros2 topic list:
root@b18db09cbf50:/# ros2 topic list --no-daemon
Operation not permitted
terminate called after throwing an instance of 'std::system_error'
  what():  Invalid argument
Aborted (core dumped)

marekcygan avatar May 03 '24 05:05 marekcygan

❯ docker info
Client:
 Version:    25.0.3
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  0.13.0
    Path:     /usr/lib/docker/cli-plugins/docker-buildx
WARNING: Plugin "/home/marek/.docker/cli-plugins/docker-compose" is not valid: failed to fetch metadata: exit status 255

Server:
 Containers: 2
  Running: 2
  Paused: 0
  Stopped: 0
 Images: 27
 Server Version: 25.0.3
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: true
  Native Overlay Diff: false
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 nvidia runc
 Default Runtime: nvidia
 Init Binary: docker-init
 containerd version: 7c3aca7a610df76212171d200ca3811ff6096eb8.m
 runc version: 
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.8.5-1-MANJARO
 Operating System: Manjaro Linux
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 62.72GiB
 Name: marek-rtx-3070
 ID: B4RK:DIMQ:CKKZ:HOQD:QE3R:4IO6:V6GJ:LCPT:2ORO:W5I7:WM37:SM35
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

marekcygan avatar May 03 '24 05:05 marekcygan

@MichaelOrlov any ideas what should be the next steps?

marekcygan avatar May 20 '24 07:05 marekcygan

@nuclearsandwich @wjwwood Any thoughts after providing details about running system configuration?

MichaelOrlov avatar May 20 '24 17:05 MichaelOrlov

@marekcygan

so after,

docker run -it osrf/ros:rolling-desktop

this just hangs up forever,

ros2 topic list

but this generates the permission error?

root@b18db09cbf50:/# ros2 topic list --no-daemon
Operation not permitted
terminate called after throwing an instance of 'std::system_error'
  what():  Invalid argument
Aborted (core dumped)

that is really weird, and probably not related to ROS2...

a couple of things i would check,

### Did you source the ROS2 environment
root@cc8f329115ab:/# source /opt/ros/rolling/setup.bash 

### Check the file and id ownership and permission
root@cc8f329115ab:/# which ros2
/opt/ros/rolling/bin/ros2
root@cc8f329115ab:/# ls -l /opt/ros/rolling/bin/ros2
-rwxr-xr-x 1 root root 955 Feb 16 16:37 /opt/ros/rolling/bin/ros2
root@cc8f329115ab:/# id -a
uid=0(root) gid=0(root) groups=0(root)

fujitatomoya avatar May 20 '24 21:05 fujitatomoya

@fujitatomoya @MichaelOrlov

@marekcygan

so after,

docker run -it osrf/ros:rolling-desktop

this just hangs up forever,

Not forever, it takes 10 minutes to finish.

root@dd4078a0cd6b:/# time ros2 topic list
/parameter_events
/rosout

real	11m27.017s
user	9m53.996s
sys	1m32.153s
ros2 topic list

but this generates the permission error?

It used to, but now it does not, now it prints what it should immediately:

root@dd4078a0cd6b:/# ros2 topic list --no-daemon
/parameter_events
/rosout

a couple of things i would check, Did you source the ROS2 environment

Yes, otherwise I would not be able to run ros2 topic.

Check the file and id ownership and permission root@cc8f329115ab:/# which ros2

I get:

/opt/ros/rolling/bin/ros2

root@cc8f329115ab:/# ls -l /opt/ros/rolling/bin/ros2

-rwxr-xr-x 1 root root 955 Feb 16 16:37 /opt/ros/rolling/bin/ros2

root@cc8f329115ab:/# id -a

uid=0(root) gid=0(root) groups=0(root)

marekcygan avatar May 26 '24 14:05 marekcygan

Not forever, it takes 10 minutes to finish.

can you stop the container and start it up, and the try following?

### login container and then

### check if ros2 daemon is running, expecting not running
ros2 daemon status

### ros2 command, expecting this takes 10 mins
ros2 topic list

### see if ros2 daemon is now running
ros2 daemon status

### ros2 command, to tell the problem is daemon spawning process or XMLRPC traffic.
ros2 topic list

if 2nd ros2 topic list responds quickly, the problem can be spawning process for ros2 daemon on your platform.

It used to, but now it does not, now it prints what it should immediately:

at least, this is relief. thanks for checking.

fujitatomoya avatar May 26 '24 19:05 fujitatomoya

One more piece of information is that I have updated all the manjaro packages last week.

❯ docker --version
Docker version 26.1.1, build 4cf5afaefa
❯ uname -r
6.8.9-3-MANJARO

marekcygan avatar May 26 '24 20:05 marekcygan

I also run into this issue on Manjaro. After digging a little bit I found that it gets stuck in this loop: https://github.com/ros2/ros2cli/blob/58b61c98378fa49a4a164450f1d5222bde2e4f50/ros2cli/ros2cli/node/daemon.py#L140-L149

On my system, resource.getrlimit(resource.RLIMIT_NOFILE) returns 1073741816 and it takes a long time to count that high!

However, it looks like a workaround for this has already been created here: https://github.com/ros2/ros2cli/commit/64d216cb8fafef83d046b79ee6294afb06b7c595 which made it into Jazzy.

It would be great if that could be backported to Humble and Iron!

sgvandijk avatar Jun 03 '24 13:06 sgvandijk

@sgvandijk thanks for posting the information, i was aware of that issue.

docker run -it osrf/ros:rolling-desktop

original post tells me this happens with rolling, so could be another issue because https://github.com/ros2/ros2cli/pull/888 is available with rolling and jazzy.

It would be great if that could be backported to Humble and Iron!

no objections for this.

fujitatomoya avatar Jun 03 '24 17:06 fujitatomoya

As a workaround, you can add --ulimit nofile=1024:1048576 to the docker run command:

docker run -it --ulimit nofile=1024:1048576 my-image

Or set default ulimits in /etc/docker/daemon.json:

{
    "default-ulimits": {
        "nofile": {
            "Name": "nofile",
            "Hard": 1048576,
            "Soft": 1024
        }
    }
}

Then restart docker daemon:

sudo systemctl restart docker

Note: These values are based on Ubuntu 22.04.

  • On an ubuntu 22.04 machine, the default nofile limits are 1024:1048576
  • In docker running on Ubuntu 22.04, these numbers are set to 1048576:1048576
  • However, in docker running on Manjaro, the default values are too large, which causes the program to stuck as @sgvandijk said above.

MirTITH avatar Jun 04 '24 07:06 MirTITH

It would be great if that could be backported to Humble and Iron!

backports to humble and iron are completed.

i still need to keep this open since original issue came from rolling, we should not meet this problem because https://github.com/ros2/ros2cli/commit/64d216cb8fafef83d046b79ee6294afb06b7c595 has been in rolling for a while.

@marekcygan can you confirm?

fujitatomoya avatar Jun 04 '24 16:06 fujitatomoya

Hello,

I am encountering the same issue with my devcontainer:

  • osrf/ros:iron-desktop
  • host: ubuntu 22.04
  • Docker version 26.1.4, build 5650f9b
  • uname -r : 6.8.11-200.fc39.x86_64

I am using ros2 with ros2 intelRealSense wrapper to use their depth cameras. I've run the docker without intelrealsense wrapper and i've still got the problem.

Thanks for your help!

MarinoAlpine avatar Jun 19 '24 16:06 MarinoAlpine

The issue no longer exists on my end. Sorry for late reply.

marekcygan avatar Jun 27 '24 11:06 marekcygan

I am facing similar issues on Fedora 40. List commands (e.g ros2 topic list ros2 node list) do not terminate. Only when run with --no-daemon they finish.

  • installed iron from source inside a ubuntu:jammy container
  • host: fedora 40
  • Docker version 26.1.4
  • containerd version: 1.6.33
  • uname -r: 6.9.5-200.fc40.x86_64

NovoG93 avatar Jun 27 '24 12:06 NovoG93

@NovoG93 and @MarinoAlpine Please provide the output of ros2doctor in the container where you see the behavior 🧇

sloretz avatar Jul 12 '24 18:07 sloretz

@sloretz I am facing the same issue with ros2 doctor, here however I can't use the --no-daemon argument. As I am writing this the `ros2 doctor process is running for more then 14 minutes:

user@b17e1cbb898b:~$ ps -aux | grep doctor
user    5115 99.7  0.8 1018592 199760 pts/1  Rl+  14:39  11:44 /usr/bin/python3 /opt/ros/iron/bin/ros2 doctor --report

I stopped it with ctr +c :

user@b17e1cbb898b:/workspaces/rclpy_basics/RCLPY_Basics$ { date; ros2 doctor --report; } || true; date
Sun Jul 14 14:39:57 UTC 2024
/opt/ros/iron/lib/python3.10/site-packages/ros2doctor/api/__init__.py: 154: UserWarning: Fail to call PackageReport class functions.
^CSun Jul 14 14:52:43 UTC 2024

NovoG93 avatar Jul 14 '24 15:07 NovoG93

@NovoG93 what version of the ros2cli repo did you build in your container? Mind posting the output of vcs export --exact from the colcon workspace inside your container?

sloretz avatar Jul 15 '24 23:07 sloretz

@sloretz wrong choice of words, I installed it with the binaries. Please find a minimal example of my dockerfile below:

ARG GRAPHICS_PLATFORM=standard
ARG BASE_PACKAGE=base
FROM ubuntu:jammy

ENV DEBIAN_FRONTEND="noninteractive"

# Install mesa for GUI
RUN apt-get update && apt-get install -q -y --no-install-recommends \
    libgl1-mesa-glx libgl1-mesa-dri
LABEL maintainer="Georg Novotny FHTW"

SHELL ["/bin/bash", "-o", "pipefail", "-c"]
# Install some basic packages
RUN apt-get update -qq && \
    apt-get upgrade -yq && \
    apt-get install -yq \
    wget curl git sudo lsb-release locales bash-completion tzdata gosu tmux xterm less bc \
        build-essential neovim htop psmisc \
        python3-pip --no-install-recommends && \
    rm -rf /var/lib/apt/lists/*


ARG ROS_DISTRO=iron
ARG BASE_PACKAGE=desktop

ENV DEBIAN_FRONTEND="noninteractive"
# setup environment
ENV LANG=C.UTF-8
ENV LC_ALL=C.UTF-8

RUN echo "--------------------------------------------------------------------------------------------------------------"
RUN echo "Installing ROS2 ${ROS_DISTRO}-base"
RUN echo "--------------------------------------------------------------------------------------------------------------"
RUN apt-get update -qq && \
    apt-get install -yq curl gnupg2 lsb-release && \
    curl -sSL https://raw.githubusercontent.com/ros/rosdistro/master/ros.key -o /usr/share/keyrings/ros-archive-keyring.gpg && \
    echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/ros-archive-keyring.gpg] http://packages.ros.org/ros2/ubuntu $(. /etc/os-release && echo jammy) main" | tee /etc/apt/sources.list.d/ros2.list > /dev/null && \
    apt-get update -qq && apt-get upgrade -y && \
    apt-get install -yq ros-dev-tools ros-${ROS_DISTRO}-ros-base \
    python3-argcomplete python3-colcon-common-extensions \
    python3-rosdep python3-vcstool --no-install-recommends && \
    rosdep init && \
    rm -rf /var/lib/apt/lists/*

ENV DEBIAN_FRONTEND="noninteractive"
ENV USERNAME=ros_user
ARG USER_ID=1000
ARG GROUP_ID=1000

ARG ROS_DISTRO=iron
ARG BASE_PACKAGE=desktop
ENV ROS_WORKSPACE=/home/${USERNAME}/ros2_ws
ENV COLCON_HOME=/home/${USERNAME}/.colcon

# Create user
RUN groupadd --gid $GROUP_ID ${USERNAME} && \
        useradd --gid $GROUP_ID -m ${USERNAME} && \
        echo "${USERNAME}:${USERNAME}" | chpasswd && \
        usermod --shell /bin/bash ${USERNAME} && \
        usermod -aG sudo ${USERNAME} && \
        usermod  --uid $USER_ID ${USERNAME} && \
        echo "${USERNAME} ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers.d/${USERNAME} && \
        chmod 0440 /etc/sudoers.d/${USERNAME}

SHELL ["/bin/bash", "-euo", "pipefail", "-c"]


# Install ROS packages
RUN apt-get update -qq && apt-get install -yq ros-${ROS_DISTRO}-nav2* ros-${ROS_DISTRO}-turtlebot3* ros-${ROS_DISTRO}-tf-transformations --no-install-recommends && \
    rm -rf /var/lib/apt/lists/*


RUN su ${USERNAME} -c "rosdep update"; \
    echo "export PATH=/home/${USERNAME}/.local/bin:${PATH}" && \
    echo "export TURTLEBOT3_MODEL=burger" >> /home/${USERNAME}/.bashrc && \
    echo "source /usr/share/colcon_argcomplete/hook/colcon-argcomplete.bash" >> /home/${USERNAME}/.bashrc && \
	source "/usr/share/colcon_cd/function/colcon_cd.sh" >> /home/${USERNAME}/.bashrc && \
    echo "source /opt/ros/${ROS_DISTRO}/setup.bash" >> /home/${USERNAME}/.bashrc && \
    echo "source ${ROS_WORKSPACE}/install/setup.bash" >> /home/${USERNAME}/.bashrc && \
    mkdir -p ${ROS_WORKSPACE}/src

# Initialize and update rosdep AS the user
RUN su ${USERNAME} -c "rosdep update"
 
WORKDIR ${ROS_WORKSPACE}
RUN /bin/bash -c "source /opt/ros/${ROS_DISTRO}/setup.bash  && colcon build --symlink-install" && \
    chown ${USERNAME}:${USERNAME} --recursive /home/${USERNAME}/ros2_ws && \
    chown ${USERNAME}:${USERNAME} /home/${USERNAME}/.bashrc

RUN chown -R ${USERNAME}:${USERNAME} /home/${USERNAME}


ENV USER=${USERNAME}
ENV ROS_DISTRO=${ROS_DISTRO}
ENV AMENT_PREFIX_PATH=/opt/ros/${ROS_DISTRO}
ENV COLCON_PREFIX_PATH=/opt/ros/${ROS_DISTRO}
ENV LD_LIBRARY_PATH=/opt/ros/${ROS_DISTRO}/lib
ENV PATH=/opt/ros/${ROS_DISTRO}/bin:$PATH
ENV PYTHONPATH=/opt/ros/${ROS_DISTRO}/lib/python3.10/site-packages
ENV ROS_PYTHON_VERSION=3
ENV ROS_VERSION=2
ENV DEBIAN_FRONTEND=
ENV SHELL=/bin/bash

WORKDIR ${ROS_WORKSPACE}/
COPY ./docker_install/terminal_entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh && rm -rf /tmp/docker_install
RUN sed -i "s/DEFALTUSER/${USERNAME}/g" /entrypoint.sh
ENTRYPOINT [ "/entrypoint.sh" ]

entrypoint.sh:

#!/bin/bash

set -e

USERNAME=DEFALTUSER
DEFAULT_USER_ID=1000

#
# Ensure host and container have the same user ID. This is to allow both sides
# to read and write the shared directories.
#
if [ -v USER_ID ] && [ "$USER_ID" != "$DEFAULT_USER_ID" ]; then
    echo "Changing ${USERNAME} user ID to match your host's user ID ($USER_ID)." 
    echo "This operation can take a while..."

    usermod --uid $USER_ID ${USERNAME}

    # Ensure all files in the home directory are owned by the new user ID
    find /home/${USERNAME} -user $DEFAULT_USER_ID -exec chown -h $USER_ID {} \;
fi

cd /home/${USERNAME}

# If no command is provided, set bash to start interactive shell
if [ -z "$1" ]; then
    set - "/bin/bash" -l
fi

source "/opt/ros/$ROS_DISTRO/setup.bash" --
if [ -f /home/${USERNAME}/sim_ws/install/setup.bash ]; then
    source /home/${USERNAME}/sim_ws/install/setup.bash
fi
if [ -f /home/${USERNAME}/ros2_ws/install/setup.bash ]; then
    pushd /home/${USERNAME}/ros2_ws 
        rosdep install --from-paths src --ignore-src -r -y >/dev/null 2>&1 || true
        colcon build --symlink-install || true
    popd
    source /home/${USERNAME}/ros2_ws/install/setup.bash
fi

# Run the provided command using user '${USERNAME}'
exec gosu ${USERNAME} "$@"

NovoG93 avatar Jul 16 '24 16:07 NovoG93

@NovoG93

wrong choice of words, I installed it with the binaries. Please find a minimal example of my dockerfile below:

can you update ros packages? patch release that includes https://github.com/ros2/ros2cli/pull/907 has just released in this week, https://discourse.ros.org/t/preparing-for-iron-patch-and-sync-2023-07-14/32284

fujitatomoya avatar Jul 19 '24 20:07 fujitatomoya

@fujitatomoya I can verify that now with ros-iron-ros2cli=0.25.7-1jammy.20240711.235715 things work again like they should but with 0.25.6-1jammy.20240523.193235 they do not

NovoG93 avatar Jul 21 '24 17:07 NovoG93

It looks like the root cause was fixed in #888, and the fix is available in all active distros with #907 and #908, so I'll close this ticket. Thanks for reporting the issue!

sloretz avatar Aug 02 '24 16:08 sloretz

I am not sure if this is related or a separate issue, but I am having daemon issues on humble. It looks like the daemon won't automatically start but I can manually start it.

In order: ros2 topic list gets stuck, but ros2 topic list --no-daemon works correctly

ros2 daemon status hangs for about 60 seconds before throwing a connection timed out error.

If I run ros2 daemon start, then ros2 topic list and ros2 daemon status work as expected.

So it appears that the daemon is not automatically starting when it should. I am using the Debian with version 0.18.11-1jammy.20240728.221624, which I think should have been patched in #908

atyshka avatar Sep 11 '24 20:09 atyshka

@atyshka i would suggest you to create another issue with reproducible procedure.

as far as i checked, the problem cannot be observed.

root@tomoyafujita:/# dpkg -s ros-humble-ros2cli
Package: ros-humble-ros2cli
Status: install ok installed
Priority: optional
Section: misc
Installed-Size: 205
Maintainer: Aditya Pande <[email protected]>
Architecture: amd64
Version: 0.18.11-1jammy.20240728.221624
Depends: python3-argcomplete, python3-importlib-metadata, python3-netifaces, python3-packaging, python3-pkg-resources, ros-humble-rclpy, ros-humble-ros-workspace
Description: Framework for ROS 2 command line tools.
root@tomoyafujita:/# cd
root@tomoyafujita:~# source /opt/ros/humble/setup.bash
root@tomoyafujita:~# ros2 status
usage: ros2 [-h] [--use-python-default-buffering] Call `ros2 <command> -h` for more detailed usage. ...
ros2: error: argument Call `ros2 <command> -h` for more detailed usage.: invalid choice: 'status' (choose from 'action', 'bag', 'component', 'daemon', 'doctor', 'extension_points', 'extensions', 'interface', 'launch', 'lifecycle', 'multicast', 'node', 'param', 'pkg', 'run', 'security', 'service', 'topic', 'wtf')
root@tomoyafujita:~# dpkg -s ros-humble-ros2cli
Package: ros-humble-ros2cli
Status: install ok installed
Priority: optional
Section: misc
Installed-Size: 205
Maintainer: Aditya Pande <[email protected]>
Architecture: amd64
Version: 0.18.11-1jammy.20240728.221624
Depends: python3-argcomplete, python3-importlib-metadata, python3-netifaces, python3-packaging, python3-pkg-resources, ros-humble-rclpy, ros-humble-ros-workspace
Description: Framework for ROS 2 command line tools.
root@tomoyafujita:~# ros2 daemon status
The daemon is not running
root@tomoyafujita:~# ros2 topic list
/parameter_events
/rosout
root@tomoyafujita:~# ros2 daemon status
The daemon is running
root@tomoyafujita:~# ros2 daemon stop
The daemon has been stopped
root@tomoyafujita:~# ros2 daemon start
The daemon has been started
root@tomoyafujita:~# ros2 daemon status
The daemon is running

fujitatomoya avatar Sep 11 '24 23:09 fujitatomoya

@atyshka i would suggest you to create another issue with reproducible procedure.

as far as i checked, the problem cannot be observed.

This seems to be specific to WSL2. I have moved the discussion to #934 and added more details for reproducing.

atyshka avatar Sep 16 '24 19:09 atyshka