ros2cli
ros2cli copied to clipboard
ros2 topic list takes 10 minutes in a container (when run first time)
Bug report
Listing topics takes several minutes (10-20) when run the first time from the command line. During this time one core is used 100%.
Required Info:
- Operating System:
- Host operating system: Manjaro (also checked on Arch, surprisingly it works on Ubuntu which is probably why the issue is not wide-spread).
- docker version: Docker version 25.0.3, build 4debf411d1
- Installation type:
- Ros docker image (rolling-desktop, but also checked on iron and humble).
Steps to reproduce issue
docker run -it osrf/ros:rolling-desktop
(inside docker)
ros2 topic list
Expected behavior
Topics listed after a second.
Actual behavior
Command takes 10 minutes.
Hi @marekcygan, We did a brainstorming session on our weekly waffle triage meeting about this issue and here is our outcome.
- We need more information about the storage driver and about the host system itself.
- Could you please try to narrow down the problem by running with
--no-daemon? If the issue is gone - likely relates to the issue when the node graph collecting info about other nodes. - This issue might be related to the fix (patch) for the "Meltdown" vulnerability. It was a patches in the stdlib and linux kernel for that and we have seen significant performance degradation on some other platforms.
@MichaelOrlov thanks for your attention!
- Storage driver:
overlay 2(putting full docker info in a separate comment). Host system:
❯ uname -r
6.8.5-1-MANJARO
- I got an error when adding
--no-daemontoros2 topic list:
root@b18db09cbf50:/# ros2 topic list --no-daemon
Operation not permitted
terminate called after throwing an instance of 'std::system_error'
what(): Invalid argument
Aborted (core dumped)
❯ docker info
Client:
Version: 25.0.3
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: 0.13.0
Path: /usr/lib/docker/cli-plugins/docker-buildx
WARNING: Plugin "/home/marek/.docker/cli-plugins/docker-compose" is not valid: failed to fetch metadata: exit status 255
Server:
Containers: 2
Running: 2
Paused: 0
Stopped: 0
Images: 27
Server Version: 25.0.3
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: true
Native Overlay Diff: false
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 nvidia runc
Default Runtime: nvidia
Init Binary: docker-init
containerd version: 7c3aca7a610df76212171d200ca3811ff6096eb8.m
runc version:
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: builtin
cgroupns
Kernel Version: 6.8.5-1-MANJARO
Operating System: Manjaro Linux
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 62.72GiB
Name: marek-rtx-3070
ID: B4RK:DIMQ:CKKZ:HOQD:QE3R:4IO6:V6GJ:LCPT:2ORO:W5I7:WM37:SM35
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
@MichaelOrlov any ideas what should be the next steps?
@nuclearsandwich @wjwwood Any thoughts after providing details about running system configuration?
@marekcygan
so after,
docker run -it osrf/ros:rolling-desktop
this just hangs up forever,
ros2 topic list
but this generates the permission error?
root@b18db09cbf50:/# ros2 topic list --no-daemon
Operation not permitted
terminate called after throwing an instance of 'std::system_error'
what(): Invalid argument
Aborted (core dumped)
that is really weird, and probably not related to ROS2...
a couple of things i would check,
### Did you source the ROS2 environment
root@cc8f329115ab:/# source /opt/ros/rolling/setup.bash
### Check the file and id ownership and permission
root@cc8f329115ab:/# which ros2
/opt/ros/rolling/bin/ros2
root@cc8f329115ab:/# ls -l /opt/ros/rolling/bin/ros2
-rwxr-xr-x 1 root root 955 Feb 16 16:37 /opt/ros/rolling/bin/ros2
root@cc8f329115ab:/# id -a
uid=0(root) gid=0(root) groups=0(root)
@fujitatomoya @MichaelOrlov
@marekcygan
so after,
docker run -it osrf/ros:rolling-desktopthis just hangs up forever,
Not forever, it takes 10 minutes to finish.
root@dd4078a0cd6b:/# time ros2 topic list
/parameter_events
/rosout
real 11m27.017s
user 9m53.996s
sys 1m32.153s
ros2 topic listbut this generates the permission error?
It used to, but now it does not, now it prints what it should immediately:
root@dd4078a0cd6b:/# ros2 topic list --no-daemon
/parameter_events
/rosout
a couple of things i would check, Did you source the ROS2 environment
Yes, otherwise I would not be able to run ros2 topic.
Check the file and id ownership and permission root@cc8f329115ab:/# which ros2
I get:
/opt/ros/rolling/bin/ros2
root@cc8f329115ab:/# ls -l /opt/ros/rolling/bin/ros2
-rwxr-xr-x 1 root root 955 Feb 16 16:37 /opt/ros/rolling/bin/ros2
root@cc8f329115ab:/# id -a
uid=0(root) gid=0(root) groups=0(root)
Not forever, it takes 10 minutes to finish.
can you stop the container and start it up, and the try following?
### login container and then
### check if ros2 daemon is running, expecting not running
ros2 daemon status
### ros2 command, expecting this takes 10 mins
ros2 topic list
### see if ros2 daemon is now running
ros2 daemon status
### ros2 command, to tell the problem is daemon spawning process or XMLRPC traffic.
ros2 topic list
if 2nd ros2 topic list responds quickly, the problem can be spawning process for ros2 daemon on your platform.
It used to, but now it does not, now it prints what it should immediately:
at least, this is relief. thanks for checking.
One more piece of information is that I have updated all the manjaro packages last week.
❯ docker --version
Docker version 26.1.1, build 4cf5afaefa
❯ uname -r
6.8.9-3-MANJARO
I also run into this issue on Manjaro. After digging a little bit I found that it gets stuck in this loop: https://github.com/ros2/ros2cli/blob/58b61c98378fa49a4a164450f1d5222bde2e4f50/ros2cli/ros2cli/node/daemon.py#L140-L149
On my system, resource.getrlimit(resource.RLIMIT_NOFILE) returns 1073741816 and it takes a long time to count that high!
However, it looks like a workaround for this has already been created here: https://github.com/ros2/ros2cli/commit/64d216cb8fafef83d046b79ee6294afb06b7c595 which made it into Jazzy.
It would be great if that could be backported to Humble and Iron!
@sgvandijk thanks for posting the information, i was aware of that issue.
docker run -it osrf/ros:rolling-desktop
original post tells me this happens with rolling, so could be another issue because https://github.com/ros2/ros2cli/pull/888 is available with rolling and jazzy.
It would be great if that could be backported to Humble and Iron!
no objections for this.
As a workaround, you can add --ulimit nofile=1024:1048576 to the docker run command:
docker run -it --ulimit nofile=1024:1048576 my-image
Or set default ulimits in /etc/docker/daemon.json:
{
"default-ulimits": {
"nofile": {
"Name": "nofile",
"Hard": 1048576,
"Soft": 1024
}
}
}
Then restart docker daemon:
sudo systemctl restart docker
Note: These values are based on Ubuntu 22.04.
- On an ubuntu 22.04 machine, the default nofile limits are 1024:1048576
- In docker running on Ubuntu 22.04, these numbers are set to 1048576:1048576
- However, in docker running on Manjaro, the default values are too large, which causes the program to stuck as @sgvandijk said above.
It would be great if that could be backported to Humble and Iron!
backports to humble and iron are completed.
i still need to keep this open since original issue came from rolling, we should not meet this problem because https://github.com/ros2/ros2cli/commit/64d216cb8fafef83d046b79ee6294afb06b7c595 has been in rolling for a while.
@marekcygan can you confirm?
Hello,
I am encountering the same issue with my devcontainer:
- osrf/ros:iron-desktop
- host: ubuntu 22.04
- Docker version 26.1.4, build 5650f9b
- uname -r : 6.8.11-200.fc39.x86_64
I am using ros2 with ros2 intelRealSense wrapper to use their depth cameras. I've run the docker without intelrealsense wrapper and i've still got the problem.
Thanks for your help!
The issue no longer exists on my end. Sorry for late reply.
I am facing similar issues on Fedora 40. List commands (e.g ros2 topic list ros2 node list) do not terminate. Only when run with --no-daemon they finish.
- installed iron from source inside a ubuntu:jammy container
- host: fedora 40
- Docker version 26.1.4
- containerd version: 1.6.33
- uname -r: 6.9.5-200.fc40.x86_64
@NovoG93 and @MarinoAlpine Please provide the output of ros2doctor in the container where you see the behavior 🧇
@sloretz I am facing the same issue with ros2 doctor, here however I can't use the --no-daemon argument.
As I am writing this the `ros2 doctor process is running for more then 14 minutes:
user@b17e1cbb898b:~$ ps -aux | grep doctor
user 5115 99.7 0.8 1018592 199760 pts/1 Rl+ 14:39 11:44 /usr/bin/python3 /opt/ros/iron/bin/ros2 doctor --report
I stopped it with ctr +c :
user@b17e1cbb898b:/workspaces/rclpy_basics/RCLPY_Basics$ { date; ros2 doctor --report; } || true; date
Sun Jul 14 14:39:57 UTC 2024
/opt/ros/iron/lib/python3.10/site-packages/ros2doctor/api/__init__.py: 154: UserWarning: Fail to call PackageReport class functions.
^CSun Jul 14 14:52:43 UTC 2024
@NovoG93 what version of the ros2cli repo did you build in your container? Mind posting the output of vcs export --exact from the colcon workspace inside your container?
@sloretz wrong choice of words, I installed it with the binaries. Please find a minimal example of my dockerfile below:
ARG GRAPHICS_PLATFORM=standard
ARG BASE_PACKAGE=base
FROM ubuntu:jammy
ENV DEBIAN_FRONTEND="noninteractive"
# Install mesa for GUI
RUN apt-get update && apt-get install -q -y --no-install-recommends \
libgl1-mesa-glx libgl1-mesa-dri
LABEL maintainer="Georg Novotny FHTW"
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
# Install some basic packages
RUN apt-get update -qq && \
apt-get upgrade -yq && \
apt-get install -yq \
wget curl git sudo lsb-release locales bash-completion tzdata gosu tmux xterm less bc \
build-essential neovim htop psmisc \
python3-pip --no-install-recommends && \
rm -rf /var/lib/apt/lists/*
ARG ROS_DISTRO=iron
ARG BASE_PACKAGE=desktop
ENV DEBIAN_FRONTEND="noninteractive"
# setup environment
ENV LANG=C.UTF-8
ENV LC_ALL=C.UTF-8
RUN echo "--------------------------------------------------------------------------------------------------------------"
RUN echo "Installing ROS2 ${ROS_DISTRO}-base"
RUN echo "--------------------------------------------------------------------------------------------------------------"
RUN apt-get update -qq && \
apt-get install -yq curl gnupg2 lsb-release && \
curl -sSL https://raw.githubusercontent.com/ros/rosdistro/master/ros.key -o /usr/share/keyrings/ros-archive-keyring.gpg && \
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/ros-archive-keyring.gpg] http://packages.ros.org/ros2/ubuntu $(. /etc/os-release && echo jammy) main" | tee /etc/apt/sources.list.d/ros2.list > /dev/null && \
apt-get update -qq && apt-get upgrade -y && \
apt-get install -yq ros-dev-tools ros-${ROS_DISTRO}-ros-base \
python3-argcomplete python3-colcon-common-extensions \
python3-rosdep python3-vcstool --no-install-recommends && \
rosdep init && \
rm -rf /var/lib/apt/lists/*
ENV DEBIAN_FRONTEND="noninteractive"
ENV USERNAME=ros_user
ARG USER_ID=1000
ARG GROUP_ID=1000
ARG ROS_DISTRO=iron
ARG BASE_PACKAGE=desktop
ENV ROS_WORKSPACE=/home/${USERNAME}/ros2_ws
ENV COLCON_HOME=/home/${USERNAME}/.colcon
# Create user
RUN groupadd --gid $GROUP_ID ${USERNAME} && \
useradd --gid $GROUP_ID -m ${USERNAME} && \
echo "${USERNAME}:${USERNAME}" | chpasswd && \
usermod --shell /bin/bash ${USERNAME} && \
usermod -aG sudo ${USERNAME} && \
usermod --uid $USER_ID ${USERNAME} && \
echo "${USERNAME} ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers.d/${USERNAME} && \
chmod 0440 /etc/sudoers.d/${USERNAME}
SHELL ["/bin/bash", "-euo", "pipefail", "-c"]
# Install ROS packages
RUN apt-get update -qq && apt-get install -yq ros-${ROS_DISTRO}-nav2* ros-${ROS_DISTRO}-turtlebot3* ros-${ROS_DISTRO}-tf-transformations --no-install-recommends && \
rm -rf /var/lib/apt/lists/*
RUN su ${USERNAME} -c "rosdep update"; \
echo "export PATH=/home/${USERNAME}/.local/bin:${PATH}" && \
echo "export TURTLEBOT3_MODEL=burger" >> /home/${USERNAME}/.bashrc && \
echo "source /usr/share/colcon_argcomplete/hook/colcon-argcomplete.bash" >> /home/${USERNAME}/.bashrc && \
source "/usr/share/colcon_cd/function/colcon_cd.sh" >> /home/${USERNAME}/.bashrc && \
echo "source /opt/ros/${ROS_DISTRO}/setup.bash" >> /home/${USERNAME}/.bashrc && \
echo "source ${ROS_WORKSPACE}/install/setup.bash" >> /home/${USERNAME}/.bashrc && \
mkdir -p ${ROS_WORKSPACE}/src
# Initialize and update rosdep AS the user
RUN su ${USERNAME} -c "rosdep update"
WORKDIR ${ROS_WORKSPACE}
RUN /bin/bash -c "source /opt/ros/${ROS_DISTRO}/setup.bash && colcon build --symlink-install" && \
chown ${USERNAME}:${USERNAME} --recursive /home/${USERNAME}/ros2_ws && \
chown ${USERNAME}:${USERNAME} /home/${USERNAME}/.bashrc
RUN chown -R ${USERNAME}:${USERNAME} /home/${USERNAME}
ENV USER=${USERNAME}
ENV ROS_DISTRO=${ROS_DISTRO}
ENV AMENT_PREFIX_PATH=/opt/ros/${ROS_DISTRO}
ENV COLCON_PREFIX_PATH=/opt/ros/${ROS_DISTRO}
ENV LD_LIBRARY_PATH=/opt/ros/${ROS_DISTRO}/lib
ENV PATH=/opt/ros/${ROS_DISTRO}/bin:$PATH
ENV PYTHONPATH=/opt/ros/${ROS_DISTRO}/lib/python3.10/site-packages
ENV ROS_PYTHON_VERSION=3
ENV ROS_VERSION=2
ENV DEBIAN_FRONTEND=
ENV SHELL=/bin/bash
WORKDIR ${ROS_WORKSPACE}/
COPY ./docker_install/terminal_entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh && rm -rf /tmp/docker_install
RUN sed -i "s/DEFALTUSER/${USERNAME}/g" /entrypoint.sh
ENTRYPOINT [ "/entrypoint.sh" ]
entrypoint.sh:
#!/bin/bash
set -e
USERNAME=DEFALTUSER
DEFAULT_USER_ID=1000
#
# Ensure host and container have the same user ID. This is to allow both sides
# to read and write the shared directories.
#
if [ -v USER_ID ] && [ "$USER_ID" != "$DEFAULT_USER_ID" ]; then
echo "Changing ${USERNAME} user ID to match your host's user ID ($USER_ID)."
echo "This operation can take a while..."
usermod --uid $USER_ID ${USERNAME}
# Ensure all files in the home directory are owned by the new user ID
find /home/${USERNAME} -user $DEFAULT_USER_ID -exec chown -h $USER_ID {} \;
fi
cd /home/${USERNAME}
# If no command is provided, set bash to start interactive shell
if [ -z "$1" ]; then
set - "/bin/bash" -l
fi
source "/opt/ros/$ROS_DISTRO/setup.bash" --
if [ -f /home/${USERNAME}/sim_ws/install/setup.bash ]; then
source /home/${USERNAME}/sim_ws/install/setup.bash
fi
if [ -f /home/${USERNAME}/ros2_ws/install/setup.bash ]; then
pushd /home/${USERNAME}/ros2_ws
rosdep install --from-paths src --ignore-src -r -y >/dev/null 2>&1 || true
colcon build --symlink-install || true
popd
source /home/${USERNAME}/ros2_ws/install/setup.bash
fi
# Run the provided command using user '${USERNAME}'
exec gosu ${USERNAME} "$@"
@NovoG93
wrong choice of words, I installed it with the binaries. Please find a minimal example of my dockerfile below:
can you update ros packages? patch release that includes https://github.com/ros2/ros2cli/pull/907 has just released in this week, https://discourse.ros.org/t/preparing-for-iron-patch-and-sync-2023-07-14/32284
@fujitatomoya I can verify that now with ros-iron-ros2cli=0.25.7-1jammy.20240711.235715 things work again like they should but with 0.25.6-1jammy.20240523.193235 they do not
It looks like the root cause was fixed in #888, and the fix is available in all active distros with #907 and #908, so I'll close this ticket. Thanks for reporting the issue!
I am not sure if this is related or a separate issue, but I am having daemon issues on humble. It looks like the daemon won't automatically start but I can manually start it.
In order:
ros2 topic list gets stuck, but ros2 topic list --no-daemon works correctly
ros2 daemon status hangs for about 60 seconds before throwing a connection timed out error.
If I run ros2 daemon start, then ros2 topic list and ros2 daemon status work as expected.
So it appears that the daemon is not automatically starting when it should. I am using the Debian with version 0.18.11-1jammy.20240728.221624, which I think should have been patched in #908
@atyshka i would suggest you to create another issue with reproducible procedure.
as far as i checked, the problem cannot be observed.
root@tomoyafujita:/# dpkg -s ros-humble-ros2cli
Package: ros-humble-ros2cli
Status: install ok installed
Priority: optional
Section: misc
Installed-Size: 205
Maintainer: Aditya Pande <[email protected]>
Architecture: amd64
Version: 0.18.11-1jammy.20240728.221624
Depends: python3-argcomplete, python3-importlib-metadata, python3-netifaces, python3-packaging, python3-pkg-resources, ros-humble-rclpy, ros-humble-ros-workspace
Description: Framework for ROS 2 command line tools.
root@tomoyafujita:/# cd
root@tomoyafujita:~# source /opt/ros/humble/setup.bash
root@tomoyafujita:~# ros2 status
usage: ros2 [-h] [--use-python-default-buffering] Call `ros2 <command> -h` for more detailed usage. ...
ros2: error: argument Call `ros2 <command> -h` for more detailed usage.: invalid choice: 'status' (choose from 'action', 'bag', 'component', 'daemon', 'doctor', 'extension_points', 'extensions', 'interface', 'launch', 'lifecycle', 'multicast', 'node', 'param', 'pkg', 'run', 'security', 'service', 'topic', 'wtf')
root@tomoyafujita:~# dpkg -s ros-humble-ros2cli
Package: ros-humble-ros2cli
Status: install ok installed
Priority: optional
Section: misc
Installed-Size: 205
Maintainer: Aditya Pande <[email protected]>
Architecture: amd64
Version: 0.18.11-1jammy.20240728.221624
Depends: python3-argcomplete, python3-importlib-metadata, python3-netifaces, python3-packaging, python3-pkg-resources, ros-humble-rclpy, ros-humble-ros-workspace
Description: Framework for ROS 2 command line tools.
root@tomoyafujita:~# ros2 daemon status
The daemon is not running
root@tomoyafujita:~# ros2 topic list
/parameter_events
/rosout
root@tomoyafujita:~# ros2 daemon status
The daemon is running
root@tomoyafujita:~# ros2 daemon stop
The daemon has been stopped
root@tomoyafujita:~# ros2 daemon start
The daemon has been started
root@tomoyafujita:~# ros2 daemon status
The daemon is running
@atyshka i would suggest you to create another issue with reproducible procedure.
as far as i checked, the problem cannot be observed.
This seems to be specific to WSL2. I have moved the discussion to #934 and added more details for reproducing.