
Poor performance for CyberRT in different docker containers on a single host pc.

Open EternalSaga opened this issue 1 year ago • 6 comments

This issue reports an abnormal network performance problem when running CyberRT in Docker. I moved CyberRT into Docker, using CMake as the build tool according to this repo. I then tested network performance with the cyber_recorder tool. Unfortunately, there are weird performance issues for CyberRT between two Cyber containers.

System information

  • OS Platform and Distribution (1st): Windows 11 with WSL2 Ubuntu 22.04 and Docker Desktop with the WSL2 backend.
  • OS Platform and Distribution (2nd): Proxmox Virtual Environment (KVM) based Ubuntu 22.04 and Docker Engine.
  • Apollo version: 8.0
  • Hardware Info: i9 13900K, PCIe 4.0 M.2 SSD, 64GB memory.

Steps to reproduce this issue.

This reproduction is based on the standard Apollo container, which is built with Bazel.

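  1. Launch two standard Docker containers using docker compose. Here is an example docker compose file. Pay attention to the container names and image names.
# In this example, docker compose creates two containers, places them on the same network, and assigns each a fixed IP.
# The CyberRT IP environment variable (CYBER_IP) is set here, so it does not need to be set again inside the container.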
networks:
  cyber_net:
    ipam:
      config:
        - subnet: 192.168.50.0/24
          gateway: 192.168.50.1

services:
  cyber_1:
    image: "apolloauto/apollo:dev-x86_64-18.04-20240204_0555" # note the image name
    container_name: apollo_raw_container
    volumes:
      - /home/robin/cpp_proj/apollo:/apollo # volume mount; change this path to your own
      - /home/robin/.ssh:/root/.ssh # mount your SSH credentials
      - audio_volume:/apollo/modules/audio/data/
      - tl_detection_volume:/apollo/modules/perception/production/data/perception/camera/models/traffic_light_detection/tl_detection_caffe
      - map_volume:/apollo/modules/map/data/sunnyvale_loop
    entrypoint: /bin/bash
    networks:
      cyber_net:
        ipv4_address: 192.168.50.2
    environment:
      - CYBER_IP=192.168.50.2
    stdin_open: true # docker run -i
    tty: true        # docker run -t
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu] # official GPU support docs: https://docs.docker.com/compose/gpu-support/
  cyber_2:
    image: "apolloauto/apollo:dev-x86_64-18.04-20240204_0555"
    container_name: apollo_raw_container2
    stdin_open: true # docker run -i
    tty: true        # docker run -t
    ports:
      - 8888:8888 # map the container port to the host
    volumes:
      - /home/robin/cpp_proj/apollo:/apollo # volume mount; change this path to your own
      - /home/robin/.ssh:/root/.ssh # mount your SSH credentials
      - audio_volume:/apollo/modules/audio/data/
      - tl_detection_volume:/apollo/modules/perception/production/data/perception/camera/models/traffic_light_detection/tl_detection_caffe
      - map_volume:/apollo/modules/map/data/sunnyvale_loop
    networks:
      cyber_net:
        ipv4_address: 192.168.50.3
    environment:
      - CYBER_IP=192.168.50.3
  
  audio_asserts:
    image: apolloauto/apollo:data_volume-audio_model-x86_64-latest # note the image name
    volumes:
      - audio_volume:/apollo/modules/audio/data/

  traffic_light_det_assert:
    image: apolloauto/apollo:traffic_light-detection_caffe_model-x86_64-latest
    volumes:
      - tl_detection_volume:/apollo/modules/perception/production/data/perception/camera/models/traffic_light_detection/tl_detection_caffe

  map_assert:
    image: apolloauto/apollo:map_volume-sunnyvale_loop-latest
    volumes:
      - map_volume:/apollo/modules/map/data/sunnyvale_loop
volumes:
  audio_volume:
  tl_detection_volume:
  map_volume:
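
Because CyberRT in each container is told its address through CYBER_IP, a mismatch between CYBER_IP and the container's ipv4_address is worth ruling out first. The following is only a minimal sketch of such a check: the addresses are copied by hand from the compose file above, and the function name `check_address_plan` is made up for illustration. It does not talk to Docker or CyberRT.

```python
import ipaddress

# Values copied by hand from the compose file above.
SUBNET = ipaddress.ip_network("192.168.50.0/24")
GATEWAY = ipaddress.ip_address("192.168.50.1")
CONTAINERS = {
    "apollo_raw_container": {"ipv4_address": "192.168.50.2", "CYBER_IP": "192.168.50.2"},
    "apollo_raw_container2": {"ipv4_address": "192.168.50.3", "CYBER_IP": "192.168.50.3"},
}


def check_address_plan(subnet, gateway, containers):
    """Return a list of problems; an empty list means the plan is consistent."""
    problems = []
    seen = {}  # address -> name of the container that already uses it
    for name, cfg in containers.items():
        addr = ipaddress.ip_address(cfg["ipv4_address"])
        if addr not in subnet:
            problems.append(f"{name}: {addr} is outside {subnet}")
        if addr == gateway:
            problems.append(f"{name}: {addr} collides with the gateway")
        if cfg["CYBER_IP"] != cfg["ipv4_address"]:
            problems.append(f"{name}: CYBER_IP differs from ipv4_address")
        if addr in seen:
            problems.append(f"{name}: {addr} is already used by {seen[addr]}")
        seen[addr] = name
    return problems


if __name__ == "__main__":
    issues = check_address_plan(SUBNET, GATEWAY, CONTAINERS)
    print("address plan OK" if not issues else "\n".join(issues))
```

With the values from this compose file the check reports no problems, so the address plan itself does not explain the low frequency.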
    
  2. Build Apollo, then play the demo record with cyber_recorder in one container and run cyber_monitor in the other container.
# enter container 2
docker exec -it apollo_raw_container2 bash
# play the demo record
./bazel-bin/cyber/tools/cyber_recorder/cyber_recorder play -l -f ./sensor_rgb.record 
# enter container 1
docker exec -it apollo_raw_container bash
# launch monitor
./bazel-bin/cyber/tools/cyber_monitor/cyber_monitor

  3. The data transmission frequency between the two Cyber containers is extremely low, far below expectations.

Screenshot 2024-04-01 165120
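
To make "frequency" concrete when comparing runs, a per-channel message rate can be approximated from receive timestamps. The sketch below is an assumption about how such a number could be computed, not cyber_monitor's actual implementation; `estimate_frequency_hz` is a hypothetical helper and the timestamps are made-up sample values, not measurements from this issue.

```python
def estimate_frequency_hz(timestamps):
    """Estimate a message rate in Hz from receive timestamps given in seconds.

    The rate is the number of intervals divided by the time they span.
    Returns 0.0 when fewer than two messages were seen.
    """
    if len(timestamps) < 2:
        return 0.0
    ordered = sorted(timestamps)
    span = ordered[-1] - ordered[0]
    if span <= 0:
        return 0.0
    return (len(ordered) - 1) / span


if __name__ == "__main__":
    # Made-up sample: five messages spaced 0.5 s apart.
    steady = [0.0, 0.5, 1.0, 1.5, 2.0]
    # Made-up sample: three messages spaced 2 s apart.
    sparse = [0.0, 2.0, 4.0]
    print(estimate_frequency_hz(steady))  # 2.0
    print(estimate_frequency_hz(sparse))  # 0.5
```

Computing the rate the same way on the publishing side and on the receiving side would show whether messages are being dropped or delayed between the containers.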

EternalSaga avatar Apr 01 '24 08:04 EternalSaga

Can you determine whether it is a problem with cmake or a problem with bazel?

We will first try to test the bazel compilation situation

daohu527 avatar Apr 01 '24 12:04 daohu527

> Can you determine whether it is a problem with cmake or a problem with bazel?
>
> We will first try to test the bazel compilation situation

This problem appears with both Bazel and CMake builds.

EternalSaga avatar Apr 01 '24 19:04 EternalSaga

@daohu527 I also tested it on a KVM-based Ubuntu 22.04 virtual machine. The CyberRT performance is still bad.

Screenshot 2024-04-03 102802

EternalSaga avatar Apr 03 '24 02:04 EternalSaga

Hello, has the problem been resolved?

zhangjianming0724 avatar Jul 16 '24 06:07 zhangjianming0724

> Hello, has the problem been resolved?

Unfortunately, no. Have you met the same problem?

EternalSaga avatar Jul 16 '24 07:07 EternalSaga

We need to deploy multiple CyberRT containers on one host and have them communicate with each other in a production environment.

zhangjianming0724 avatar Jul 18 '24 03:07 zhangjianming0724