ouster-ros icon indicating copy to clipboard operation
ouster-ros copied to clipboard

Ouster (sensor.launch) Retry connecting to sensor instead of crashing

Open hirenpatel1207 opened this issue 2 years ago • 8 comments

Hi,

Right now if we do "roslaunch sensor.launch" and if the sensor is not connected to the systems (lets say it was not powered on), then the launch file crashes. Is there a way to retry instead of crashing? The launch file is used with other sensor (like a camera, imu etc) and crashing of this file causes the whole system to crash.

hirenpatel1207 avatar Jan 27 '23 10:01 hirenpatel1207

Hi @hirenpatel1207, there is plans to support similar things in the future but I wanted to check if you happen to try out the node attributes required and respawn and saw if they would help in your case, basically setting required to false and respawn to true on the sensor nodelet (or the nodelet manager).

Samahu avatar Jan 27 '23 17:01 Samahu

Hi @Samahu , I tried setting required to false and respawn to true on os_nodelet_mgr and/or os_node (tried different combinations). But none of them worked. Following is my output:

[ INFO] [1675066044.701190001]: Initializing nodelet with 16 worker threads.
[ INFO] [1675066045.697216986]: Loading nodelet /ouster/os_node of type nodelets_os/OusterSensor to manager os_nodelet_mgr with the following remappings:
[ INFO] [1675066045.728023349]: Will use automatic UDP destination

[ INFO] [1675066046.696662483]: Loading nodelet /ouster/os_cloud_node of type nodelets_os/OusterCloud to manager os_nodelet_mgr with the following remappings:
[ INFO] [1675066046.703412536]: Loading nodelet /ouster/img_node of type nodelets_os/OusterImage to manager os_nodelet_mgr with the following remappings:
[ERROR] [1675066080.408690431]: Error setting config:  CurlClient::execute_get failed for the url: [http://os-122251000958.local/api/v1/system/firmware] with the error message: Couldn't connect to server
[FATAL] [1675066080.409086889]: Failed to load nodelet '/ouster/os_node` of type `nodelets_os/OusterSensor` to manager `os_nodelet_mgr'
[ INFO] [1675066080.414468138]: waitForService: Service [/ouster/get_metadata] has not been advertised, waiting...
[ouster/os_node-2] process has died [pid 20702, exit code 255, cmd bash -c sleep 3; $0 $@ /opt/ros/noetic/lib/nodelet/nodelet load nodelets_os/OusterSensor os_nodelet_mgr __name:=os_node __log:=/home/hp/.ros/log/171f5714-a074-11ed-a8f1-bb29c81e7172/ouster-os_node-2.log].
log file: /home/hp/.ros/log/171f5714-a074-11ed-a8f1-bb29c81e7172/ouster-os_node-2*.log
[ouster/os_node-2] restarting process
process[ouster/os_node-2]: started with pid [20777]
[ INFO] [1675066083.572650050]: Loading nodelet /ouster/os_node of type nodelets_os/OusterSensor to manager os_nodelet_mgr with the following remappings:

The sensor was not ON when I started the launch file. Then I waited for sometime and turned it ON. After there there is still no other output on the terminal.

hirenpatel1207 avatar Jan 30 '23 08:01 hirenpatel1207

@hirenpatel1207 I did attempt to utilize required and respawn tags that I suggested to let the driver reconnect to the server by respawning upon failure, however, I experienced the same behavior that you reported. I think this is not working mainly due to our use of ROS nodelets, I think that ROS nodelets do not handle these tags properly as they are supposed to. I observed that using these tags ROS attempts to respawn the sensor nodelet after crash but unfortunately it does not continue with the nodelet initialization. This being the case we will probably need to handle reconnection through the sensor itself and we probably need to add a new parameter too. I will log a ticket for this item but I can't give you time on when I will have this implemented. I will keep this ticket open until we have some fix.

Samahu avatar Jan 31 '23 04:01 Samahu

Same problem here, any updates?

gustavojoseleite avatar Mar 10 '23 19:03 gustavojoseleite

Hi,

I tried this solution and look works. I modified the configure_sensor method in os_sensor_nodelet.cpp to return a bool that indicates whether the configuration has made successfully or not.

bool configure_sensor(const std::string& hostname, sensor::sensor_config& config, int config_flags) {
        bool is_configured = false;
        if (config.udp_dest && sensor::in_multicast(config.udp_dest.value()) && !mtp_main) {
            if (!get_config(hostname, config, true)) {
                NODELET_ERROR("Error getting active config");
            }
            NODELET_INFO("Retrived active config of sensor");
            return is_configured;
        }
        try {
            if(!set_config(hostname, config, config_flags)) {
                auto err_msg = "Error connecting to sensor " + hostname;
                NODELET_ERROR_STREAM(err_msg);
                return is_configured;
            }
            is_configured = true;
        } catch (const std::exception& e) {
            NODELET_ERROR("Error setting config:  %s", e.what());
            is_configured = false;
            return is_configured;
        }
        NODELET_INFO_STREAM("Sensor " << hostname << " was configured successfully");
        return is_configured;
}

So, with this i changed this onInit() method adding a while method to block until the sensor has been configured:

virtual void onInit() override {
    auto& pnh = getPrivateNodeHandle();
    sensor_hostname = get_sensor_hostname(pnh);
    sensor::sensor_config config;
    uint8_t flags;
    std::tie(config, flags) = create_sensor_config_rosparams(pnh);
    while(!configure_sensor(sensor_hostname, config, flags));
    sensor_client = create_sensor_client(sensor_hostname, config);
    update_config_and_metadata(*sensor_client);
    save_metadata(pnh);
    OusterClientBase::onInit();
    create_get_config_service();
    create_set_config_service();
    start_connection_loop();
}

What do you think about this @Samahu, could be useful?

gustavojoseleite avatar Mar 11 '23 12:03 gustavojoseleite

I modified @gustavojoseleite's modifications to instead be:

    bool configure_sensor(const std::string& hostname,
                          sensor::sensor_config& config,
                          const int config_flags,
                          const bool retry_configuration) {
        bool is_configured = false;
        bool is_first_attempt = true;
        do {
            // Throttling
            if (!is_first_attempt) {
                ros::Duration(0.01).sleep();
                ros::spinOnce();
            }

            if (config.udp_dest &&
                sensor::in_multicast(config.udp_dest.value()) &&
                !mtp_main) {
                if (!get_config(hostname, config, true)) {
                    NODELET_ERROR("Error getting active config");
                } else {
                    NODELET_INFO("Retrieved active config of sensor");
                }
            } else {
                try {
                    if (!set_config(hostname, config, config_flags)) {
                        auto err_msg = "Error connecting to sensor " + hostname;
                        NODELET_ERROR_STREAM(err_msg);
                    } else {
                        is_configured = true;
                    }
                } catch (const std::exception& e) {
                    NODELET_ERROR("Error setting config:  %s", e.what());
                }
            }

            is_first_attempt = false;
        } while (!is_configured && retry_configuration);

        NODELET_INFO_STREAM("Sensor " << hostname
                                      << " was "
                                      << (is_configured ? "" : "not ") << "configured successfully");

        return is_configured;
    }

and consequently:

    virtual void onInit() override {
        auto& pnh = getPrivateNodeHandle();
        sensor_hostname = get_sensor_hostname(pnh);
        bool retry_configuration = false;
        std::tie(config, flags) = create_sensor_config_rosparams(pnh, retry_configuration);
        configure_sensor(sensor_hostname, config, flags, retry_configuration);
        sensor_client = create_sensor_client(sensor_hostname, config);
        auto& nh = getNodeHandle();
        create_metadata_publisher(nh);
        update_config_and_metadata(*sensor_client);
        publish_metadata();
        save_metadata(pnh);
        create_get_metadata_service(nh);
        create_get_config_service(nh);
        create_set_config_service(nh);
        start_connection_loop(nh);
    }

(I added a retry_configuration argument, default false, to the launch files, which can be set to true to expose this behaviour)

I did this because then I can expand the member variables to:

   private:
    PacketMsg lidar_packet;
    PacketMsg imu_packet;
    ros::Publisher lidar_packet_pub;
    ros::Publisher imu_packet_pub;
    std::shared_ptr<sensor::client> sensor_client;
    ros::Timer timer_;
    std::string sensor_hostname;
    ros::ServiceServer get_config_srv;
    ros::ServiceServer set_config_srv;
    ros::Time first_lidar_data_rx = ros::Time(0.0);
    std::string cached_config;
    std::string mtp_dest;
    bool mtp_main;
    bool had_reconnection_success = false;
    uint8_t flags;
    sensor::sensor_config config;

and then

    void connection_loop(sensor::client& cli, const sensor::packet_format& pf) {
        auto state = sensor::poll_client(cli);
        if (state == sensor::EXIT) {
            NODELET_INFO("poll_client: caught signal, exiting");
            return;
        }
        if (state & sensor::CLIENT_ERROR) {
            NODELET_ERROR("poll_client: returned error");
            return;
        }
        if (state & sensor::LIDAR_DATA) {
            if (sensor::read_lidar_packet(cli, lidar_packet.buf.data(), pf)) {
                had_reconnection_success = false;
                first_lidar_data_rx = ros::Time::now();
                lidar_packet_pub.publish(lidar_packet);
            }
        }
        else if (!cached_config.empty() &&
                 !had_reconnection_success &&
                 (first_lidar_data_rx != ros::Time(0.0)) &&
                 (ros::Time::now().toSec() - first_lidar_data_rx.toSec()) > 3.0) {
            NODELET_ERROR("poll_client: attempting reconnection");
            had_reconnection_success = configure_sensor(sensor_hostname, config, flags, false);

            if (had_reconnection_success) {
                sensor_client = create_sensor_client(sensor_hostname, config);
            }
        }
        if (state & sensor::IMU_DATA) {
            if (sensor::read_imu_packet(cli, imu_packet.buf.data(), pf))
                imu_packet_pub.publish(imu_packet);
        }
    }

which allows re-connections to the sensor not just if it is unpowered on initialisation, but also if it is disconnected and re-connected during runtime

tombrodie avatar Apr 28 '23 01:04 tombrodie

I got the following error when I tried to launch sensor.launch. Any solution yet? Thanks in advance

[FATAL] [1693665521.636908912]: Failed to load nodelet '/ouster/os_driverof typeouster_ros/OusterDriverto manageros_nodelet_mgr' [ouster/os_driver-3] process has died [pid 11682, exit code 255, cmd /opt/ros/noetic/lib/nodelet/nodelet load ouster_ros/OusterDriver os_nodelet_mgr __name:=os_driver __log:=/home/Kreypo/.ros/log/64b883dc-499e-11ee-8e40-b1b3f24513bb/ouster-os_driver-3.log].

LinusNEP avatar Sep 02 '23 14:09 LinusNEP

@LinusNEP I don't think what you are reporting is related to this thread (which is about trying to connect while sensor is not connected or up yet), I would suggest that you first lookup all other issues first for the same issue you are getting, and if none match then start a new issue. also it would be better to include the rest of the log because what you have there is not sufficient.

Samahu avatar Sep 02 '23 19:09 Samahu