ouster-ros
ouster-ros copied to clipboard
Ouster (sensor.launch) Retry connecting to sensor instead of crashing
Hi,
Right now if we do "roslaunch sensor.launch" and if the sensor is not connected to the systems (lets say it was not powered on), then the launch file crashes. Is there a way to retry instead of crashing? The launch file is used with other sensor (like a camera, imu etc) and crashing of this file causes the whole system to crash.
Hi @hirenpatel1207, there is plans to support similar things in the future but I wanted to check if you happen to try out the node attributes required
and respawn
and saw if they would help in your case, basically setting required
to false
and respawn
to true
on the sensor nodelet (or the nodelet manager).
Hi @Samahu ,
I tried setting required
to false
and respawn
to true
on os_nodelet_mgr
and/or os_node
(tried different combinations). But none of them worked. Following is my output:
[ INFO] [1675066044.701190001]: Initializing nodelet with 16 worker threads.
[ INFO] [1675066045.697216986]: Loading nodelet /ouster/os_node of type nodelets_os/OusterSensor to manager os_nodelet_mgr with the following remappings:
[ INFO] [1675066045.728023349]: Will use automatic UDP destination
[ INFO] [1675066046.696662483]: Loading nodelet /ouster/os_cloud_node of type nodelets_os/OusterCloud to manager os_nodelet_mgr with the following remappings:
[ INFO] [1675066046.703412536]: Loading nodelet /ouster/img_node of type nodelets_os/OusterImage to manager os_nodelet_mgr with the following remappings:
[ERROR] [1675066080.408690431]: Error setting config: CurlClient::execute_get failed for the url: [http://os-122251000958.local/api/v1/system/firmware] with the error message: Couldn't connect to server
[FATAL] [1675066080.409086889]: Failed to load nodelet '/ouster/os_node` of type `nodelets_os/OusterSensor` to manager `os_nodelet_mgr'
[ INFO] [1675066080.414468138]: waitForService: Service [/ouster/get_metadata] has not been advertised, waiting...
[ouster/os_node-2] process has died [pid 20702, exit code 255, cmd bash -c sleep 3; $0 $@ /opt/ros/noetic/lib/nodelet/nodelet load nodelets_os/OusterSensor os_nodelet_mgr __name:=os_node __log:=/home/hp/.ros/log/171f5714-a074-11ed-a8f1-bb29c81e7172/ouster-os_node-2.log].
log file: /home/hp/.ros/log/171f5714-a074-11ed-a8f1-bb29c81e7172/ouster-os_node-2*.log
[ouster/os_node-2] restarting process
process[ouster/os_node-2]: started with pid [20777]
[ INFO] [1675066083.572650050]: Loading nodelet /ouster/os_node of type nodelets_os/OusterSensor to manager os_nodelet_mgr with the following remappings:
The sensor was not ON when I started the launch file. Then I waited for sometime and turned it ON. After there there is still no other output on the terminal.
@hirenpatel1207 I did attempt to utilize required and respawn tags that I suggested to let the driver reconnect to the server by respawning upon failure, however, I experienced the same behavior that you reported. I think this is not working mainly due to our use of ROS nodelets, I think that ROS nodelets do not handle these tags properly as they are supposed to. I observed that using these tags ROS attempts to respawn the sensor nodelet after crash but unfortunately it does not continue with the nodelet initialization. This being the case we will probably need to handle reconnection through the sensor itself and we probably need to add a new parameter too. I will log a ticket for this item but I can't give you time on when I will have this implemented. I will keep this ticket open until we have some fix.
Same problem here, any updates?
Hi,
I tried this solution and look works. I modified the configure_sensor method in os_sensor_nodelet.cpp to return a bool that indicates whether the configuration has made successfully or not.
bool configure_sensor(const std::string& hostname, sensor::sensor_config& config, int config_flags) {
bool is_configured = false;
if (config.udp_dest && sensor::in_multicast(config.udp_dest.value()) && !mtp_main) {
if (!get_config(hostname, config, true)) {
NODELET_ERROR("Error getting active config");
}
NODELET_INFO("Retrived active config of sensor");
return is_configured;
}
try {
if(!set_config(hostname, config, config_flags)) {
auto err_msg = "Error connecting to sensor " + hostname;
NODELET_ERROR_STREAM(err_msg);
return is_configured;
}
is_configured = true;
} catch (const std::exception& e) {
NODELET_ERROR("Error setting config: %s", e.what());
is_configured = false;
return is_configured;
}
NODELET_INFO_STREAM("Sensor " << hostname << " was configured successfully");
return is_configured;
}
So, with this i changed this onInit() method adding a while method to block until the sensor has been configured:
virtual void onInit() override {
auto& pnh = getPrivateNodeHandle();
sensor_hostname = get_sensor_hostname(pnh);
sensor::sensor_config config;
uint8_t flags;
std::tie(config, flags) = create_sensor_config_rosparams(pnh);
while(!configure_sensor(sensor_hostname, config, flags));
sensor_client = create_sensor_client(sensor_hostname, config);
update_config_and_metadata(*sensor_client);
save_metadata(pnh);
OusterClientBase::onInit();
create_get_config_service();
create_set_config_service();
start_connection_loop();
}
What do you think about this @Samahu, could be useful?
I modified @gustavojoseleite's modifications to instead be:
bool configure_sensor(const std::string& hostname,
sensor::sensor_config& config,
const int config_flags,
const bool retry_configuration) {
bool is_configured = false;
bool is_first_attempt = true;
do {
// Throttling
if (!is_first_attempt) {
ros::Duration(0.01).sleep();
ros::spinOnce();
}
if (config.udp_dest &&
sensor::in_multicast(config.udp_dest.value()) &&
!mtp_main) {
if (!get_config(hostname, config, true)) {
NODELET_ERROR("Error getting active config");
} else {
NODELET_INFO("Retrieved active config of sensor");
}
} else {
try {
if (!set_config(hostname, config, config_flags)) {
auto err_msg = "Error connecting to sensor " + hostname;
NODELET_ERROR_STREAM(err_msg);
} else {
is_configured = true;
}
} catch (const std::exception& e) {
NODELET_ERROR("Error setting config: %s", e.what());
}
}
is_first_attempt = false;
} while (!is_configured && retry_configuration);
NODELET_INFO_STREAM("Sensor " << hostname
<< " was "
<< (is_configured ? "" : "not ") << "configured successfully");
return is_configured;
}
and consequently:
virtual void onInit() override {
auto& pnh = getPrivateNodeHandle();
sensor_hostname = get_sensor_hostname(pnh);
bool retry_configuration = false;
std::tie(config, flags) = create_sensor_config_rosparams(pnh, retry_configuration);
configure_sensor(sensor_hostname, config, flags, retry_configuration);
sensor_client = create_sensor_client(sensor_hostname, config);
auto& nh = getNodeHandle();
create_metadata_publisher(nh);
update_config_and_metadata(*sensor_client);
publish_metadata();
save_metadata(pnh);
create_get_metadata_service(nh);
create_get_config_service(nh);
create_set_config_service(nh);
start_connection_loop(nh);
}
(I added a retry_configuration
argument, default false
, to the launch files, which can be set to true
to expose this behaviour)
I did this because then I can expand the member variables to:
private:
PacketMsg lidar_packet;
PacketMsg imu_packet;
ros::Publisher lidar_packet_pub;
ros::Publisher imu_packet_pub;
std::shared_ptr<sensor::client> sensor_client;
ros::Timer timer_;
std::string sensor_hostname;
ros::ServiceServer get_config_srv;
ros::ServiceServer set_config_srv;
ros::Time first_lidar_data_rx = ros::Time(0.0);
std::string cached_config;
std::string mtp_dest;
bool mtp_main;
bool had_reconnection_success = false;
uint8_t flags;
sensor::sensor_config config;
and then
void connection_loop(sensor::client& cli, const sensor::packet_format& pf) {
auto state = sensor::poll_client(cli);
if (state == sensor::EXIT) {
NODELET_INFO("poll_client: caught signal, exiting");
return;
}
if (state & sensor::CLIENT_ERROR) {
NODELET_ERROR("poll_client: returned error");
return;
}
if (state & sensor::LIDAR_DATA) {
if (sensor::read_lidar_packet(cli, lidar_packet.buf.data(), pf)) {
had_reconnection_success = false;
first_lidar_data_rx = ros::Time::now();
lidar_packet_pub.publish(lidar_packet);
}
}
else if (!cached_config.empty() &&
!had_reconnection_success &&
(first_lidar_data_rx != ros::Time(0.0)) &&
(ros::Time::now().toSec() - first_lidar_data_rx.toSec()) > 3.0) {
NODELET_ERROR("poll_client: attempting reconnection");
had_reconnection_success = configure_sensor(sensor_hostname, config, flags, false);
if (had_reconnection_success) {
sensor_client = create_sensor_client(sensor_hostname, config);
}
}
if (state & sensor::IMU_DATA) {
if (sensor::read_imu_packet(cli, imu_packet.buf.data(), pf))
imu_packet_pub.publish(imu_packet);
}
}
which allows re-connections to the sensor not just if it is unpowered on initialisation, but also if it is disconnected and re-connected during runtime
I got the following error when I tried to launch sensor.launch. Any solution yet? Thanks in advance
[FATAL] [1693665521.636908912]: Failed to load nodelet '/ouster/os_driverof type
ouster_ros/OusterDriverto manager
os_nodelet_mgr'
[ouster/os_driver-3] process has died [pid 11682, exit code 255, cmd /opt/ros/noetic/lib/nodelet/nodelet load ouster_ros/OusterDriver os_nodelet_mgr __name:=os_driver __log:=/home/Kreypo/.ros/log/64b883dc-499e-11ee-8e40-b1b3f24513bb/ouster-os_driver-3.log].
@LinusNEP I don't think what you are reporting is related to this thread (which is about trying to connect while sensor is not connected or up yet), I would suggest that you first lookup all other issues first for the same issue you are getting, and if none match then start a new issue. also it would be better to include the rest of the log because what you have there is not sufficient.