HEBI-ROS-DEPRECATED

Gazebo Plugin: Spawning URDF in Gazebo with large # of HEBI Actuators causes hebiros_node failure or Gazebo Plugin memory error (race condition?)

Open josephcoombe opened this issue 6 years ago • 2 comments

Using HEBI-ROS (v.1.3.0).

Encountered some issues simulating a fairly large HEBI Hexapod URDF with 16 HEBI Actuators in Gazebo.

I've been able to generate 4 outcomes pretty consistently (on my machine):

  1. hebiros_node fails silently after creating group
     a) Launch Gazebo/spawn Hexapod URDF
     b) Start simulation
     c) Launch hebiros_node 5+ seconds later
     d) Launch my_node / call AddGroupFromNamesSrv service

     :x: hebiros_node fails silently after creating group
joseph@joseph-Aspire-VN7-591G:~$ roslaunch hebiros hebiros_gazebo.launch 
... logging to /home/joseph/.ros/log/6424bbc2-32cd-11e8-ad3d-3065ec753340/roslaunch-joseph-Aspire-VN7-591G-1217.log
Checking log directory for disk usage. This may take awhile.
Press Ctrl-C to interrupt
WARNING: disk usage in log directory [/home/joseph/.ros/log] is over 1GB.
It's recommended that you use the 'rosclean' command.

started roslaunch server http://joseph-Aspire-VN7-591G:42407/

SUMMARY
========

PARAMETERS
 * /rosdistro: kinetic
 * /rosversion: 1.12.13

NODES
  /
    hebiros_node (hebiros/hebiros_node)

ROS_MASTER_URI=http://localhost:11311

process[hebiros_node-1]: started with pid [1244]
Broadcasting on:
  192.168.1.255
[ INFO] [1522271853.884137512]: Using Gazebo
[ INFO] [1522271853.889273936]: Parameters:
[ INFO] [1522271853.889300673]: /hebiros/node_frequency=200
[ INFO] [1522271853.889320614]: /hebiros/action_frequency=200
[ INFO] [1522271853.889337735]: /hebiros/feedback_frequency=100
[ INFO] [1522271853.889352443]: /hebiros/command_lifetime=100
[ INFO] [1522271857.693422949, 3.986000000]: Created group [hexapod]:
[ INFO] [1522271857.693461425, 3.986000000]: /hexapod/Leg1/Hip
[ INFO] [1522271857.696260478, 3.987000000]: /hexapod/Leg1/Knee
[ INFO] [1522271857.699193837, 3.988000000]: /hexapod/Leg1/Ankle
[ INFO] [1522271857.701563500, 3.989000000]: /hexapod/Leg2/Hip
[ INFO] [1522271857.704214415, 3.990000000]: /hexapod/Leg2/Knee
[ INFO] [1522271857.706693322, 3.991000000]: /hexapod/Leg2/Ankle
[ INFO] [1522271857.708986570, 3.992000000]: /hexapod/Leg3/Hip
[ INFO] [1522271857.711288946, 3.993000000]: /hexapod/Leg3/Knee
[ INFO] [1522271857.714586326, 3.994000000]: /hexapod/Leg3/Ankle
[ INFO] [1522271857.717103043, 3.995000000]: /hexapod/Leg4/Hip
[ INFO] [1522271857.719666942, 3.996000000]: /hexapod/Leg4/Knee
[ INFO] [1522271857.722359279, 3.997000000]: /hexapod/Leg4/Ankle
[ INFO] [1522271857.724945592, 3.998000000]: /hexapod/Leg5/Hip
[ INFO] [1522271857.727315932, 3.999000000]: /hexapod/Leg5/Knee
[ INFO] [1522271857.729805885, 4.000000000]: /hexapod/Leg5/Ankle
[ INFO] [1522271857.732199354, 4.001000000]: /hexapod/Leg6/Hip
[ INFO] [1522271857.734937289, 4.002000000]: /hexapod/Leg6/Knee
[ INFO] [1522271857.737532682, 4.003000000]: /hexapod/Leg6/Ankle
[hebiros_node-1] process has died [pid 1244, exit code -11, cmd /home/joseph/tto-robotics/behavior_library_catkin_ws/catkin_ws/devel/lib/hebiros/hebiros_node -use_gazebo true __name:=hebiros_node __log:=/home/joseph/.ros/log/6424bbc2-32cd-11e8-ad3d-3065ec753340/hebiros_node-1.log].
log file: /home/joseph/.ros/log/6424bbc2-32cd-11e8-ad3d-3065ec753340/hebiros_node-1*.log
all processes on machine have died, roslaunch will exit
shutting down processing monitor...
... shutting down processing monitor complete
done
  2. hebiros_node doesn't crash and successfully handles AddGroupFromNamesSrv service call
     a) Launch Gazebo/spawn Hexapod URDF
     b) Launch hebiros_node
     c) Start simulation
     d) Launch my_node / call AddGroupFromNamesSrv service

     :heavy_check_mark: Everything seems to work! I can even close my_node and restart it (calling AddGroupFromNamesSrv again) without any problems.

     or

     a) Launch Gazebo/spawn Hexapod URDF
     b) Start simulation
     c) Launch hebiros_node immediately after step b) (~2-3 seconds later)
     d) Launch my_node / call AddGroupFromNamesSrv service

     :heavy_check_mark: Everything seems to work! I can even close my_node and restart it (calling AddGroupFromNamesSrv again) without any problems.

  3. hebiros_node doesn't crash but handles AddGroupFromNamesSrv call incorrectly
     a) ??? (I can't consistently reproduce this one)
     b) Launch my_node / call AddGroupFromNamesSrv service

     :x: hebiros_node won't crash and will appear to handle the AddGroupFromNamesSrv service call correctly, but will only publish joint_state feedback for 1-2 of the HEBI Actuators instead of the entire set of 16.
     :heavy_check_mark: If I close my_node and restart it, hebiros_node will handle the AddGroupFromNamesSrv service call successfully.

  4. Gazebo Plugin crashes with memory error
     a) ??? (I can't consistently reproduce this one)

     :x: Gazebo Plugin will crash with a memory error and dump output to the terminal (attached: hebiros_memory_error.zip).
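A note on the `exit code -11` in the roslaunch output for the first outcome: assuming roslaunch reports the value the way Python's `subprocess` module does, a negative code -N means the process was terminated by signal N, and on Linux signal 11 is SIGSEGV. That would make the "silent" failure a segmentation fault rather than a clean exit. The helper below is only an illustrative sketch (Python 3; `describe_exit_code` is a hypothetical name, not part of roslaunch or hebiros):

```python
import signal


def describe_exit_code(code):
    """Describe a process exit code that follows the subprocess convention.

    Non-negative values are ordinary exit statuses; a negative value -N
    means the process was terminated by signal N.
    """
    if code >= 0:
        return "exited with status %d" % code
    signum = -code
    try:
        # Signal numbering is platform dependent; 11 is SIGSEGV on Linux.
        name = signal.Signals(signum).name
    except ValueError:
        name = "unknown signal"
    return "terminated by %s (signal %d)" % (name, signum)


print(describe_exit_code(-11))
```

If that reading is right, running hebiros_node under a debugger or enabling core dumps would likely be more informative than the roslaunch log alone.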

josephcoombe • Mar 28 '18 21:03

An additional note: everything usually works if I launch everything (gazebo_ros, hebiros, nodes, etc.) from a single .launch file, though it's worth noting that roslaunch by design makes no guarantees regarding launch order or timing.

I have a .launch file that will load everything successfully when my laptop is plugged into a power supply, but the hebiros node / Gazebo Plugin will crash when my laptop is running off battery, confirming suspicions that at least part of the problem is in fact a race condition.
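Since launch order and timing can't be relied on, one possible workaround on the client side is to poll for readiness instead of assuming it, and to verify that feedback covers every expected actuator before proceeding. The sketch below is generic Python, not the hebiros API: `wait_until` and `missing_joints` are hypothetical helpers, and the joint names in the usage comment are placeholders. It does not address the underlying crash; it only makes a partially initialized group easier to detect and retry.

```python
import time


def wait_until(predicate, timeout_s=10.0, interval_s=0.1,
               clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it returns True or timeout_s elapses.

    Returns True if the predicate succeeded, False on timeout.
    Uses wall-clock time by default, not simulation time.
    """
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def missing_joints(expected_names, feedback_names):
    """Return the expected joint names absent from a feedback message."""
    seen = set(feedback_names)
    return [name for name in expected_names if name not in seen]


# Hypothetical usage inside a client node (names are placeholders):
#   expected = ["hexapod/Leg1/Hip", "hexapod/Leg1/Knee"]
#   ok = wait_until(lambda: not missing_joints(expected, latest_names()))
#   where latest_names() returns the names from the most recent
#   joint_state feedback message the node has received.
```

In a ROS 1 Python node, `rospy.wait_for_service(name, timeout=...)` covers the "is the service up yet" part; the feedback check above is aimed at the case where the service call appears to succeed but only 1-2 actuators report.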

josephcoombe • Apr 23 '18 18:04

I saw a similar issue after upgrading to Melodic while running x-demo. It seems to be fixed by initializing the node handle in the plugin loader. (Early in validation stage.) https://github.com/CNURobotics/HEBI-ROS/tree/melodic-devel

dcconner • Feb 06 '20 14:02