Iiwa robot is clicking during movement (indicative that the driver is stopping / relaunching)
(@manuelli @gizatt @YunzhuLi @kmuhlrad )
Rules until this is fixed:
- No moving the robot fast -- if the driver dies during a long planned trajectory, this could cause erratic behavior
- Be extra watchful and ready with the e-stop
Not sure at the moment where to start debugging. Ideas:
- Get Kuka 2 going with the Kuka 1 computer (to bisect the cause of the issue)
- Increase priority of driver for the scheduler
We've seen this behavior when there's too long a gap between commands being sent to FRI. FRI decides that your connection is bad and kicks you out. @sammy-tri may have more insights on how we've dealt with this, but increasing the priority of the driver is probably a good idea.
Setting a priority on the driver software was one piece of it.
I also thought that we had to bump the priority of Docker's network bridge process (which runs in userland), though I'm not seeing a commit for that in https://github.com/sammy-tri/sammy-iiwa-tools/ so I might have misunderstood.
Hey @avalenzu and @jwnimmer-tri , thanks for the help here, very nice!!
We can look into increasing the driver's scheduler priority and also adjusting the Docker network bridge process
Looking forward to seeing what @sammy-tri thinks too
(wow, this wound up longer than I expected)
I think I've got this taken care of in our lab locally, but I've been fighting an issue much like this for the last several days. (I was able to ignore it for a long time because we were driving the robot with a lightly loaded 24-core (48 virtual) monster, but the "little" 8-core (16 virtual) machine, which is also running perception code, has been problematic.)
If you're running on the bare host (no docker/container setup), running kuka_driver with real-time scheduling seems to basically do the trick (see https://github.com/sammy-tri/sammy-iiwa-tools/blob/master/docker/lcm_pick_and_place.pmd#L14 ). I've considered putting a call to sched_setscheduler directly in the driver, but I haven't yet.
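For concreteness, here is a minimal Python sketch of the same idea: bump an already-running kuka_driver to the real-time round-robin policy, equivalent to the chrt line in that procman config. The pgrep lookup and the priority value of 20 are illustrative placeholders, not anything the driver or spartan actually ships.

```python
#!/usr/bin/env python3
# Minimal sketch (assumptions noted above): give an already-running kuka_driver
# the SCHED_RR policy, equivalent to "sudo chrt -r -p 20 <pid>".
# Must run as root (or with CAP_SYS_NICE).
import os
import subprocess

def set_realtime(pid: int, priority: int = 20) -> None:
    # os.sched_setscheduler wraps the Linux sched_setscheduler(2) syscall.
    os.sched_setscheduler(pid, os.SCHED_RR, os.sched_param(priority))

if __name__ == "__main__":
    # The pgrep-based lookup is illustrative; adapt to however you launch the driver.
    pid = int(subprocess.check_output(["pgrep", "-n", "kuka_driver"]).strip())
    set_realtime(pid)
    print("kuka_driver", pid, "now uses scheduler policy", os.sched_getscheduler(pid))
```

The same sched_setscheduler call could in principle live inside the driver itself, which is the option mentioned above but not implemented yet.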
If you're running in a docker container the situation is considerably more complex. When forwarding a port into the container with the -p argument to docker run, docker starts up a userspace process, docker-proxy, which forwards the UDP packets into the container. It is not using the RT scheduler. It can be made to do so by running sudo chrt -p -r 20 <pid> after it starts. It also spawns a bunch of threads which need to have their priority changed individually; have fun spelunking in /proc/<pid>/task to find them all. The threads all run as root, so you'll have to sudo to change their priority. I rejected writing a script to do this for those reasons.
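To make that per-thread spelunking concrete, here is an untested sketch of exactly the sort of script that was rejected above; the priority of 20 mirrors the chrt value, and it needs root.

```python
#!/usr/bin/env python3
# Untested sketch: find every docker-proxy process and move it, plus all of its
# threads listed under /proc/<pid>/task, to SCHED_RR priority 20.
# Requires root, since docker-proxy runs as root.  Threads spawned later are
# not covered -- one of the reasons this approach was rejected.
import os
import subprocess

RT_PRIORITY = 20

def bump(tid: int) -> None:
    # On Linux the pid argument may be a thread id, so this applies per-thread.
    os.sched_setscheduler(tid, os.SCHED_RR, os.sched_param(RT_PRIORITY))

def main() -> None:
    pids = subprocess.check_output(["pgrep", "-x", "docker-proxy"]).split()
    for pid in map(int, pids):
        for tid in os.listdir(f"/proc/{pid}/task"):
            bump(int(tid))
        print(f"docker-proxy {pid}: threads set to SCHED_RR {RT_PRIORITY}")

if __name__ == "__main__":
    main()
```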
You could instead write a wrapper script which replaced /usr/bin/docker-proxy and ran the real code under chrt to bump the priority. It would bump every docker-proxy instance ever started, which isn't really so bad, but it would also make the system fragile to any upgrade of the docker package and be an ugly hack which needs to be reapplied, so I never tried it.
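If someone did want to try that hack despite the caveats, the wrapper itself would be tiny. A hypothetical version, assuming the packaged binary has been moved aside to /usr/bin/docker-proxy.real:

```python
#!/usr/bin/env python3
# Hypothetical wrapper installed as /usr/bin/docker-proxy (never tried here).
# Assumes the real binary was moved to /usr/bin/docker-proxy.real.  It simply
# re-execs the real proxy under chrt so every instance starts with RT priority.
import os
import sys

REAL_PROXY = "/usr/bin/docker-proxy.real"  # assumed rename of the packaged binary

# Equivalent to: chrt -r 20 /usr/bin/docker-proxy.real <original args...>
os.execvp("chrt", ["chrt", "-r", "20", REAL_PROXY] + sys.argv[1:])
```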
What I wound up with instead is this: https://github.com/sammy-tri/sammy-iiwa-tools/blob/master/docker/docker_run_iiwa.py#L110
It's a class in my docker startup script which removes the iiwa's IP address from the host, creates a custom docker network to give the container access to the entire network interface, and reassigns the host IP there. It works in terms of keeping the robot running without FRI turning on the brakes and kicking you out of command mode (a rough sketch of the idea follows the list of downsides below). It's got some downsides:
- Setting up the container network (and then tearing it down and restoring the host config) is a multi-step process which leaves artifacts around if it's interrupted or done incorrectly. This can be confusing.
- docker network connect doesn't allow you to specify a MAC address for the newly created interface inside the container, and thus it will change every time and the ARP cache on the Sunrise cabinet will be stale, so the robot won't talk to you. I worked around that in my procman script with this sadness: https://github.com/sammy-tri/sammy-iiwa-tools/blob/master/docker/rand_obj_picking.pmd#L10
- Notifications pop up on the host saying that the network is disconnecting when the container starts (and that it's connecting when the container stops). This is really only an annoyance.
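To make the moving parts concrete, below is a very rough, assumption-laden sketch of the kind of setup being described. It is not the actual docker_run_iiwa.py logic -- the interface name, addresses, and the choice of a macvlan network are all guesses for illustration; see the linked script for the real procedure.

```python
#!/usr/bin/env python3
# Rough sketch only: move the iiwa-facing network from the host into a custom
# docker network.  Interface name, subnet, and the macvlan driver are assumed
# for illustration; the real logic lives in docker_run_iiwa.py.
import subprocess

HOST_IFACE = "enp3s0"          # assumed NIC wired to the Sunrise cabinet
HOST_IP = "192.170.10.200/24"  # assumed host-side FRI address
SUBNET = "192.170.10.0/24"
NET_NAME = "iiwa-net"

def run(*cmd: str) -> None:
    print("+", " ".join(cmd))
    subprocess.check_call(cmd)

def setup() -> None:
    # 1. Drop the iiwa IP from the host so it can be reused inside the container.
    run("sudo", "ip", "addr", "del", HOST_IP, "dev", HOST_IFACE)
    # 2. Create a docker network attached to that physical interface.
    run("docker", "network", "create", "-d", "macvlan",
        "--subnet", SUBNET, "-o", f"parent={HOST_IFACE}", NET_NAME)

def teardown() -> None:
    # Reverse the steps.  If this never runs (crash, Ctrl-C), the leftover
    # network and missing host IP are exactly the "artifacts" mentioned above.
    run("docker", "network", "rm", NET_NAME)
    run("sudo", "ip", "addr", "add", HOST_IP, "dev", HOST_IFACE)

if __name__ == "__main__":
    setup()
    # A "docker run --network iiwa-net --ip 192.170.10.200 ..." would go here,
    # with teardown() called once the container exits.
```

A container-side interface created this way gets a fresh MAC each run, which is consistent with the stale-ARP downside described in the list above.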
I'd love to find a better option, but unfortunately I haven't thought of it yet. Suggestions/ideas/questions welcome.
I've also been running the robots with this patch to drake-iiwa-driver which keeps them from commanding additional motion if we drop out of FRI and then come back. I think it's the right thing to do but I haven't pushed it to master yet since I still have a little bit of doubt: https://github.com/sammy-tri/drake-iiwa-driver/commit/59e5585f582acce2b8f7ad80e2f7e4cd6f41c917
https://github.com/RobotLocomotion/drake-iiwa-driver/pull/23 has the change to at least not resume commanding the robot after things have gone wrong. Will merge soon if there are no objections.