dart icon indicating copy to clipboard operation
dart copied to clipboard

Dantzig LCP solver nan values

Open pchorak opened this issue 7 years ago • 15 comments

Hello, I have come across an issue while using DART with Gazebo in which the Dantzig LCP solver returns nans, causing the simulation to fail. I am using the Gazebo default branch and DART master branch. Here is a Gazebo world file (sdf) to reproduce the issue. In the example, two boxes anchored to the world with parallel revolute joints collide. matrices.txt contains the debug printout from DantzigLCPSolver..cpp. If I replace the boxes with spheres, do not align the revolute axes, or use the projected Gauss-Seidel solver then the issue does not occur.

pchorak avatar Jul 14 '17 18:07 pchorak

Here is an example using DART without Gazebo: collideBoxes.zip

pchorak avatar Jul 14 '17 19:07 pchorak

I see that pull request #881 relates to contact constraints and nan values as well. I checked the normals of the of the contacts at this point. They all appear to be of approximately length 1. There are a number of contact constraints (10) and they point in many different directions.

pchorak avatar Jul 15 '17 00:07 pchorak

Digging further into DantzigLCPSolver.cpp, it appears that the ODE solver dSolveLCP is being used to handle a system of linear equalities Ax=b with bounds as opposed to an LCP. The issue occurs when the system does not have a solution. The matrix A is singular (30x30 with rank 11) but almost has a solution because of linearly dependent equations. Using the pseudo-inverse (ignoring bounds for now) gives a solution with error of 0.001 in a few elements, similar to the PGS solution.

While the contact normals look ok, some of the contact locations are causing the inconsistent equations. The GIF below shows the contacts visualized in Gazebo and the simulation failure.

collision_no_filter

If I apply the diff below to ignore the invalid contact with the lower z position, the simulation continues for more steps. However, the other contacts begin to drift and the simulation fails eventually as shown below.

diff --git a/dart/constraint/ConstraintSolver.cpp b/dart/constraint/ConstraintSolver.cpp
index 0407973..e0877b0 100644
--- a/dart/constraint/ConstraintSolver.cpp
+++ b/dart/constraint/ConstraintSolver.cpp
@@ -426,7 +426,8 @@ DART_SUPPRESS_DEPRECATED_END
   {
     contactConstraint->update();
 
-    if (contactConstraint->isActive())
+    if (contactConstraint->isActive() &&
+        (contactConstraint->mContacts[0]->point(2) < 0.9))
       mActiveConstraints.push_back(contactConstraint);
   }

collision_with_filter

pchorak avatar Jul 18 '17 16:07 pchorak

Thank you for investigating this issue. I'm currently on vacation so might not be able to take a look at this for the next few weeks. However, any update would be much appreciated.

jslee02 avatar Jul 18 '17 17:07 jslee02

I can help look into this. Enjoy your vacation, JS!

mxgrey avatar Jul 18 '17 17:07 mxgrey

I see two ways to keep the simulation from failing.

  1. Use a more robust solver such as PGS that returns an approximate solution when given inconsistent equations. LP solvers may also be an option since the LCP capabilities of dSolveLCP appear unused.

  2. Fix the contact point generation.

Option 2 may be preferable since invalid contacts points are not good. However, option 1 could be more general if inconsistent equations crop up in other contact patterns.

pchorak avatar Jul 18 '17 17:07 pchorak

@scpeters FYI I spent some more time investigating this issue. Originally I have had the same thoughts with @pchorak on the two potential root causes of this issue: either collision detection return invalid contacts, which results in an invalid problem; or the solver is not robust enough to solve this problem

From the gif that Peter posted, it is already clear that there is contact within the body itself, which would fail the simulation if and only if the solver returns a contact force close to 0 at this invalid contact. For PGS, I suppose most of the case, initial iteration, or initial guess is 0, which is actually helpful in this case.

Now after investigation, we should change the title of this issue, because this is a collision detection issue, not solver. I tried with the following combinations: (1) ODE collision detection + quick_step (a modified PGS, different from the PGS in DART? ) (2) ODE collision detection + world step (Dantzig) (3) DART collision detection (FCL) + PGS (4) DART collision detection (FCL) + Dantzig

The solvers in case (2) and (4) are exactly the same since DART uses the implementation from ODE as well. Here are the simulation results: (1) 2 contacts at the 2 vertexes of the two boxes, simulation runs smoothly for minutes, no crash (2) 2 contacts at the 2 vertexes of the two boxes, simulation runs smoothly for minutes, no crash (3) multiple contacts, 1 of them is inside the box, the simulation survives even this invalid contact exists (I suppose the solver returns force of 0 at this invalid contact, can be easily verified) (4) multiple contacts, 1 of them is always inside the box, as shown in the gif, and simulation crashes.

Therefore, (2) and (4) are sufficient proof that the PCL returned invalid contacts, which results in the NaN in the Dantzig solution. I think it would not worth the effort to fix the solver at this time. But rather improve the FCL.

rosebudflyaway avatar Aug 26 '17 00:08 rosebudflyaway

@scpeters @pchorak This further motivates the ignition libraries and flexible switches between collision detection, solver, etc., so that we can identify which part caused the issue quickly :+1:

rosebudflyaway avatar Aug 26 '17 00:08 rosebudflyaway

It does sound like the problem is coming from FCL, but it also seems that the Dantzig solver is less robust to bad inputs. Is there a way to detect when invalid contact points are given? Maybe we could compute the signed distance from a contact point to each collision shape? It could be a useful utility for debugging collision detection issues.

FYI @JenniferBuehler this issue may be interesting to you

scpeters avatar Aug 28 '17 17:08 scpeters

This is an interesting issue. I will only have more time to help look into this in October, but from what you are writing above I'd also say that it would make sense to fix the contact points as Peter suggests and as Ying has shown in the test (good proof btw). However making the solver more robust (at the very least with assertions to catch critical inputs) would be helpful as well. IMO generated NaN values should always be caught somewhere (ideally as close to the cause as possible), before they cause follow-up issues.

If this is not already solved by October when I will get back to work, I can help look into this as well.

JenniferBuehler avatar Sep 05 '17 08:09 JenniferBuehler

Hi, I'm also running into the problem of nonsensical contact points using the Bullet collision detector. Situation is: two cylinders colliding with each other at 90 degrees (in the shape of an X); should return one contact point roughly in the middle of the X. Most of the time this goes OK, but eventually Bullet collision detector returns one or more points that are outside the intersection of the AABBs of the colliding bodies. So clearly nonsensical. My codebase is fairly big so it's hard to make a minimal working example, but I can try if this would help.

First idea is to compute the AABB intersection between the two colliding bodies, and cull all contact points that lie outside it. But clearly it's just a workaround and not really a solution.

Phenomenon occurs with Bullet collision margin of 0 as well as 1E-3, 1E-6.

Any ideas how this might happen and how we can address it properly?

mentisn4b avatar Nov 08 '17 11:11 mentisn4b

I've had that off-shape contact point with bullet in the collision_benchmark tests. It would be next on my TODO list to start debugging the tests which fail with the collision_benchmark. If I find anything I will comment on it here.

JenniferBuehler avatar Nov 08 '17 13:11 JenniferBuehler

Hi, some more observations. I printed all collision shapes and their transformations after getting the collision result back from Bullet, and they all work out, so the problem is not there. (Anyway, the code that interfaces DART to the Bullet collision detector is pretty small and simple.)

The Bullet collision detector is very sensitive though; there are several epsilons defined in Bullet's GJK code. Earlier I had decreased some of these margins; recompiling with a fresh clone of the Bullet repository leads to the error sometimes not occurring, sometimes later in the simulation.

Some more insight from Erwin Coumans (also note: GJK penetration depth is only valid for small penetrations, i.e. less than the margins):

  • http://www.bulletphysics.org/Bullet/phpBB3/viewtopic.php?f=9&t=591
  • https://github.com/bulletphysics/bullet3/issues/888

So, it might not be a bug per se, just a shortcoming of the GJK/EPA algorithms? (Any experts on those subjects?)

mentisn4b avatar Nov 09 '17 14:11 mentisn4b

It seems the root causes are: (1) FCL returns contactacts that leads to infeasible constraint problem (2) Dantiz is sensitive to infeasible constraint problem

(1) may be the same with other collision detectors as well. What FCL makes the behavior worse is that it returns more contact points.

There are two possible solutions on top of my head now.

(a) Let the Dantzig solver falls back to using PGS solver when there is no solution, which can be detected by observing the s term in the implementation. (b) Introduce contact manifold, which is the same concept of Bullet, to manage the contact points between two colliding objects. This might bring two positive effects to us: (i) limiting the maximum number of contacts so less computational load to constraint solver and (ii) maintaining the best contact points by discarding bad contact points.

In addition to that, (b) will allow us to use the convexity based collision algorithms for primitive shapes that is more efficient than using meshes. The main reason we don't use the FCL's primitive shapes (that uses the convexity based collision algorithms) is that they only return a single contact at a time. Contact manifold will aggregate contact pointes over time so that they can be used in dynamic simulation.

jslee02 avatar Mar 22 '19 22:03 jslee02

#1265 is a partial solution to this issue.

I'll leave this issue open because I think the Dantzig solver shouldn't return true when the solution includes NAN values. Also, we need to investigate why FCL returns contact points in the middle of the boxes.

jslee02 avatar Mar 23 '19 22:03 jslee02