bepuphysics2 icon indicating copy to clipboard operation
bepuphysics2 copied to clipboard

Tree batch insertion

Open RossNordby opened this issue 6 years ago • 7 comments

  • Broad phase incremental insertion makes up the majority of sleep/wake cost.
  • Incremental insertion is sequential in nature and destroys effective load balancing.
  • Incremental insertion tends to build pretty crappy trees.

You can do quite a bit better by exploiting the fact that islands are almost always composed of bodies that are close to each other. For example, throw a sweep builder at each island before adding the whole subtree at once. Despite being a much higher quality builder, avoiding the need to traverse the upper tree N times for incremental insertions would likely make it faster.

Different islands can be built in parallel trivially, and larger islands could be internally multithreaded as well. This would likely remove broad phase manipulation as a bottleneck in activity management.

Subtree builders could even be made to realize that a given island would be best represented in multiple nodes in terms of SAH, but I doubt that this would be necessary in practice. Refinement would deal with such inefficiencies pretty quickly, and it would almost never be worse than incremental insertion.

This should be treated as depending on #6. The additional complexity associated with scheduling island constructions may be enough to warrant a more cohesive job system as well.

RossNordby avatar Apr 01 '18 00:04 RossNordby

As a potential example of incremental insertion related problems, consider what happens on the PyramidDemo when a few thousand boxes get knocked into the void, and the remaining boxes deactivate.

Upon reactivation of a large number of boxes remaining on the original surface, active tree quality drops precipitously (as expected for incremental insertion), but then incremental refinement has trouble recovering due to the extreme circumstances. This is a good test case for any attempt to revisit reactivation.

RossNordby avatar Jul 06 '18 00:07 RossNordby

On further investigation, the pyramid demo issue was actually caused by a rare race condition during waking. Spewed some NaNs, and NaN-sized bounding boxes ruin the tree pretty quickly.

With that fixed, the only real concern is the actual cost of insertion.

RossNordby avatar Jan 20 '19 00:01 RossNordby

How do you even get a NaN-sized box ?!? Is this some crazy 4+th dimension stuff (Hamilton Quaternion territory)? ;)

ghost avatar Jan 20 '19 00:01 ghost

On a more serious note: How did there get to be NaN bounding boxes anyway? You mention a race condition on waking. Is this due to the values not being properly set before they are accessed by the tree ?

ghost avatar Jan 20 '19 00:01 ghost

It was pretty sneaky, which was part of the reason why it was around for several months. Apparently, while slightly tired, I scheduled the waker-upper constraint data copy alongside new contact constraint additions. In almost all cases, this is fine- the copies operate on memory reserved ahead of time, so the new constraints won't touch the same memory.

But... if a new constraint ends up triggering a resize of a type batch at the same time as another thread is copying newly-awoken constraint data into that same type batch, the copies may end up sending their data into a dead buffer, leaving the resized buffer without the updated data.

Then the solver comes along, slurps up that uninitialized constraint data, and vomits NaNs everywhere. Pose integrator sees a bunch of NaN velocities, nods, and dutifully makes all the related bounding boxes NaN-sized. Broad phase takes this and does its best, but it causes horrible problems and a crash within a frame or two 90% of the time.

In the rare cases where it doesn't crash, the collision testing shows up in profiling as taking way too long due to the NaN infection. I happened to see one of those cases and just assumed it was related to some kind of insertion quality bug until I managed to reproduce the associated crashes. 🌈

RossNordby avatar Jan 20 '19 00:01 RossNordby

...if a new constraint ends up triggering a resize of a type batch at the same time as another thread is copying newly-awoken constraint data into that same type batch, the copies may end up sending their data into a dead buffer, leaving the resized buffer without the updated data.

Well crap ... that's a serious edge case that doesn't really come out until the timing is just right. Glad it was found though! It'd be a nightmare for some of us to figure out because it makes it sound (with the timing of the resizing) like it would be completely random occurrence.

ghost avatar Jan 20 '19 04:01 ghost

I also like the "NaN infection"

** We Are the NaN. You will be assimilated. Resistance is futile! **

ghost avatar Jan 20 '19 04:01 ghost