Different InverseBindPose Per Drawable [Also only recompute skinning matrices for Skins&Skeleton Instances of Visible Drawable Instances]

Open devshgraphicsprogramming opened this issue 4 years ago • 0 comments

Description

As it stands right now, the user must preallocate and provide the Bone Translation Table offset via either per-object or per-view-per-object data to the skinning compute or vertex shader.

It is therefore impossible to have a different BTT per LoD or Drawable without storing a variable length list in the per-object data.

Description of the related problem

There is also some choice whether to make the skinning matrices transient or not.

If compute shader skinning is used, we have no use for persistent skinning matrices (and therefore the BTT). However, if using vertex shader skinning, we can potentially profit from not having the BTT entries recomputed every frame, which would allow us to run the BTT update compute shader at a different frequency to rendering.

Solution proposal

One needs to perform approximate duplicate elimination (it's not acceptable to falsely detect a duplicate, but it is acceptable to miss one) on the <SkeletonInstanceID,BindPoseID> pairs of the draw instances that pass the culling stage, then output a contiguous array of entries that allows us to identify the Translation Table range assigned to each unique pair.

Both approaches require that another uint per-instance-divisor vertex attribute is added as the output of the instance counting sort of the LoD and Culling System.

Method 1: Explicit Bitflag array (requires that <SkeletonInstanceID,BindPoseID> can be turned into a GUID of less than 30 or 34 bits)

Use atomicOr to set a flag bit which tells us whether an entry for a visible skinning matrix translation table has been assigned yet.

If this invocation is the first one to set the bit, we increment an atomic counter of unique Translation Tables to recompute and append to the list:

  • if using transient Translation Tables, just the GUID
  • otherwise GUID,BTTM offset
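The flag-and-append step can be sketched on the CPU, with `std::atomic::fetch_or` standing in for the shader's atomicOr. This is only an illustration of the transient variant (just the GUID is appended); `VisibleTableList`, `tag` and the member names are hypothetical and not part of Nabla.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CPU stand-in for the culling-stage compute shader.
struct VisibleTableList
{
    explicit VisibleTableList(uint32_t guidCount)
        : flags((guidCount + 31u) / 32u), guids(guidCount), count(0u)
    {
        for (auto& word : flags)
            word.store(0u, std::memory_order_relaxed);
    }

    // Returns true only for the first caller that flags `guid`.
    bool tag(uint32_t guid)
    {
        assert(guid / 32u < flags.size());
        const uint32_t bit = 1u << (guid % 32u);
        // fetch_or returns the previous word, like GLSL atomicOr does
        const uint32_t previous = flags[guid / 32u].fetch_or(bit, std::memory_order_relaxed);
        if (previous & bit)
            return false; // another invocation already appended this GUID
        // first one to set the bit: bump the counter and append
        const uint32_t slot = count.fetch_add(1u, std::memory_order_relaxed);
        guids[slot] = guid;
        return true;
    }

    std::vector<std::atomic<uint32_t>> flags; // one bit per possible GUID
    std::vector<uint32_t> guids;              // unique visible GUIDs, unsorted
    std::atomic<uint32_t> count;              // number of unique Translation Tables
};
```

Because the bit is set and tested in a single atomic operation, a GUID can never be appended twice, whatever order the invocations run in.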

After [re]computing the skinning matrices [if transient, we now need to output the GlobalInvocationIndex as the BTTM offset to accompany the GUID], sort the list by the GUID.

When the drawcalls are bucket sorted into the per-instance data, recoup the GUID from per-object and drawable data, then use binary search lower_bound to retrieve the BTTM offset.
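A minimal CPU sketch of the sort and lookup, assuming the list holds <GUID, BTTM offset> pairs. `TableEntry`, `sortByGuid`, `findOffset` and the `kInvalidOffset` sentinel are hypothetical names; in the scheme above every visible draw's GUID should already be in the list, so the sentinel is only a safety net.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct TableEntry
{
    uint32_t guid;       // packed <SkeletonInstanceID,BindPoseID>
    uint32_t bttmOffset; // offset of the first skinning matrix of this table
};

constexpr uint32_t kInvalidOffset = ~0u; // assumed sentinel for "not in the list"

inline bool guidLess(const TableEntry& a, const TableEntry& b)
{
    return a.guid < b.guid;
}

inline void sortByGuid(std::vector<TableEntry>& entries)
{
    std::sort(entries.begin(), entries.end(), guidLess);
}

// Binary search over the GUID-sorted list, as the bucket sort pass would do per draw.
inline uint32_t findOffset(const std::vector<TableEntry>& sorted, uint32_t guid)
{
    assert(std::is_sorted(sorted.begin(), sorted.end(), guidLess));
    const auto it = std::lower_bound(sorted.begin(), sorted.end(), guid,
        [](const TableEntry& entry, uint32_t wanted) { return entry.guid < wanted; });
    return (it != sorted.end() && it->guid == guid) ? it->bttmOffset : kInvalidOffset;
}
```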

Method 2: GPU Hashmap

Can be used both for Visibility Feedback transient BTT and for plain allocation of persistent BTT entries (although we'd need to use a hashmap which does not degrade in performance after many deletions).

We simply use a GPU friendly fixed size hashmap implementation with <SkeletonInstanceID,BindPoseID> as the key and the offset to the BTT skinning matrices as the value.

In the case of transient BTT, allocating contiguous ranges for bone matrices is trivial (GPU side Linear/Bump Allocator, a.k.a. atomic counter).

However for persistent BTT allocated on the CPU, the hashmap would need to be readonly from the GPU and kept in sync. On the flipside this could allow us to use a specialized GPU hashmap implementation where we only care about GPU lookup performance (perfect hashing?).

Special care would need to be taken to ensure that the value elements are contiguous.

When the drawcalls are bucket sorted into the per-instance data, just look up the hashmap value.
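For the transient case, the idea can be sketched as a fixed-size open-addressing table with a bump allocator. This is a CPU stand-in under stated assumptions, not a proposed implementation: linear probing, a power-of-two capacity, the 64-bit key packing, the hash function and all names (`TransientBTTMap`, `insert`, `lookup`) are illustrative choices, and insertion and lookup are assumed to run in separate passes with a barrier in between.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint64_t kEmptyKey = ~0ull; // assumed sentinel, never a valid key
constexpr uint32_t kNotFound = ~0u;

class TransientBTTMap
{
public:
    explicit TransientBTTMap(uint32_t capacityPow2)
        : keys(capacityPow2), values(capacityPow2, kNotFound), mask(capacityPow2 - 1u), nextMatrix(0u)
    {
        assert(capacityPow2 != 0u && (capacityPow2 & mask) == 0u);
        for (auto& key : keys)
            key.store(kEmptyKey, std::memory_order_relaxed);
    }

    // Insertion pass: returns true if this call created the entry.
    bool insert(uint32_t skeletonInstanceID, uint32_t bindPoseID, uint32_t boneCount)
    {
        const uint64_t key = packKey(skeletonInstanceID, bindPoseID);
        assert(key != kEmptyKey);
        for (uint32_t probe = 0u; probe <= mask; ++probe)
        {
            const uint32_t slot = (hash(key) + probe) & mask;
            uint64_t expected = kEmptyKey;
            if (keys[slot].compare_exchange_strong(expected, key))
            {
                // bump allocator: an atomic counter hands out a contiguous matrix range
                values[slot] = nextMatrix.fetch_add(boneCount);
                return true;
            }
            if (expected == key)
                return false; // duplicate, somebody else owns the entry
        }
        assert(false && "hashmap is full");
        return false;
    }

    // Lookup pass: only valid once all insertions have completed.
    uint32_t lookup(uint32_t skeletonInstanceID, uint32_t bindPoseID) const
    {
        const uint64_t key = packKey(skeletonInstanceID, bindPoseID);
        for (uint32_t probe = 0u; probe <= mask; ++probe)
        {
            const uint32_t slot = (hash(key) + probe) & mask;
            const uint64_t stored = keys[slot].load();
            if (stored == key)
                return values[slot];
            if (stored == kEmptyKey)
                return kNotFound;
        }
        return kNotFound;
    }

private:
    static uint64_t packKey(uint32_t skeletonInstanceID, uint32_t bindPoseID)
    {
        return (uint64_t(skeletonInstanceID) << 32) | bindPoseID;
    }
    static uint32_t hash(uint64_t key)
    {
        key ^= key >> 33;
        key *= 0xff51afd7ed558ccdull;
        key ^= key >> 33;
        return uint32_t(key);
    }

    std::vector<std::atomic<uint64_t>> keys;
    std::vector<uint32_t> values; // offset of the first skinning matrix
    uint32_t mask;
    std::atomic<uint32_t> nextMatrix;
};
```

Each winning insertion reserves `boneCount` consecutive matrices, so the value range belonging to one key is contiguous even though the keys themselves are scattered across the table.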

Additional context

As an added bonus, we'd also like to be able to only update the skinning matrices of bones which have both moved and are currently visible/needed (this would be determined with animation independent AABBs). We can do this in broadphase and narrowphase.

Broadphase: Visibility Feedback

Only visible drawable instances tag their bone instances for recomputation

Narrowphase: Timestamp check

Only the tagged bone instances are processed by compute invocations; each one's skinning matrix recomputation timestamp is then compared with the global transform recompute timestamp of the corresponding skeleton instance bone.
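The two phases could look roughly like this on the CPU. The sketch assumes monotonically increasing timestamps and that a matrix is stale whenever its timestamp differs from the bone's global transform timestamp; `BoneInstance`, its fields and `updateVisibleBones` are hypothetical names, and the actual matrix recomputation is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct BoneInstance
{
    uint32_t globalTransformTimestamp; // last recompute of the bone's global transform
    uint32_t skinningTimestamp;        // last recompute of its skinning matrix
    bool tagged;                       // set by a visible drawable instance (broadphase)
};

// Returns how many skinning matrices were actually recomputed.
inline uint32_t updateVisibleBones(std::vector<BoneInstance>& bones)
{
    uint32_t recomputed = 0u;
    for (auto& bone : bones)
    {
        if (!bone.tagged)
            continue; // broadphase: nothing visible needs this bone
        bone.tagged = false; // consume the tag for the next frame
        // assumed invariant: skinning never runs ahead of the global transform
        assert(bone.skinningTimestamp <= bone.globalTransformTimestamp);
        if (bone.skinningTimestamp == bone.globalTransformTimestamp)
            continue; // narrowphase: bone has not moved since the last recompute
        // the skinning matrix itself would be recomputed here
        bone.skinningTimestamp = bone.globalTransformTimestamp;
        ++recomputed;
    }
    return recomputed;
}
```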