Exposing computed boundingBox information
What is the right level of detail at which to expose bounding box information?
Fundamentally, an Axis-Aligned Bounding Box (AABB) only requires six values, e.g.:
min(x,y,z) and max(x,y,z)
OR
center(x,y,z) and size(x,y,z)
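The two representations carry the same information, so converting between them is cheap. A minimal sketch, assuming a plain `Vec3` object as a stand-in for `DOMPoint` (so it runs outside a browser) and hypothetical helper names `toCenterSize` and `toMinMax`:

```typescript
// Stand-in for DOMPoint; only x, y and z matter for an AABB.
type Vec3 = { x: number; y: number; z: number };

interface MinMaxBox { min: Vec3; max: Vec3 }
interface CenterSizeBox { center: Vec3; size: Vec3 }

// center = (min + max) / 2, size = max - min
function toCenterSize(box: MinMaxBox): CenterSizeBox {
  const { min, max } = box;
  return {
    center: { x: (min.x + max.x) / 2, y: (min.y + max.y) / 2, z: (min.z + max.z) / 2 },
    size: { x: max.x - min.x, y: max.y - min.y, z: max.z - min.z },
  };
}

// min = center - size / 2, max = center + size / 2
function toMinMax(box: CenterSizeBox): MinMaxBox {
  const { center, size } = box;
  return {
    min: { x: center.x - size.x / 2, y: center.y - size.y / 2, z: center.z - size.z / 2 },
    max: { x: center.x + size.x / 2, y: center.y + size.y / 2, z: center.z + size.z / 2 },
  };
}
```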
In practice, most 3D engines expose some level of convenience over operations on AABB objects, e.g. to determine broad-phase intersections, add and subtract boxes, or factor their dimensions into matrix operations for transformation.
As well as that, there are questions about the lifetime of the objects created for this purpose: the timeframe for which they are valid, when (or if) they're nulled, or whether they are treated as pass-by-value snapshots (i.e., re-creations that are only valid at the time of creation).
It seems like in the longer term, a comprehensive spatial web will need a more sophisticated set of 3D math objects, for relating DOMMatrix to DOMPoint, DOMRect etc. - but that seems like a big project that will require clearer use-cases and some initial, basic functionality for people to explore with.
Suggestions
1. Expose `DOMPoint model.boundingBoxCenter` and `DOMPoint boundingBoxExtent` directly on the model element
2. Expose a `boundingBox` as a key-value Dictionary so we see `DOMPoint model.boundingbox.center`, `DOMPoint model.boundingbox.min`, `DOMPoint model.boundingbox.max` ...but are unable to return `radius` or other convenience methods
3. Do the up-front work of creating an AABB interface that allows us to expose some selection of values and provides us with a context to supply convenience methods (e.g. `intersect`, `volume`, `radius` etc)
Personally, I think that while (3) is an inevitability, I'd actually lean on the simplest, roughest approach that gives early authors the bare minimum to work with, i.e. (1). I'm aware that can carry the risk of never being able to deprecate it, but I suspect we simply don't know enough about what we want to do the architecture up-front.
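To illustrate what authors could do with only the two attributes from (1), here is a rough sketch of the convenience operations written as free functions. It assumes that the extent is the full size of the box along each axis (the thread does not pin this down), uses a plain `Vec3` object in place of `DOMPoint`, and the `ModelBounds` type and function names are hypothetical:

```typescript
// Stand-in for DOMPoint so the sketch runs outside a browser.
type Vec3 = { x: number; y: number; z: number };

// Hypothetical view of the two attributes from option (1).
// ASSUMPTION: boundingBoxExtent is the full size of the box on each axis.
interface ModelBounds {
  boundingBoxCenter: Vec3;
  boundingBoxExtent: Vec3;
}

// Volume of the box: width * height * depth.
function volume(b: ModelBounds): number {
  const e = b.boundingBoxExtent;
  return e.x * e.y * e.z;
}

// Radius of the sphere that encloses the box: half the length of its diagonal.
function boundingSphereRadius(b: ModelBounds): number {
  const e = b.boundingBoxExtent;
  return 0.5 * Math.sqrt(e.x * e.x + e.y * e.y + e.z * e.z);
}

// Broad-phase overlap test: two boxes overlap when, on every axis, the distance
// between centers is no more than the sum of their half sizes.
function intersects(a: ModelBounds, b: ModelBounds): boolean {
  const ac = a.boundingBoxCenter, ae = a.boundingBoxExtent;
  const bc = b.boundingBoxCenter, be = b.boundingBoxExtent;
  return (
    Math.abs(ac.x - bc.x) * 2 <= ae.x + be.x &&
    Math.abs(ac.y - bc.y) * 2 <= ae.y + be.y &&
    Math.abs(ac.z - bc.z) * 2 <= ae.z + be.z
  );
}
```

If the extent turned out to be specified as a half size instead, only the constant factors above would change.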
A couple of thoughts:
- In my experience, it's frequently more convenient to have a center+size than min+max. I do expect that you'll see various use cases that end up favoring both, though, so I don't think you'll be able to make a "correct" choice here.
- Re: "Expose a boundingBox as a key-value Dictionary", it's been made pretty clear to me in the past that Web APIs should consume dictionaries but never return them. If we want to return a bundle of attributes then an interface is the webby way to do that.
- If we do define an interface, then I strongly advocate for exposing center, size, min, AND max because they can be implemented as lazy attributes that are calculated from whatever the favored internal representation is. (WebXR does this with rigid transform matrices)
- That said, I would start with exposing a couple of attributes on the model element and only upgrade it to an interface if we see a strong reason for it. Thankfully the math required for most of the convenience methods you mentioned isn't very complicated.
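As a rough illustration of the lazy-attribute idea, here is a sketch of an object that stores min and max internally and derives center and size only on first access. The class name `BoundingBox`, the choice of min/max as the internal representation, and the plain `Vec3` stand-in for `DOMPoint` are all assumptions made for the example, not a proposed API:

```typescript
// Stand-in for DOMPoint so the sketch runs outside a browser.
type Vec3 = Readonly<{ x: number; y: number; z: number }>;

class BoundingBox {
  // Derived values are computed on first read and cached afterwards.
  private cachedCenter: Vec3 | undefined = undefined;
  private cachedSize: Vec3 | undefined = undefined;

  // ASSUMPTION: min/max is the favored internal representation here.
  constructor(private readonly minCorner: Vec3, private readonly maxCorner: Vec3) {}

  get min(): Vec3 { return this.minCorner; }
  get max(): Vec3 { return this.maxCorner; }

  get center(): Vec3 {
    if (this.cachedCenter === undefined) {
      const a = this.minCorner, b = this.maxCorner;
      this.cachedCenter = { x: (a.x + b.x) / 2, y: (a.y + b.y) / 2, z: (a.z + b.z) / 2 };
    }
    return this.cachedCenter;
  }

  get size(): Vec3 {
    if (this.cachedSize === undefined) {
      const a = this.minCorner, b = this.maxCorner;
      this.cachedSize = { x: b.x - a.x, y: b.y - a.y, z: b.z - a.z };
    }
    return this.cachedSize;
  }
}
```

Reading `center` twice returns the same cached object, so callers that only ever need `min` and `max` never pay for the derived values.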
/tpac
It seemed like the consensus was that
- this is useful information,
- the `DOMPoint`-based notion of a center and the extent should be adequate,
- It's acceptable not to continually recompute this, and to ignore shader-like displacement-mapped offsets.
The only remaining question is whether the computation is based on the first frame of the first animation, or on the "bind pose" of an asset. My preference would be first-frame, first-animation, since the bind pose of an animated asset is never actually displayed under normal circumstances.
@toji, do you have a strong preference for bind? I'm thinking that relying on 'first animation is default animation' has the ability to solve a few problems for us, but I may be missing something.
I don't have a strong preference for it, but there's a couple of considerations that make me lean towards it.
From an efficiency standpoint the bind pose bounds are going to be easier to compute, since you don't have to skin or blend the mesh to get them. Also, a format like glTF encodes the min and max vertex positions for each buffer in its structure, which means you don't even have to examine the buffer contents to get a reasonable estimate. (You do still need to at least apply basic node transforms to get a bounds estimate, and if you want the bounds to be exactly accurate you do need to look at individual vertices, so it's not a magic bullet.)
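To make the estimate concrete, here is a rough sketch of turning a stored min/max pair into a world-space box by pushing its eight corners through a node's transform. It assumes a column-major 4x4 affine matrix (the layout glTF uses for node matrices); `transformPoint` and `transformedBounds` are hypothetical helper names, and the result is a conservative box rather than a vertex-perfect one:

```typescript
type Vec3 = [number, number, number];

// Transform a point by a column-major 4x4 matrix, assuming an affine transform
// (so the bottom row is 0, 0, 0, 1 and no perspective divide is needed).
function transformPoint(m: number[], p: Vec3): Vec3 {
  const [x, y, z] = p;
  return [
    m[0] * x + m[4] * y + m[8] * z + m[12],
    m[1] * x + m[5] * y + m[9] * z + m[13],
    m[2] * x + m[6] * y + m[10] * z + m[14],
  ];
}

// Estimate the transformed bounds from the stored min/max alone:
// transform all eight corners of the local box and take the per-axis extremes.
function transformedBounds(min: Vec3, max: Vec3, matrix: number[]): { min: Vec3; max: Vec3 } {
  const outMin: Vec3 = [Infinity, Infinity, Infinity];
  const outMax: Vec3 = [-Infinity, -Infinity, -Infinity];
  for (let i = 0; i < 8; i++) {
    // Bits 0, 1 and 2 of i pick min or max on the x, y and z axes.
    const corner: Vec3 = [
      i & 1 ? max[0] : min[0],
      i & 2 ? max[1] : min[1],
      i & 4 ? max[2] : min[2],
    ];
    const p = transformPoint(matrix, corner);
    for (let axis = 0; axis < 3; axis++) {
      outMin[axis] = Math.min(outMin[axis], p[axis]);
      outMax[axis] = Math.max(outMax[axis], p[axis]);
    }
  }
  return { min: outMin, max: outMax };
}
```

For a rotated mesh the resulting box can be larger than the tightest possible one, which is the accuracy trade-off described above.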
There's also the opportunity with glTF to make use of extensions like FB_geometry_metadata, which IS specified to be a vertex-perfect bounds, but it's also specified to be against the bind pose ("should be computed with disregard for animation: all morph targets are inactive, all node transforms are in their static form and unaffected by any animation").
I do not know what sort of bounds info USD files may contain.
Also, I do feel like there's some opportunity for the first frame to be misleading. Take this model for example: https://sketchfab.com/3d-models/jack-in-the-box-e91c8bcb0a904bf6b8ba1db2027682de
If the bounding box was computed from the first frame then it would frame only the box, and the (extremely creepy) puppet would go out-of-frame when it opens. Computing the bounding box based on the bind pose in no way guarantees that the full animation would be in frame, but it does give artists a way to influence it without modifying the animation with a single frame of extended geometry or similar.
Ultimately I don't think any of those are deal-breakers, and the most important part is that the tag's behavior is well specced and predictable for page authors. I think they're worth considering, though!
That makes sense to me, and I'd be happy to go with bind pose. I'll update the explainer at least, and then we can figure out where else it needs to go.
My main concern with bind pose is that it’s not hard to make a model that never appears, because it was designed and bound in a different location from where the animation happens.
I think first-frame is worth the extra work so that, by default, you will see something. If the animation leaves the bounding box, it’s not hard to adjust it with the entity transform, but at least you will know where it is.
I’m not going to die on this hill, but I do feel it’s the more user-friendly option.