OpenUSD icon indicating copy to clipboard operation
OpenUSD copied to clipboard

Apple: Handle NaN and Inf timesample situations

Open dgovil opened this issue 9 months ago • 16 comments

Description of Change(s)

There are certain situations where USD encounters a NaN or Inf time or time interval when populating the scene values, causing a crash.

This PR simply checks for the values being finite before proceeding.

Fixes Issue(s)

Checklist

dgovil avatar May 16 '25 01:05 dgovil

Filed as internal issue #USD-11009

(This is an automated message. See here for more information.)

jesschimein avatar May 16 '25 18:05 jesschimein

@spiffmon After talking to @meshula about this, I think this particular case is USD equivalent to undefined behaviour. I can put up PRs to fix the divide by zero etc issues, but I think it would be great if we could define the behaviour in this case.

There are two aspects to this that I can see which would need defining:

  1. For scene playback, what does timecodesPerSecond=0 mean? IMHO it should mean the scene is static
  2. For applying timecode translations to incoming layers, I'm not sure what makes sense. Should it be equivalent to "do not transform the values"? Or should it also collapse the values to a single default?

dgovil avatar May 23 '25 19:05 dgovil

I presume the authoring that this PR from Dhruv's perspective might have been a fuzzing or validation exercise, rather than the output of an artist facing tool, and it seems to have exposed some UB for sure. By UB we mean that in-code documentation that says if tcps is unauthored, use fps. If fps is unauthored, use 24fps. However, in the case of explicitly authoring 0 fps what does it mean? Presumably it could mean time cannot advance, but divide by zero is not checked for, as this PR highlights. If time can't advance, what does that mean? That sampled values at a particular time exist but dt derivatives do not? It irks me to fall back to UB if we can say what we expect to happen for authoring this.

meshula avatar May 23 '25 21:05 meshula

For scene playback, what does timecodesPerSecond=0 mean? IMHO it should mean the scene is static

Curiously, the sdf data model constrains framesPerSecond to be strictly positive. Would it make sense to extend the same constraint to timeCodesPerSecond?

https://github.com/PixarAnimationStudios/OpenUSD/blob/53d7d647d293fe1aaeb03d2186e8cdeefe09a6fd/pxr/usd/sdf/schema.cpp#L312

nvmkuruc avatar May 23 '25 21:05 nvmkuruc

Even more curiously, and contra-indicatory (is that a word?), is this code for handling zero in SdfLayerOffset...

spiffmon avatar May 23 '25 22:05 spiffmon

So... given that we also do not prevent zero-scales in layer offsets, what do folks think about defining the behavior like this:

If the time-mapping from a layer to the stage's time (which should be computed for the root layer as a layerOffset of (scale=1, offset=0), so still infinite when timeCodesPerSecond is zero), then:

  1. timeSamples collapse, and querying at any stage time must equate to querying at UsdTimeCode.EarliestTime()
  2. GetBracketingTimeSamples(), if we would resolve to timeSamples, returns a zero interval at EarliesTime() for the attribute.
  3. Similarly for splines, always evaluate at the first knot.

The practical problem here is #3, as I'm not entirely sure how we practically pull that off; when we return a requested spline to a client (mapping its knot times to stage times), I don't think we can set all knots' times to the same value because knots are (and are expected to be), sorted, and we'd need to ensure the first knot remained first.

Saying that an invalid mapping results in no transformation seems awkward with strange behavior, as some data may be "properly" mapped, while other data is not (because layerOffset can contain zero-scale, for any composed layer).

Curious to hear others' thoughts, and whether we should pursue the issues around spline evaluation.

spiffmon avatar May 25 '25 01:05 spiffmon

timeSamples collapse, and querying at any stage time must equate to querying at UsdTimeCode.EarliestTime()

Isn't it the case that we are trying to model that time never advances? An option is that it could still instantaneously be any value but unable to move otherwise. I think the implicit point you are suggesting is that if time can never be advanced, and the default time is earliest time, then we can never move beyond the default time. But I'm not sure if it makes more sense to say "we can send the time machine to any frozen moment, but it can never budge from the frozen moment", or "the Time Machine can't even budge from zero." If we can send the tm to a frozen moment, maybe it makes the subsequent questions about spline evaluation easier to gauge?

You linked to #3, was that a typo? Maybe the link you mean to point to changes how I'm thinking about it :)

meshula avatar May 26 '25 01:05 meshula

When a zero scale appears in a sublayer’s layerOffset, we don’t have the flexibility to interpret as “time does not advance”, because the composed layer time-mapping will already have collapsed all the layer’s timeSamples to a single point in the forward direction. That’s why I suggested making zero tcps match that behavior.

spiffmon avatar May 26 '25 20:05 spiffmon

@meshula and @spiffmon I forgot to follow up on this with you both in all the other stuff going on. What should we do as behaviour here? I don't think fully grok the consensus above.

dgovil avatar Aug 07 '25 20:08 dgovil

Hey @dgovil , @meshula 's suggestion would allow for more access to the existing data than what you and I were thinking (everything collapses to a single sample), but I really don't think we can pull it off given all our existing API's and what clients would do with them. But I've cued it (and the spline question -- that was the number 3 I was referring to, Nick) up for a team discussion, and will update here after we've talked it over a bit.

spiffmon avatar Aug 12 '25 14:08 spiffmon

Thanks, Spiff.

On Tue, 12 Aug 2025 at 07:13, F. Sebastian (spiff) Grassia < @.***> wrote:

spiffmon left a comment (PixarAnimationStudios/OpenUSD#3634) https://github.com/PixarAnimationStudios/OpenUSD/pull/3634#issuecomment-3179528630

Hey @dgovil https://github.com/dgovil , @meshula https://github.com/meshula 's suggestion would allow for more access to the existing data than what you and I were thinking (everything collapses to a single sample), but I really don't think we can pull it off given all our existing API's and what clients would do with them. But I've cued it (and the spline question -- that was the number 3 I was referring to, Nick) up for a team discussion, and will update here after we've talked it over a bit.

— Reply to this email directly, view it on GitHub https://github.com/PixarAnimationStudios/OpenUSD/pull/3634#issuecomment-3179528630, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABB4XUUDVJXKUHCUGFDZBK33NHZABAVCNFSM6AAAAAB5HQEY3OVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTCNZZGUZDQNRTGA . You are receiving this because you were mentioned.Message ID: @.***>

dgovil avatar Aug 12 '25 15:08 dgovil

Hey @dgovil - we were unable to think of a useful, meaningful need for either a layerOffset-scale of zero or a timeCodesPerSecond of zero, so we propose simplifying the whole shebang by defining and validating that both must be positive and non-zero. While this will require a small update to the (future?) Core Spec, we believe it will be easy to deploy, and give clients protection and unambiguous behavior.

Noting that we just updated the prim indexing algorithm to outlaw negative offsets, we propose to just modify that check, and add one for layer/subLayer timeCodesPerSecond, so that when there is invalid data, we get a composition error, and that input is forced to its fallback (which for timeCodesPerSecond is 24).

The good news is that both checks would logically happen in PcpLayerStack::_BuildLayerStack(), which is where the existing non-negative-scale check is happening. Have a look and let us know if you have any questions or concerns?

Thanks!

spiffmon avatar Aug 20 '25 23:08 spiffmon

Perfect. That makes sense to me.

I'll try and update this PR soon to fit that requirement.

For core spec, I'll just flag @asluk so he's aware. Possibly if we get any notes, we can just put it in 1.0.

dgovil avatar Aug 21 '25 00:08 dgovil

@spiffmon I updated layerStack.cpp to check and set the time code as you asked. However it's still crashing somewhere else along the way, and I can't seem to figure out where that might be.

Are there any other places you know of where it might be getting the 0 value? The crash is in a very different spot so I can't find it by walking up the call stack.

In the meantime, I've left my existing stage check in place because I think its still a good check to have regardless of us correcting the layer time code per second.

dgovil avatar Oct 09 '25 19:10 dgovil

Just in case it might trigger something over here, could you attach/provide the crashing callstack?

--spiff

On Thu, Oct 9, 2025 at 12:14 PM Dhruv Govil @.***> wrote:

dgovil left a comment (PixarAnimationStudios/OpenUSD#3634) https://github.com/PixarAnimationStudios/OpenUSD/pull/3634#issuecomment-3387186455

@spiffmon https://github.com/spiffmon I updated layerStack.cpp to check and set the time code as you asked. However it's still crashing somewhere else along the way, and I can't seem to figure out where that might be.

Are there any other places you know of where it might be getting the 0 value? The crash is in a very different spot so I can't find it by walking up the call stack.

In the meantime, I've left my existing stage check in place because I think its still a good check to have regardless of us correcting the layer time code per second.

— Reply to this email directly, view it on GitHub https://github.com/PixarAnimationStudios/OpenUSD/pull/3634#issuecomment-3387186455, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABPOU2GKQFWH6QNXGPVTXQT3W2XYPAVCNFSM6AAAAAB5HQEY3OVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTGOBXGE4DMNBVGU . You are receiving this because you were mentioned.Message ID: @.***>

spiffmon avatar Oct 09 '25 23:10 spiffmon

This call stack requires commenting out the !stageToLayer.IsValid() check in stage.cpp (since that prevents a crash altogether) if using this branch. Alternatively run it in a vanilla OpenUSD build.

[Inlined] std::__tree_min[abi:ne200100]<…>(std::__tree_node_base<…> *) __tree:167
[Inlined] pxrInternal_v0_25_11__pxrReserved__::Usd_CopyTimeSamplesInInterval(const std::set<…> &, const pxrInternal_v0_25_11__pxrReserved__::GfInterval &, std::vector<…> *) valueUtils.h:212
pxrInternal_v0_25_11__pxrReserved__::UsdStage::_GetTimeSamplesInIntervalFromResolveInfo(const pxrInternal_v0_25_11__pxrReserved__::UsdResolveInfo &, const pxrInternal_v0_25_11__pxrReserved__::UsdAttribute &, const pxrInternal_v0_25_11__pxrReserved__::GfInterval &, std::vector<…> *) const stage.cpp:9451
pxrInternal_v0_25_11__pxrReserved__::UsdAttributeQuery::GetTimeSamples(std::vector<…> *) const attributeQuery.cpp:167
pxrInternal_v0_25_11__pxrReserved__::Usd_FlattenAccess::MakeTimeSampleMapForFlatten(const pxrInternal_v0_25_11__pxrReserved__::UsdAttribute &, const pxrInternal_v0_25_11__pxrReserved__::SdfLayerOffset &, std::map<…> *) stage.cpp:5493
pxrInternal_v0_25_11__pxrReserved__::_CopyProperty(const pxrInternal_v0_25_11__pxrReserved__::UsdProperty &, const pxrInternal_v0_25_11__pxrReserved__::SdfHandle<…> &, const pxrInternal_v0_25_11__pxrReserved__::TfToken &, const std::map<…> &, const pxrInternal_v0_25_11__pxrReserved__::SdfLayerOffset &) stage.cpp:5661
pxrInternal_v0_25_11__pxrReserved__::_CopyPrim(const pxrInternal_v0_25_11__pxrReserved__::UsdPrim &, const pxrInternal_v0_25_11__pxrReserved__::TfWeakPtr<…> &, const pxrInternal_v0_25_11__pxrReserved__::SdfPath &, const std::map<…> &) stage.cpp:5764
pxrInternal_v0_25_11__pxrReserved__::UsdStage::Flatten(bool) const stage.cpp:5907
pxrInternal_v0_25_11__pxrReserved__::UsdStage::ExportToString(std::string *, bool) const stage.cpp:5867
[Inlined] UsdCat(const Args &) usdcat.cpp:280
main usdcat.cpp:309

dgovil avatar Oct 10 '25 01:10 dgovil