OpenUSD

BUG: Compute extent with multi-threading may influence fetching material bindings

Open • roggiezhang-nv opened this issue 1 year ago • 3 comments

We found some very strange behavior: calling ComputeExtent together with UsdRelationship::GetForwardedTargets from multiple threads can change the result of fetching material bindings through UsdRelationship::GetForwardedTargets. See the test added in this PR for a reproduction. The test traverses the whole stage with multiple threads. All operations are supposed to be read-only with respect to the stage, so the test should always succeed. However, it sometimes fails unexpectedly. Adding TfRegistryManager::GetInstance().SubscribeTo<UsdGeomBoundable>(); as the first line of main() makes the test pass, and we still have no idea why.

Run the test with --repeat until-fail:1000 to increase the chance of hitting the failure.

roggiezhang-nv • Feb 10 '25 12:02

@nvmkuruc for vis.

roggiezhang-nv • Feb 10 '25 12:02

On Linux, I had to use --repeat-until-fail 10000 to reproduce the failure.

nvmkuruc • Feb 10 '25 12:02

Filed as internal issue #USD-10666

(This is an automated message. See here for more information.)

jesschimein • Feb 10 '25 17:02

Thanks for pointing this out and providing a test case. This one turned out to be a little tricky to hunt down, but the root cause ended up being the same as for another mysteriously failing test we had been tracking on our side, so that was a bonus win.

In some cases, a call to TfRegistryManager::SubscribeTo may return before all registry functions have been called if multiple threads are working registration functions off the worklist queue. In certain timing scenarios, this invalidated the assumption that all registrations were complete when Sdf_SpecTypeInfo::GetInstance returned. As a result, calling code that relied on this class to cast specs would see unexpected failures: valid casts would fail because the Sdf_SpecTypeInfo singleton was still in an incomplete state.

matthewcpp • Sep 27 '25 02:09
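
For readers who want to see the shape of the race described in the last comment, here is a minimal standard-library sketch. It does not use OpenUSD at all: `ToyRegistry` and `SecondCallerSeesCompleteRegistry` are hypothetical names invented for illustration, and the `waitForInFlight` flag is only one plausible way to close the gap, not necessarily how the actual fix was implemented. The sketch shows a second thread finding the worklist empty and returning while the first thread is still inside a registration function it already popped.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a registry that runs deferred registration
// functions from a shared worklist. This is NOT the real TfRegistryManager.
class ToyRegistry {
public:
    void AddRegistrationFunction(std::function<void()> fn) {
        std::lock_guard<std::mutex> lock(_mutex);
        _worklist.push_back(std::move(fn));
    }

    // Drains the worklist. If waitForInFlight is false, a caller returns as
    // soon as the worklist is empty, even when another thread is still
    // running a function it popped earlier. If true, the caller also waits
    // until no registration function is in flight.
    void SubscribeTo(bool waitForInFlight) {
        std::unique_lock<std::mutex> lock(_mutex);
        while (!_worklist.empty()) {
            std::function<void()> fn = std::move(_worklist.front());
            _worklist.pop_front();
            ++_inFlight;
            lock.unlock();
            fn();  // run the registration function outside the lock
            lock.lock();
            --_inFlight;
            _cv.notify_all();
        }
        if (waitForInFlight) {
            _cv.wait(lock, [this] {
                return _worklist.empty() && _inFlight == 0;
            });
        }
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    std::deque<std::function<void()>> _worklist;
    int _inFlight = 0;
};

// Returns what a second caller observes immediately after its own
// SubscribeTo call returns: true if registration had already completed.
bool SecondCallerSeesCompleteRegistry(bool waitForInFlight) {
    ToyRegistry registry;
    std::atomic<bool> registered{false};
    std::promise<void> started;
    std::future<void> startedFuture = started.get_future();

    registry.AddRegistrationFunction([&] {
        started.set_value();  // signals that a thread has popped this function
        // Simulate slow registration work.
        std::this_thread::sleep_for(std::chrono::milliseconds(200));
        registered = true;
    });

    std::thread first([&] { registry.SubscribeTo(waitForInFlight); });
    startedFuture.wait();  // worklist is now empty, function still running
    registry.SubscribeTo(waitForInFlight);
    bool observed = registered.load();
    first.join();
    return observed;
}
```

Calling `SecondCallerSeesCompleteRegistry(false)` models the early return: the second caller normally reads the flag while the simulated registration is still sleeping. Calling it with `true` makes the second caller block until the in-flight function has finished, so the registry is complete by the time the call returns.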