OpenUSD icon indicating copy to clipboard operation
OpenUSD copied to clipboard

[tbb] Please support tbb version 2021.1.1

Open JackBoosY opened this issue 5 years ago • 54 comments

Hi guys, I'm vcpkg maintainer. I tried to update tbb to the latest version 2021.1.1 in vcpkg (PR microsoft/vcpkg#16321), but seems that usd doesn't support this version. And some other libraries has been adapted for that. Can you consider updating the code to adapt to this version?

Thanks.

JackBoosY avatar Mar 03 '21 06:03 JackBoosY

Filed as internal issue #USD-6600

jilliene avatar Mar 09 '21 22:03 jilliene

Confirmed - after much trial & error, was able to get a working build using the 'tbb_2019' branch (which unfortunately did not have a cmake build or exported config yet).

The biggest hurdle seems to be the obsoleting of tbb::atomic (replaced with std::atomic sometime in late 2020).

IMHO, there is little reason to not submodule tbb & just let tbb's cmake config take care of everything, but just in case, I am attaching the snippet i had to modify in the FindTBB.cmake module (because the version file moved too)

  find_file(TBB_VERSION_FILE 
      NAMES oneapi/tbb/version.h tbb/version.h tbb/tbb_stddef.h       
      HINTS ${TBB_INCLUDE_DIRS} ${TBB_SEARCH_DIR}
      PATHS ${TBB_DEFAULT_SEARCH_DIR})

  if(TBB_VERSION_FILE)
    file(READ "${TBB_VERSION_FILE}" _tbb_version_file)

    string(REGEX REPLACE ".*#define TBB_VERSION_MAJOR ([0-9]+).*" "\\1"
        TBB_VERSION_MAJOR "${_tbb_version_file}")

    string(REGEX REPLACE ".*#define TBB_VERSION_MINOR ([0-9]+).*" "\\1"
        TBB_VERSION_MINOR "${_tbb_version_file}")

    string(REGEX REPLACE ".*#define TBB_INTERFACE_VERSION ([0-9]+).*" "\\1"
        TBB_INTERFACE_VERSION "${_tbb_version_file}")

    set(TBB_VERSION "${TBB_VERSION_MAJOR}.${TBB_VERSION_MINOR}")
  endif()

manuelk avatar Mar 12 '21 23:03 manuelk

Maybe the biggest hurdle can be cleared by moving USD to std::atomic :) You know what gets my vote! @manuelk , is that the approach you took?

meshula avatar Mar 13 '21 00:03 meshula

the compiler was throwing lots of errors, so i didn't really try to fix it (with the idea that other things may have been obsoleted and apparently USD was not being kept up to date obviously)

took the easy out & just reverted TBB to a 2019 version

i do agree with you that a lot of the 'arch' code in USD should be moved to modern C++ (mutex, "ref" ptrs, ...) with the hope of maybe killing the now extremely unwieldy TBB dependency (full win10 binary installer is 22GB on disk - for real Intel !)

manuelk avatar Mar 13 '21 03:03 manuelk

Something else that makes no sense w/ TBB handling in USD :

in the TBB module you correctly define a debug & release interface (assuming the expression works - it's hard to read...)

      set_target_properties(TBB::tbb PROPERTIES
          INTERFACE_COMPILE_DEFINITIONS "$<$<OR:$<CONFIG:Debug>,$<CONFIG:RelWithDebInfo>>:TBB_USE_DEBUG=1>"
          IMPORTED_LOCATION_DEBUG          ${TBB_LIBRARIES_DEBUG}
          IMPORTED_LOCATION_RELWITHDEBINFO ${TBB_LIBRARIES_DEBUG}
          IMPORTED_LOCATION_RELEASE        ${TBB_LIBRARIES_RELEASE}
          IMPORTED_LOCATION_MINSIZEREL     ${TBB_LIBRARIES_RELEASE}
          )

but then all the USD components bypass the interface to link directly with the component variable:

pxr_library(tf
    LIBRARIES
        arch
        ${WINLIBS}
        ${PYTHON_LIBRARIES}
        ${Boost_PYTHON_LIBRARY}
        ${TBB_tbb_LIBRARY}

Obviously this does not work well for debug builds, as neither TBB_USE_DEBUG is set, nor is the linker pointed to the right library.

manuelk avatar Mar 15 '21 23:03 manuelk

@jilliene @meshula We're moving up to TBB 2020 update 3 for the next major release of Houdini right now (since CY2021 vfxplatform.com is TBB 2020). I'm noticing the same issues except they're flagged as deprecation warnings instead of out right removals. I think it'll be worthwhile to first move USD up to TBB 2020 and replace uses as we go to remove the deprecated functionality.

e4lam avatar Apr 20 '21 19:04 e4lam

@manuelk Yeah, it would be more correct/bulletproof to link via TBB::tbb, not ${TBB_tbb_LIBRARY} (as elaborated in the plethora of "modern cmake" articles.)

meshula avatar Apr 20 '21 20:04 meshula

Do you have a timeline for updating to TBB 2021?

HnimNart avatar Aug 06 '21 09:08 HnimNart

We don't unfortunately have a timeline for moving to newer versions of tbb... though we typically try to at least follow http://vfxplatform.com/

c64kernal avatar Aug 10 '21 03:08 c64kernal

I've just run a quick scan for TBB over the source code - the majority of the usage seems to be for things like mutex, atomic and thread local storage, which should be transitioned to modern C++ (as recommended by Intel in TBB).

However, there also seems to be enough use of thread arenas, concurrent queues, etc that deprecating TBB would not be a small task (it's really unfortunate that OneTBB has become so unwieldy in recent years). But it's still nothing compared to the headaches of boost + python though.

manuelk avatar Aug 10 '21 20:08 manuelk

@manuelk , even if somehow there were enough resources in the universe (so to speak) to wean USD off of TBB, that in itself would create an even less tractable problem: for better or worse, TBB has becomes the CPU resource management system of the VFX industry, so if USD were to move away from it, it would become really difficult to share resources with host applications in a good-citizen and performant way.

I think the only realistic way to remove the TBB dependency from USD is if the VFX platform itself mandates it, so that USD and all its consumers could move together.

spiffmon avatar Aug 10 '21 21:08 spiffmon

There be a boat anchor... guess we can commiserate around beers at the next non-virtual Siggraph :) (if ever)

manuelk avatar Aug 10 '21 22:08 manuelk

Spiff points out that using TBB helps alleviate CPU over-subscription, a might important consideration. That does leave some room for some targeted weaning, around some std provided constructs like atomics, and particularly around the parts of TBB slated for removal, at least on vfxplatform gated time table.

Sidebar: To get the same benefit on the over-subscription issue on Apple platforms, the Work system would need to cooperate with Grand Central Dispatch. Perhaps longer term, there's more to discuss around that, as an independent topic.

meshula avatar Aug 11 '21 19:08 meshula

Hi, guys I ended up here because I wanted to to update TBB through VCPKG in our codebase and followed the threads to see what was the issue and how I could maybe help solving it.

I'd like to share why we have to update it or our codebase because after looking at this repo, it might have implications on your side. It all boils down to spin lock and why unless you have a really guru expert, you should not play with fire and try to save some performance by using them instead of regular mutexes.

Long story short: in their internal task_queue, prior to the 2021 release, they were using a spin lock to protect the queue which, as is often the case with spin lock, resulted on occasional lock access freeze when trying to access the queue.

This is probably relevant to this repo because I see tbb::spin_mutex used all over the place. So either

  • Your lib is never used on high CPU loads or on single core CPU
  • You have a real threading expert internally who understands the issue I'm talking about and reviewed all spin locks usage to guarantee none of them are problematic (But note that even the guys at tbb did mistakes when using spin locks)
  • This is all legacy code and no one really knows why it's spin mutex and not regular mutex

ggris avatar Aug 24 '21 08:08 ggris

Thanks @ggris ! We monitor our performance suite for lock contention, and have deployed several strategies for reducing it when it crops up. We have not observed correctness issues or deadlocks attributable to our use of tbb's spin locks, but if you have evidence of such a problem, or of scalability issues (we do not regularly run on more than 32 cores yet), please let us know.

spiffmon avatar Aug 25 '21 17:08 spiffmon

@ggris On a side note, can you point to where the fix is for TBB? Thanks!

e4lam avatar Aug 26 '21 13:08 e4lam

@e4lam Actually, I was mistaken in thinking it was already merged but it is a fix for this ticket: https://github.com/oneapi-src/oneTBB/issues/546 Note that in this case the issue comes from an implicit spin lock pattern. As I learned while researching this, it seems TBB spin_mutex is protected against the typical spin lock pattern issue with a backoff strategy and the solution for this ticket is to apply similar backoff strategy in this implicit spin lock pattern. I've had so many issues in the past with libraries using bad implementation of the spin lock pattern that I now assume by default that all spin lock patterns are buggy.

I ended up backporting the backoff fix to our internal build of TBB as it is unlikely we'll get an up to date TBB

ggris avatar Aug 26 '21 13:08 ggris

@c64kernal at the very least could the codebase remove usage of deprecated (now removed) TBB features and move to the std:: replacements? That way newer TBB binaries can be used at runtime. Even if USD is built against TBB 2020 Update 3, if it's not using features that have been removed in 2021.4.0, we should be able to drop in 2021.4.0 binaries (which other tools we use may rely on) and allow things to still function. An upgrade guide can be found here: https://www.intel.com/content/www/us/en/developer/articles/technical/tbb-revamp.html

mbechard avatar Oct 21 '21 20:10 mbechard

@mbechard -- yes for sure! If there is a minimally invasive change that you'd like to PR, we'd love to have a look and would definitely consider. We're currently constrained to C++14 by the way. Thanks so much for pushing through this!

c64kernal avatar Oct 22 '21 02:10 c64kernal

@c64kernal unfortunately I don't think I'm in a position to do a PR for this. Certainly I could upgrade the tbb::atomics, but things such as tbb::empty_task and related threading work would likely be tricky for me to do as I've spent no time working in the USD codebase (and really, barely any in TBB as well). Since these changes are compatible with both the current codebase and in the future (and will be required once vfxplatforms reaches TBB 2021), I was suggesting they get added sooner rather than later to allow people to use the newer TBB when required. Thanks for your consideration!

mbechard avatar Oct 22 '21 20:10 mbechard

Okay got it, thanks again @mbechard -- I filed an internal issue to do this initial clean up, and we'll prioritize as best we can. Thanks again for the suggestion.

c64kernal avatar Oct 23 '21 17:10 c64kernal

Hello,

A little introduction: I'm Sybren, the developer who added USD support to Blender.

We don't unfortunately have a timeline for moving to newer versions of tbb... though we typically try to at least follow http://vfxplatform.com/

This puts anyone who wants to use USD in an application in a tough spot. The VFX Reference Platform has a strange mix of super new versions of libraries and deprecated/outdated versions. For example, for a while 'CY2021' targeted a not-yet-released version (3.0) of OpenEXR while targeting a version of Python (3.7) that was so old it would be unsupported for the 2nd half of the year.

If USD developers follow VFX Reference Platform in the sense of only supporting deprecated versions of these libraries, this means that any application that wants to use USD also has to lag behind.

USD is using parts of the TBB API that were deprecated two years ago. Various Linux distributions will ship with a newer version soon; they typically only allow installing a single version of such libraries, which means that building Blender with USD support on contemporary Linuxes is going to be problematic.

I filed an internal issue to do this initial clean up, and we'll prioritize as best we can.

This was posted about 4 months ago. What is the current status?

PS: OpenSubDiv also still only supports the deprecated TBB API, but there at least patching the library is fairly trivial. For USD the work seems to be more considerable, though, which means that Blender can't just include a patch to make it work.

sybrenstuvel avatar Mar 10 '22 14:03 sybrenstuvel

Hi @sybrenstuvel , We understand your vexation, and honestly would very much like to be on TBB's OneAPI. We hope to be able to do this upgrade by end of year. To understand why it is taking us so long, there are two factors at play, which are related: resources, and sensitivity to the VFX industry.

You may think the VFX reference platform is criminally impeding forward progress, but in fact it protects animation and vfx studios from vendor thrashing, given our (unfortunate, but difficult to change) slow pace of updating OS's and system components. I am attaching a fascinating survey of 88 anim/Vfx studios conducted last year by the VES, for your information. It shows where the industry is in terms of OS's, but the thing I really want to highlight to you is that until the (Predicted) end of this year, not even a majority of VFX houses will have migrated to Python 3.x yet! Pixar is, in fact, in the middle of wrestling our large internal codebase into 3.x compatibility, and even though (thanks to NVidia!) USD can be built with python 3, internally all production and development builds still use python 2.7x.

Referring back to the reference platform, then, we and other studios are still trying to adopt CY2020, and the move to OneAPI does not happen until CY2023. We may jump right to CY2022 after finishing 2020 adoption, but in either case, even CY2022 is only TBB 2020, reflective of the widespread expectation that moving forward to OneApi is going to be a Big Deal. In fact, we do not yet even know how we are going to manage it without serious performance degradation to some of our internal software, because OneApi removes some lower-level API's that we leverage to good effect.

If you or others have the resources and will to want to see USD be OneAPI compatible ahead of our ability to make it so, we would be very appreciative. The constraints, as we see them, are:

  1. Anything that would want to land soon would, in deference to vfx studios and the software they are currently using, need to support building against either TBB 2019 or 202x .
  2. As you mentioned, USD still uses some TBB containers/locks that have been deprecated in favor of std options. But many of those are in performance-sensitive codesites, so adopting alternatives may require extra consideration. We are working, for other collaborations, to make available some of the performance tests and test assets we rely on internally, but that may not bear fruit soon enough to be useful.

That said, I do think such a contribution would help speed this process up.

Thanks, Sybren!

spiffmon avatar Mar 15 '22 01:03 spiffmon

Hi all,

Just a heads-up I started to have a crack at it today and have made some progress: https://gist.github.com/boberfly/07f7f8a4c7433d8ebd5cdfa640ef7758

As you can see in the error log, it still has an issue and my head hurts from all the template+namespace+function bindings... :)

https://oneapi-src.github.io/oneTBB/main/tbb_userguide/Migration_Guide/Task_API.html

Cheers

boberfly avatar Jun 07 '22 05:06 boberfly

@boberfly I'd suggest that you split it out under separate PR's by porting in different branches if possible, one for each category of API changes. In particular, I think at least the atomic changes should be separate.

e4lam avatar Jun 07 '22 12:06 e4lam

@e4lam good idea, I am just trying to get this working on regular TBB 2020U3 to verify my TBB_INTERFACE_VERSION defines and std::atomic replacements are working, unfortunately this one in particular: https://github.com/PixarAnimationStudios/USD/blob/release/pxr/usd/usdGeom/bboxCache.cpp#L107 causes template deduce issues for some reason on this line for the insert which seems bonkers to me at the moment, if anyone else has any ideas... :)

boberfly avatar Jun 07 '22 18:06 boberfly

@boberfly The bboxCache has been the bane of my existence in terms of being able to upgrade TBB. Every time I've went through all the work you've done - I have gotten stuck at this exact section. If you know how to fix it that would be huge.

We may need to reach out to the Intel TBB team to get this resolved, I've been staring at this issue for years 😆.

This issue also exists on all operating systems - macOS, Windows, and Linux.

furby-tm avatar Jun 13 '22 10:06 furby-tm

@sybrenstuvel I'd be happy to actively maintain a custom USD build that fits Blender's dependency needs, as it is something I typically find myself doing anyway.

furby-tm avatar Jun 13 '22 13:06 furby-tm

@boberfly, it could be the compiler is having trouble with the make_pair rvalue with the later version of tbb. A suggestion - the way I'd go about debugging is to break it out onto its own line as a lvalue, either as an auto or an explicitly declared pair.

meshula avatar Jun 13 '22 16:06 meshula

@sybrenstuvel I'd be happy to actively maintain a custom USD build that fits Blender's dependency needs, as it is something I typically find myself doing anyway.

That is a very generous offer @furby-tm . Please pop over to the #blender-builds channel on Blender Chat to talk about this. That way the other platform maintainers (and other interested parties) are also in the loop.

sybrenstuvel avatar Jun 14 '22 09:06 sybrenstuvel