Added support for compiling the CUDA stubs on Windows.
I'm not 100% sure that I'm doing the right thing, but after this change I got rid of some compilation errors on Windows and the linker seems to be happy, so I think it may be ok. Let me describe my reasoning:
- This commit added support for lazily loading symbols from the CUDA shared libraries.
- As discussed in this issue, the current approach is not supported on Windows due to the use of GNU assembly (it results in compilation errors with MSVC).
- After reading into this a little, I believe that Windows libraries built with MSVC already load their linked dynamic libraries lazily, so this trampoline mechanism does not appear to be needed on Windows.
- For this reason, I made the trampoline bits conditional on not being on Windows (see the sketch right after this list), hoping that the symbols will not be resolved directly against the CUDA dependencies that come in via the CUDA header files.
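For concreteness, this is roughly the shape of the change on the Bazel side. The target and file names below (`cudart_stub`, `cudart.tramp.S`) are illustrative placeholders rather than the exact labels in the tree:

```python
# Hedged sketch, not the actual BUILD contents: target and file names are
# illustrative only.
cc_library(
    name = "cudart_stub",
    srcs = select({
        # On Windows, skip the GNU-assembly trampolines and let the regular
        # MSVC import machinery resolve the CUDA DLLs.
        "@platforms//os:windows": ["cudart_stub.cc"],
        # Everywhere else, keep the lazy-loading trampoline sources.
        "//conditions:default": [
            "cudart_stub.cc",
            "cudart.tramp.S",
        ],
    }),
    deps = ["@local_config_cuda//cuda:cuda_headers"],
)
```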
@metab0t is this what you were suggesting at the end of that discussion?
cc @ddunl (this should be the last PR that touches the TSL code for Windows CUDA support)
This fails our Ubuntu CI tests because the linker can't find a ton of CUDA symbols. Likely what's happening is that the new config causes the necessary .S and/or .cc files to not be linked at all, but I couldn't quite figure out what exactly the issue is. I suspect that something is going wrong in the replacement of if_cuda_is_configured with @local_config_cuda, but I'm not sure.
@eaplatanios Can you have a look? Do you know what might be going wrong?
I suspect that what's going wrong is that if_cuda_is_configured was replaced with is_cuda_enabled. According to the documentation of these rules, the former does not require --config=cuda, whereas the latter does. So the condition became stricter and is no longer satisfied, leaving the symbols undefined.
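To illustrate what I mean by the condition getting stricter (the load path and setting labels below are from memory and may not match the tree exactly):

```python
# Simplified sketch of the two conditions; the exact load path and setting
# labels are assumptions, not copied from the repository.
load("@local_config_cuda//cuda:build_defs.bzl", "if_cuda_is_configured")

cc_library(
    name = "cudart_stub",
    # if_cuda_is_configured(...) only checks whether CUDA was configured when
    # the workspace was set up, so these sources are built even without
    # passing --config=cuda ...
    srcs = if_cuda_is_configured(["cudart_stub.cc", "cudart.tramp.S"]),
)

cc_library(
    name = "cudart_stub_strict",
    # ... whereas selecting on an is_cuda_enabled-style setting additionally
    # requires --config=cuda; without that flag the sources are dropped and
    # the CUDA symbols end up undefined at link time.
    srcs = select({
        "@local_config_cuda//:is_cuda_enabled": [
            "cudart_stub.cc",
            "cudart.tramp.S",
        ],
        "//conditions:default": [],
    }),
)
```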
@chsigg I see that a long time ago, you made some changes to the if_cuda_is_configured macro and friends. Do you have a suggestion for what can be done here to make this PR work?
Hey, sorry for reacting so late. I have a few concerns with this change:
- We are currently trying to make a bigger change to how we consume CUDA in XLA; the PR is here: https://github.com/openxla/xla/pull/10673. This change will interfere with that, so I would prefer if we land this after the mentioned PR gets merged. (Sorry for the inconvenience - it's a massive change that affects XLA, TensorFlow, and JAX.)
- We shouldn't modify the trampoline files since these are mostly autogenerated; manual edits make future regeneration harder. In fact, I would like to have them autogenerated during the build instead of having a copy checked in. Instead, I suggest the following:
  1. Rename the `cublas` cc_library rule to `cublas_stub`.
  2. Create an alias rule `alias(name = "cublas", actual = select(...))` which either depends on `cublas_stub` or on the library itself (see the sketch below). Something similar will be done as part of #10673 and can probably be reused/extended for that purpose. Have a look at the changes to `xla/tsl/cuda/BUILD.bazel` in #10673 for more details.
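A rough sketch of that alias pattern (the config_setting and the real-library label below are placeholders; #10673 defines the actual targets to select between):

```python
# Illustrative only: ":use_cuda_stubs" and the real-library label are
# placeholders, not the labels used in xla/tsl/cuda/BUILD.bazel.
cc_library(
    name = "cublas_stub",
    # The (autogenerated) trampoline stub sources.
    srcs = ["cublas_stub.cc", "cublas.tramp.S"],
)

alias(
    name = "cublas",
    actual = select({
        # Use the trampoline stub when lazy/dynamic loading is wanted.
        ":use_cuda_stubs": ":cublas_stub",
        # Otherwise depend on the real cuBLAS library directly.
        "//conditions:default": "@local_config_cuda//cuda:cublas",
    }),
)
```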
Henning
@eaplatanios @ddunl Please check this comment in a parallel thread: https://github.com/openxla/xla/issues/6993#issuecomment-2290173522. tl;dr: unless there is a plan to provide proper CUDA support on Windows for XLA, which would be a huge task, I don't think we should be trying to compile CUDA-specific code on Windows.
@vam-google that sounds fine to me. Sorry for the late response; I was traveling over the past month. I also decided to give up on CUDA support on Windows for XLA, basically for the reason you mentioned: it does seem to be a huge task, and I can just sort something out with WSL on Windows rather than take this on right now. Thanks for reviewing this!