feat: C++ runtime on Windows
Description
Fix and enable C++ runtime on Windows.
Fixes #2247 Fixes #2371 Fixes #2484
Type of change
- New feature (non-breaking change which adds functionality)
Checklist:
- [x] My code follows the style guidelines of this project (You can use the linters)
- [x] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas and hacks
- [ ] I have made corresponding changes to the documentation
- [ ] I have added tests to verify my fix or my feature
- [ ] New and existing unit tests pass locally with my changes
- [ ] I have added the relevant labels to my PR so that relevant reviewers are notified
Some observations about the failed tests.
- Test torchscript frontend: TypeError: convert_graph_to_trt_engine(): incompatible function arguments and TypeError: compile_graph(): incompatible function arguments. Caused by ABI incompatibility with pybind11. Fixed in a later commit.
- Test dynamo converters: ValueError: [TRT] [E] Could not implicitly convert NumPy data type: i32 to TensorRT. Introduced by https://github.com/pytorch/TensorRT/commit/de81be2e4fa36b32d9fd23dccf7f32d0c754ebef. Replacing dtype=np.dtype("i") with dtype=np.int32 can fix it, but I'm not sure whether np.dtype("i") is always int32 or whether it can be int64 on some platforms, so I'll leave this to you.
- Test dynamo export serde: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/trt.ep'. It should just use "./trt.ep" instead of "/tmp/trt.ep", like the other tests. Fixed in a later commit.
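To probe the np.dtype("i") question, a quick check like the one below compares the C-int dtype with np.int32 on whatever platform it runs on. NumPy's "i" is the C int, which is 32 bits wide on mainstream Windows, Linux and macOS builds, so it should not silently become int64. NumPy can, however, keep int and long as distinct type numbers even when both are 32 bits (as on Windows). That TensorRT's implicit conversion keys on the exact NumPy type rather than on the width is an assumption here, not something confirmed in this thread; compare_int_dtypes is a hypothetical helper.

```python
import numpy as np


def compare_int_dtypes():
    """Report how np.dtype("i") (C int) relates to np.int32 on this platform."""
    c_int = np.dtype("i")        # the C `int` type
    fixed = np.dtype(np.int32)   # the fixed-width 32-bit integer
    return {
        "c_int_bytes": c_int.itemsize,
        # dtype equality checks equivalence (kind, size, byte order),
        # so it can be True even when the type numbers below differ.
        "equivalent": bool(c_int == fixed),
        "same_type_number": c_int.num == fixed.num,
        "chars": (c_int.char, fixed.char),
    }
```

Running print(compare_int_dtypes()) on the Windows runner and on Linux would show whether "chars" and "same_type_number" differ between the two, which is the case where spelling the dtype as np.int32 is the safer choice.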
Hi @HolyWu - I wanted to share a few notes on the C++ runtime on Windows after some local testing:
- When compiling locally, I had to upgrade Bazel to 6.3.2 so it could find cl and other C++ Windows dependencies. Even so, the build still fails during the ninja step. Have you seen the following error before?
../core/conversion/conversion.h(5): fatal error C1083: Cannot open include file: 'NvInfer.h': No such file or directory
- I then tried downloading the artifact from the GHA job and installing, which worked.
Based on the GHA-installed package, I ran the CI-failing tests, for which I have the following results:
- test_bert_base_uncased for both export and compile fails with transformers==4.41.0, but passes with transformers==4.39.3. Could you try pinning this version in the .yml?
- test_linear_3_multi_dim_matrix and test_linear_4_multi_dim_matrix pass/fail nondeterministically for me on Windows. They fail more often and with larger variance than they do on Linux, but failures on these tests have been seen on both systems. Needs further investigation on my end.
- test_bert_base_uncased in the TorchScript path causes a Windows fatal exception on my machine as well; this should be skipped on Windows only while it is investigated.
  - The above is also resolved by using transformers==4.39.3 on my machine.
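If a Windows-only skip is still wanted for the TorchScript BERT test (the note above suggests the transformers pin may make it unnecessary), one way to express it with the standard library is unittest.skipIf on sys.platform. This is a sketch: the class name is hypothetical and the test body is a placeholder, not the actual Torch-TensorRT test.

```python
import sys
import unittest


# Hypothetical stand-in for the real TorchScript BERT test class.
class TestBertBaseUncasedTorchScript(unittest.TestCase):
    @unittest.skipIf(
        sys.platform == "win32",
        "Causes a Windows fatal exception; skipped on Windows while under investigation",
    )
    def test_bert_base_uncased(self):
        # Placeholder: the real test compiles BERT through the TorchScript
        # frontend and compares its outputs against eager PyTorch.
        self.assertTrue(True)
```

Gating on sys.platform keeps the test running on Linux CI, so the skip only hides the Windows-specific crash.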
Thanks for the notes. I have pinned the version of transformers and also added a missing test.
As for the error you encountered during the ninja build, I haven't seen that exact error, but I suspect you didn't run python setup.py bdist_wheel in the x64 Native Tools Command Prompt for VS 2022 environment. The VS installer should have created a shortcut for it in your Start Menu. Alternatively, you can manually execute vcvars64.bat, located in C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build. Then, before running python setup.py bdist_wheel, set the DISTUTILS_USE_SDK environment variable by executing set DISTUTILS_USE_SDK=1 (or add this variable to the system environment permanently).
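The environment requirements above could be checked before starting a long build with a small preflight script. This is a hypothetical helper, not part of the repo: it only looks for cl on PATH, a non-empty INCLUDE (which vcvars64.bat populates), and DISTUTILS_USE_SDK=1, and it takes the environment and the lookup function as parameters so it can be exercised off Windows.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional


def windows_build_preflight(
    env: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return problems likely to break `python setup.py bdist_wheel` on Windows."""
    env = os.environ if env is None else env
    problems = []
    # cl.exe is on PATH only inside the x64 Native Tools Command Prompt
    # or after running vcvars64.bat.
    if which("cl") is None:
        problems.append("cl not found on PATH; run vcvars64.bat or use the x64 Native Tools Command Prompt")
    # vcvars64.bat also fills INCLUDE with the MSVC and Windows SDK header paths.
    if not env.get("INCLUDE"):
        problems.append("INCLUDE is empty; the MSVC environment does not look initialized")
    if env.get("DISTUTILS_USE_SDK") != "1":
        problems.append("DISTUTILS_USE_SDK is not set to 1")
    return problems
```

Calling windows_build_preflight() with no arguments inspects the current process environment; an empty list means none of these three known pitfalls applies.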
Thanks for the suggestions. I also added set INCLUDE={TENSORRT_PATH}, which enabled finding NvInfer.h, but now it shows errors for other .h files such as the following. I am not sure why the x64 Native Tools Command Prompt for VS 2022 does not appear to use the Windows PATH to source the .h files.
Python310\include\pyconfig.h(59): fatal error C1083: Cannot open include file: 'io.h': No such file or directory
PATH is only used to search for executable files, not headers or libraries. It's a bit odd that you had to specify the include path for TensorRT manually, since it should have been set here and the directory should exist once Bazel has finished building. In any case, INCLUDE already contains predefined paths in the Command Prompt for VS 2022, so set INCLUDE={TENSORRT_PATH} overwrites them. You should use set INCLUDE={TENSORRT_PATH};%INCLUDE% to add the new path to the existing ones. You can run echo %INCLUDE% to verify it.
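The difference between overwriting and extending INCLUDE can be illustrated in Python with a hypothetical helper that mirrors set INCLUDE={TENSORRT_PATH};%INCLUDE%. INCLUDE is a Windows-style list, so the separator is ";" regardless of the platform the snippet runs on.

```python
from typing import Optional


def prepend_include(existing: Optional[str], new_dir: str) -> str:
    """Mirror `set INCLUDE=<new_dir>;%INCLUDE%` without dropping predefined paths."""
    if not existing:
        return new_dir
    # Avoid adding the same directory twice if the script is rerun.
    if new_dir in existing.split(";"):
        return existing
    return new_dir + ";" + existing
```

For example, os.environ["INCLUDE"] = prepend_include(os.environ.get("INCLUDE"), trt_include) keeps the MSVC and Windows SDK directories (where io.h lives) while adding the TensorRT headers; trt_include is a placeholder for the actual TensorRT include directory.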
Hi @HolyWu - thanks for the suggestion. Ultimately, the issue was that the repo directory had been renamed from its original name, so NvInfer.h was not picked up by setup.py. That has now been fixed, and it compiles to completion with Bazel 6.3.2 (version 6.2.1 still says it cannot find cl and other MSVC utilities, even though they're available in that command prompt).
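One plausible mechanism for the renamed-directory failure: Bazel creates a convenience symlink named bazel-<workspace directory name> in the checkout, so a header path hard-coded through that symlink only resolves when the folder keeps its expected name. The sketch below assumes setup.py references such a symlink (the thread does not show the exact path); both helper functions are hypothetical.

```python
from pathlib import PureWindowsPath


def bazel_symlink_name(repo_dir: str) -> str:
    """Name of the bazel-<dir> convenience symlink Bazel creates in repo_dir."""
    return "bazel-" + PureWindowsPath(repo_dir).name


def symlink_matches(repo_dir: str, hard_coded_name: str) -> bool:
    """True if a symlink name hard-coded in a build script matches this checkout."""
    return bazel_symlink_name(repo_dir) == hard_coded_name
```

With a checkout at c:\torch_trt_fork, Bazel would produce bazel-torch_trt_fork, so a path written against a default TensorRT clone (bazel-TensorRT) would not match.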
However, importing the built package fails with OSError: [WinError 1114] A dynamic link library (DLL) initialization routine failed, and the error appears to originate in torch_python.dll!THPGenerator_initDefaultGenerator. I'm not sure what is causing this, but I wanted to check whether there are any other .dll files I might be missing.
The line causing the issue is where the library is registered with PyTorch and ultimately loaded, here:
File "c:\torch_trt_fork\py\torch_tensorrt\__init__.py", line 114, in <module>
_register_with_torch()
File "c:\torch_trt_fork\py\torch_tensorrt\__init__.py", line 107, in _register_with_torch
torch.ops.load_library(linked_file_full_path)
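To narrow down where a failure like this comes from, the library could be loaded directly with ctypes, outside torch.ops.load_library. If the direct load also fails, the problem lies in the DLL or its dependencies rather than in the registration step. This is a debugging sketch with a hypothetical helper name; it is typically run after import torch in the same process so that the Torch DLLs the library links against are already loaded.

```python
import ctypes
import os
from typing import Optional


def try_load_dll(path: str) -> Optional[str]:
    """Load a shared library directly; return None on success, else the error text."""
    try:
        ctypes.CDLL(os.fspath(path))
    except OSError as exc:
        # On Windows this carries the WinError code (e.g. 1114 or 126).
        return str(exc)
    return None
```

Here path would be the same linked_file_full_path that _register_with_torch passes to torch.ops.load_library.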
I have no idea. torchtrt.dll does not depend on torch_python.dll, since torch_python.lib is not used when linking torchtrt.dll. Can you upload your built wheel?
Hi @HolyWu - ultimately, the issue was that I used python setup.py develop rather than python setup.py bdist_wheel; the latter works well. I was unable to get around the need for Bazel 6.3.2, however. On your local build, is Bazel 6.2.1 able to locate MSVC utilities like cl from within the Command Prompt for VS 2022?
Yes, Bazel 6.2.1 can find the MSVC utilities on my local build. The only thing you could try is setting the BAZEL_VC environment variable to tell Bazel to use a specific VC installation, e.g. set "BAZEL_VC=C:\Program Files\Microsoft Visual Studio\2022\Community\VC" (change this to your actual installation path). I don't set that variable myself, though, and Bazel still finds cl.
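Before relying on BAZEL_VC, a hypothetical sanity check like the one below could confirm that the variable points at the VC folder rather than the Visual Studio root. It assumes the VS 2017+ layout, where versioned toolsets live under VC\Tools\MSVC\<version>; when the variable is unset it reports nothing, since Bazel then auto-detects Visual Studio as described above.

```python
import os
from typing import Optional


def check_bazel_vc(vc_dir: Optional[str]) -> Optional[str]:
    """Return a problem description, or None if BAZEL_VC looks usable (or is unset)."""
    if not vc_dir:
        return None  # unset: Bazel falls back to auto-detecting Visual Studio
    if not os.path.isdir(vc_dir):
        return f"BAZEL_VC={vc_dir!r} is not a directory"
    # Assumed VS 2017+ layout: VC\Tools\MSVC\<toolset version>
    if not os.path.isdir(os.path.join(vc_dir, "Tools", "MSVC")):
        return f"{vc_dir!r} has no Tools\\MSVC subdirectory; is it the VC folder?"
    return None
```

Usage would be check_bazel_vc(os.environ.get("BAZEL_VC")) from the same command prompt that runs the build.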
Hi @gs-olive. It seems that Windows tests often hit RuntimeError: No CUDA GPUs are available since the ciflow/binaries/all label was added. I'm not sure whether switching to the windows.8xlarge.nvidia.gpu.nonephemeral runner will make this better or worse, but let's try and see.
No more No CUDA GPUs are available error!
Not sure why two TorchScript frontend tests are failing with an error that didn't happen before, but the others passed.
I have rerun those failing tests. Interestingly, the accuracy issues on the tests referenced here did not seem to occur this time.
Fixes: https://github.com/pytorch/TensorRT/issues/2645