feat: C++ runtime on Windows

Open HolyWu opened this issue 1 year ago • 7 comments

Description

Fix and enable C++ runtime on Windows.

Fixes #2247 Fixes #2371 Fixes #2484

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • [x] My code follows the style guidelines of this project (You can use the linters)
  • [x] I have performed a self-review of my own code
  • [ ] I have commented my code, particularly in hard-to-understand areas and hacks
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have added tests to verify my fix or my feature
  • [ ] New and existing unit tests pass locally with my changes
  • [ ] I have added the relevant labels to my PR so that relevant reviewers are notified

HolyWu avatar May 02 '24 16:05 HolyWu

Some observations about the failed tests.

HolyWu avatar May 04 '24 04:05 HolyWu

No ciflow labels are configured for this repo. For information on how to enable CIFlow bot see this wiki

pytorch-bot[bot] avatar May 17 '24 05:05 pytorch-bot[bot]

Hi @HolyWu - I wanted to share a few notes on the C++ runtime on Windows after some local testing:

  • When compiling locally, I had to upgrade the Bazel version to 6.3.2 so it could find cl and other C++ Windows dependencies. Still, it fails during the ninja build. Have you seen the following error before?
../core/conversion/conversion.h(5): fatal error C1083: Cannot open include file: 'NvInfer.h': No such file or directory
  • I then tried downloading the artifact from the GHA job and installing, which worked.

Based on the GHA-installed package, I ran the CI-failing tests, for which I have the following results:

  • test_bert_base_uncased for both export and compile fails with transformers==4.41.0, but passes with transformers==4.39.3. Could you try fixing this version in the .yml?
  • test_linear_3_multi_dim_matrix + test_linear_4_multi_dim_matrix - these pass/fail nondeterministically for me on Windows. They fail more often and with larger variance than they do on Linux, but failures on this test have been seen on both systems. Needs further investigation on my end.
  • test_bert_base_uncased in the Torchscript path causes a Windows fatal exception on my machine as well - this should be skipped on Windows only while it is investigated.
    • The above is also resolved by using transformers==4.39.3 on my machine
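
A Windows-only skip like the one suggested above could be sketched as follows. This is only an illustration that assumes a unittest-style test; the class name and the placeholder body are hypothetical and are not taken from the repository's test suite.

```python
import sys
import unittest

# True only when running on Windows; Linux and macOS report other values.
IS_WINDOWS = sys.platform == "win32"


class TestBertTorchScript(unittest.TestCase):  # hypothetical class name
    @unittest.skipIf(
        IS_WINDOWS,
        "Windows fatal exception with transformers==4.41.0, under investigation",
    )
    def test_bert_base_uncased(self):
        # Placeholder body; the real test would compile and run the model here.
        self.assertTrue(True)
```

With this decorator the test still runs on Linux, and on Windows it is reported as skipped rather than crashing the test process.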

gs-olive avatar May 22 '24 05:05 gs-olive

Thanks for the notes. I have fixed the version of transformers and also added a missing test.

As for the error you encountered during the ninja build, I haven't seen that exact error. I suspect you didn't run python setup.py bdist_wheel from the x64 Native Tools Command Prompt for VS 2022. The VS installer should have created a shortcut for it in your Start Menu; alternatively, you can manually execute vcvars64.bat, located in C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build. Before running python setup.py bdist_wheel, you also have to set the DISTUTILS_USE_SDK environment variable by executing set DISTUTILS_USE_SDK=1 (or add it to the system environment variables permanently).
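
The prerequisites described here can be checked before starting a build. The following is a minimal sketch, not part of the project's setup.py; the function name check_msvc_build_env is hypothetical, and the three checks are assumptions about what a correctly initialized VS 2022 command prompt provides.

```python
import os
import shutil


def check_msvc_build_env(env=None, which=shutil.which):
    """Return a list of problems found; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []
    # The thread says DISTUTILS_USE_SDK must be set to 1 before building.
    if env.get("DISTUTILS_USE_SDK") != "1":
        problems.append("DISTUTILS_USE_SDK is not set to 1")
    # cl is only on PATH after vcvars64.bat (or the Native Tools prompt) has run.
    if which("cl", path=env.get("PATH", "")) is None:
        problems.append("cl was not found on PATH")
    # The VS command prompt is expected to predefine INCLUDE for the compiler.
    if not env.get("INCLUDE"):
        problems.append("INCLUDE is empty")
    return problems


if __name__ == "__main__":
    for problem in check_msvc_build_env():
        print("build environment problem:", problem)
```

Running the script inside the same prompt that will run python setup.py bdist_wheel should print nothing if the environment is set up as described.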

HolyWu avatar May 22 '24 14:05 HolyWu

Thanks for the suggestions. I also added set INCLUDE={TENSORRT_PATH}, which enabled finding NvInfer.h, but now it shows errors for other .h files such as the following. I am not sure why the x64 Native Tools Command Prompt for VS 2022 does not appear to use the Windows PATH to source the .h files.

Python310\include\pyconfig.h(59): fatal error C1083: Cannot open include file: 'io.h': No such file or directory

gs-olive avatar May 23 '24 02:05 gs-olive

PATH is only used to search for executable files, not headers or libraries. It's a bit odd that you had to manually specify the include path for TensorRT, since it should have been set here and the directory should be there after bazel had completed building. Nevertheless, INCLUDE already has predefined paths under Command Prompt for VS 2022, so set INCLUDE={TENSORRT_PATH} overwrites them. You should use set INCLUDE={TENSORRT_PATH};%INCLUDE% to prepend the new path to the existing ones. You can run echo %INCLUDE% to verify it.
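
To illustrate the difference between overwriting INCLUDE and extending it, here is a small sketch of the same logic in Python. The helper name prepend_include and the example directories are hypothetical; the only assumption about Windows is that INCLUDE entries are separated by semicolons.

```python
def prepend_include(new_dir, current_include):
    """Return an INCLUDE value with new_dir first and the existing entries kept."""
    # Drop empty entries that appear when the variable is unset or ends with ';'.
    entries = [entry for entry in current_include.split(";") if entry]
    if new_dir not in entries:
        entries.insert(0, new_dir)
    return ";".join(entries)


if __name__ == "__main__":
    # Hypothetical directories, shown only to make the result visible.
    existing = r"C:\VS\include;C:\SDK\include"
    print(prepend_include(r"C:\TensorRT\include", existing))
```

The existing entries survive, which is what keeps headers such as io.h reachable after the TensorRT directory is added.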

HolyWu avatar May 23 '24 13:05 HolyWu

Hi @HolyWu - thanks for the suggestion. Ultimately, the issue was that the repo directory did not have its original name, so the NvInfer.h path was not picked up in setup.py. That has now been fixed, and it compiles to completion with bazel 6.3.2 (version 6.2.1 still says it cannot find cl and other MSVC utilities, even though they're available in that command prompt).

However, when importing the built package, it shows OSError: [WinError 1114] A dynamic link library (DLL) initialization routine failed, and the error seems to originate from torch_python.dll!THPGenerator_initDefaultGenerator. I'm not sure what is causing this, but I wanted to check whether there might be any other .dlls I am missing.

The line causing the issue is where the library is registered with PyTorch and ultimately loaded, here:

  File "c:\torch_trt_fork\py\torch_tensorrt\__init__.py", line 114, in <module>
    _register_with_torch()
  File "c:\torch_trt_fork\py\torch_tensorrt\__init__.py", line 107, in _register_with_torch
    torch.ops.load_library(linked_file_full_path)

gs-olive avatar May 24 '24 04:05 gs-olive

I have no clue. torchtrt.dll does not have a dependency on torch_python.dll, since torch_python.lib does not get used during linking torchtrt.dll. Can you upload your built wheel?

HolyWu avatar May 24 '24 15:05 HolyWu

Hi @HolyWu - ultimately, the issue was my usage of python setup.py develop and not python setup.py bdist_wheel; the latter works well. I was unable to get around the need for bazel 6.3.2, however. On your local build, is bazel 6.2.1 able to locate MSVC utilities like cl from within Command Prompt for VS 2022?

gs-olive avatar May 29 '24 16:05 gs-olive

Yes, Bazel 6.2.1 can find MSVC utilities on my local build. The only thing I can suggest trying is setting the BAZEL_VC environment variable to tell Bazel to use a specific VC installation, like set "BAZEL_VC=C:\Program Files\Microsoft Visual Studio\2022\Community\VC" (change to your actual installation path). I don't set that variable and Bazel still finds cl, though.
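
Before pointing BAZEL_VC at a directory, it may help to confirm that the directory actually contains a compiler. The sketch below is an assumption-laden check: the function name find_cl_under_vc is hypothetical, and the Tools\MSVC\<version>\bin\Hostx64\x64 layout is the one commonly used by recent Visual Studio releases, which may not match every installation.

```python
import glob
import os


def find_cl_under_vc(vc_root):
    """Return sorted cl.exe paths found under vc_root for x64-hosted x64 tools."""
    # Assumed layout: <vc_root>\Tools\MSVC\<toolset version>\bin\Hostx64\x64\cl.exe
    pattern = os.path.join(
        vc_root, "Tools", "MSVC", "*", "bin", "Hostx64", "x64", "cl.exe"
    )
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # Read the candidate directory from the environment; nothing is hard-coded.
    vc_root = os.environ.get("BAZEL_VC", "")
    matches = find_cl_under_vc(vc_root) if vc_root else []
    print(matches if matches else "no cl.exe found under BAZEL_VC")
```

An empty result suggests the path is wrong or that the x64 toolset is not installed under that VC directory.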

HolyWu avatar May 29 '24 17:05 HolyWu

Hi @gs-olive. It seems that Windows tests often encounter RuntimeError: No CUDA GPUs are available since the ciflow/binaries/all label has been added. I'm not sure whether switching to windows.8xlarge.nvidia.gpu.nonephemeral runner will make it better or worse though. But let's try and see.

HolyWu avatar May 31 '24 12:05 HolyWu

No more No CUDA GPUs are available error!

HolyWu avatar Jun 01 '24 06:06 HolyWu

Not sure why two torchscript frontend tests are failing with an error that didn't happen before, but the others passed.

HolyWu avatar Jun 03 '24 23:06 HolyWu

I have rerun those failing tests - it is also interesting that the accuracy issues on the tests referenced here seem to not have occurred this time.

gs-olive avatar Jun 04 '24 01:06 gs-olive

Fixes: https://github.com/pytorch/TensorRT/issues/2645

narendasan avatar Jun 07 '24 01:06 narendasan