
Source code about the model and performance

GaryYuyjl opened this issue Dec 19 '19 · 14 comments

For the model and fine-tuning parts, I can't find the source code; I could only use a .a library to reproduce the result. Could you please provide the code?

Besides, I reproduced most of the parts in INT4. However, I could only achieve about 6910.68 images/second, which is much lower than the number in the dev blog. I tested with only one T4 GPU. Could that cause the low performance?

GaryYuyjl, Dec 19 '19 06:12

CUDA Version: 10.1, Linux version 4.15.0-1056-aws, Ubuntu 18.04.3 LTS

GaryYuyjl, Dec 19 '19 06:12

@GaryYuyjl I assume you are talking about NVIDIA's open submission with low-precision ResNet50? What's your TensorRT version?

/cc @DilipSequeira @nvpohanh

psyhtest, Dec 19 '19 13:12

Our open submission does not use TRT.

@GaryYuyjl Could you try reproducing our closed division ResNet50 number (for INT8)? This will help us to identify the issue.

nvpohanh, Dec 19 '19 17:12

I've encountered another problem:

```
ubuntu@ip-172-31-41-37:~/inference_results_v0.5/closed/NVIDIA$ docker exec mlperf-inference-ubuntu bash -c 'make generate_engines RUN_ARGS="--benchmaks=resnet --scenarios=Offline"'
[2019-12-25 13:09:21,358 __init__.py:119 WARNING] Cannot find valid configs for 1x Tesla T4. Using 8x Tesla T4 configs instead.
[2019-12-25 13:09:21,358 main.py:291 INFO] Using config files: measurements/T4x8/mobilenet/Offline/config.json,measurements/T4x8/resnet/Offline/config.json,measurements/T4x8/ssd-small/Offline/config.json,measurements/T4x8/ssd-large/Offline/config.json,measurements/T4x8/gnmt/Offline/config.json
[2019-12-25 13:09:21,358 __init__.py:142 INFO] Parsing config file measurements/T4x8/mobilenet/Offline/config.json ...
... (same "Parsing config file" line for resnet, ssd-small, ssd-large, gnmt)
[2019-12-25 13:09:21,359 main.py:295 INFO] Processing config "T4x8_mobilenet_Offline"
[2019-12-25 13:09:21,422 main.py:83 INFO] Building engines for mobilenet benchmark in Offline scenario...
[2019-12-25 13:09:21,423 main.py:100 INFO] Building GPU engine for T4x8_mobilenet_Offline
[TensorRT] WARNING: Calling isShapeTensor before the entire network is constructed may result in an inaccurate result.
... (warning repeated several times)
[2019-12-25 13:09:22,671 builder.py:119 INFO] Building ./build/engines/T4x8/mobilenet/Offline/mobilenet-Offline-gpu-b128-int8.plan
[TensorRT] INFO: [EXPLICIT_PRECISION] Setting tensor scales of all tensors of explicit precision network to 1.0f
[TensorRT] ERROR: ../rtSafe/safeContext.cpp (110) - cuBLAS Error in initializeCommonContext: 1 (Could not initialize cublas, please check cuda installation.)
[TensorRT] ERROR: ../rtSafe/safeContext.cpp (110) - cuBLAS Error in initializeCommonContext: 1 (Could not initialize cublas, please check cuda installation.)
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/work/code/main.py", line 102, in handle_generate_engine
    b.build_engines()
  File "/work/code/common/builder.py", line 124, in build_engines
    buf = engine.serialize()
AttributeError: 'NoneType' object has no attribute 'serialize'
... (the same warnings, cuBLAS errors, and traceback repeat for Process-2 and Process-3)
Traceback (most recent call last):
  File "code/main.py", line 327, in <module>
    main()
  File "code/main.py", line 317, in main
    launch_handle_generate_engine(benchmark_name, benchmark_conf, need_gpu, need_dla)
  File "code/main.py", line 80, in launch_handle_generate_engine
    raise RuntimeError("Building engines failed!")
RuntimeError: Building engines failed!
make: *** [generate_engines] Error 1
Makefile:298: recipe for target 'generate_engines' failed
```

I guess it may be caused by the config file... How do I change the T4x8 config to a T4x1 one? I tried changing "gpu_offline_expected_qps" in config.json, but it didn't work.
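For illustration, the kind of single-GPU override I tried looks roughly like the sketch below. gpu_offline_expected_qps is the field mentioned above; gpu_batch_size is only a guess based on the b128 engine name in the log, and both values are placeholders, since I don't know the exact schema the harness expects:

```json
{
  "gpu_batch_size": 128,
  "gpu_offline_expected_qps": 5500
}
```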

GaryYuyjl, Dec 25 '19 13:12

Please find the solution here: https://github.com/mlperf/inference_results_v0.5/issues/7#issuecomment-561804070

This is caused by the CUDA-10.2 release, which TRT6 does not support.

nvpohanh, Dec 26 '19 05:12

Still the same problem. My CUDA version is 10.1. I also tried https://github.com/mlperf/inference_results_v0.5/issues/7#issuecomment-559206390, but it didn't work.

GaryYuyjl, Dec 31 '19 05:12

@GaryYuyjl Could you make sure you re-build the docker image after making that change?

To verify that the change takes effect, run `ls /usr/lib/x86_64-linux-gnu/libcublas*` in the container to check the cuBLAS version. The correct one is libcublas.so.10.2.1.243, whereas the original code installs libcublas.so.10.2.2.88, which is too new for TRT6.
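To make the check concrete, here is a small sketch that parses a cuBLAS soname and compares it against the newest version stated above to work with TRT6. The helper name and comparison threshold are mine; the only facts used are the two library versions from this thread:

```python
# Newest cuBLAS version known (from this thread) to work with TRT6.
TRT6_MAX = (10, 2, 1, 243)  # libcublas.so.10.2.1.243

def soname_version(soname):
    """Parse 'libcublas.so.10.2.2.88' into the tuple (10, 2, 2, 88)."""
    return tuple(int(p) for p in soname.split(".so.")[1].split("."))

for lib in ["libcublas.so.10.2.1.243", "libcublas.so.10.2.2.88"]:
    ok = soname_version(lib) <= TRT6_MAX
    print(lib, "OK" if ok else "too new for TRT6")
```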

nvpohanh, Dec 31 '19 07:12

Thank you. I forgot to run make build_docker. But I found another problem after rerunning it:

```
$ docker exec mlperf-inference-ubuntu bash -c 'make generate_engines RUN_ARGS="--benchmarks=resnet --test_mode=PerformanceOnly --scenarios=Offline --config=/work/measurements/T4x8/resnet/Offline/config.json"'
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/work/code/main.py", line 101, in handle_generate_engine
    b = get_benchmark(benchmark_name, config)
  File "/work/code/main.py", line 33, in get_benchmark
    ResNet50 = import_module("code.resnet.tensorrt.ResNet50").ResNet50
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/work/code/resnet/tensorrt/ResNet50.py", line 23, in <module>
    RN50Calibrator = import_module("code.resnet.tensorrt.calibrator").RN50Calibrator
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/work/code/resnet/tensorrt/calibrator.py", line 19, in <module>
    import pycuda.driver as cuda
  File "/usr/local/lib/python3.6/dist-packages/pycuda/driver.py", line 5, in <module>
    from pycuda._driver import *  # noqa
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
```

GaryYuyjl, Jan 01 '20 06:01

@GaryYuyjl Please make sure that you have https://github.com/NVIDIA/nvidia-docker properly installed, and run with `docker run --gpus=all ...` (or `nvidia-docker run ...` if you installed the nvidia-docker2 package) so that the container can "see" the GPUs.

nvpohanh, Jan 01 '20 07:01

Thank you. Here are the results:

ResNet Offline: "Accuracy = 76.034, Threshold = 75.695. Accuracy test PASSED." "Samples per second: 4247.88 and Result is : VALID"

Server: "Accuracy = 76.034, Threshold = 75.695. Accuracy test PASSED." "Scheduled samples per second : 41546.64 and Result is : INVALID"

GaryYuyjl, Jan 04 '20 15:01

@GaryYuyjl It does seem that your T4 runs more slowly. ResNet Offline should give ~5.5k infer/s.

Could you run `nvidia-smi dmon -s pc` in parallel to track the temperature and clock frequencies? If you see the temperature rise above, say, 70C, there might be an issue with the cooling. The T4 is passively cooled, so it is sensitive to cooling efficiency.
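To make the monitoring suggestion concrete, here is a hedged sketch that flags throttled samples in dmon-style output. The column layout, function name, and thresholds below are illustrative assumptions; check the header your driver actually prints, as the `-s pc` fields vary by version:

```python
# Sketch: flag thermal-throttling samples in `nvidia-smi dmon`-style output.
# Assumed columns (illustrative only): gpu, pwr (W), gtemp (C), smclk (MHz).
def throttling(lines, temp_limit=70, clock_floor=1000):
    """Return (gpu, temp, clock) for samples hotter than temp_limit (C)
    or clocked below clock_floor (MHz)."""
    flagged = []
    for line in lines:
        if line.lstrip().startswith("#"):  # skip dmon header lines
            continue
        fields = line.split()
        gpu, temp, clk = int(fields[0]), int(fields[2]), int(fields[3])
        if temp > temp_limit or clk < clock_floor:
            flagged.append((gpu, temp, clk))
    return flagged

sample = [
    "# gpu   pwr  gtemp  smclk",
    "    0    70     81    720",   # hot and throttled, like the reported run
    "    0    65     66   1590",   # healthy sample
]
print(throttling(sample))  # [(0, 81, 720)]
```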

nvpohanh, Jan 06 '20 05:01

I think that's the problem. The temperature rises to about 80C at its highest, and the pclk drops to about 720 MHz.

GaryYuyjl, Jan 07 '20 00:01

@nvpohanh Just curious, what kind of cooling do you use for the T4?

Laurawly, Jan 13 '20 21:01

I think we were just using the same air cooling as a normal server room.

nvpohanh, Jan 14 '20 17:01