Source code about the model and performance
For the model and fine-tuning parts, I can't find the source code; I could only reproduce the results using a .a library. Could you please provide the code?
Besides, I reproduced most of the INT4 pipeline. However, I could only achieve about 6910.68 images/second, which is much lower than the number in the dev blog. I tested with only one T4 GPU. Could that explain the low performance?
CUDA version: 10.1; Linux version: 4.15.0-1056-aws; Ubuntu 18.04.3 LTS
@GaryYuyjl I assume you are talking about NVIDIA's open submission with low-precision ResNet50? What's your TensorRT version?
/cc @DilipSequeira @nvpohanh
Our open submission does not use TRT.
@GaryYuyjl Could you try reproducing our closed division ResNet50 number (for INT8)? This will help us to identify the issue.
I've encountered another problem:
ubuntu@ip-172-31-41-37:~/inference_results_v0.5/closed/NVIDIA$ docker exec mlperf-inference-ubuntu bash -c 'make generate_engines RUN_ARGS="--benchmaks=resnet --scenarios=Offline"'
[2019-12-25 13:09:21,358 init.py:119 WARNING] Cannot find valid configs for 1x Tesla T4. Using 8x Tesla T4 configs instead.
[2019-12-25 13:09:21,358 main.py:291 INFO] Using config files: measurements/T4x8/mobilenet/Offline/config.json,measurements/T4x8/resnet/Offline/config.json,measurements/T4x8/ssd-small/Offline/config.json,measurements/T4x8/ssd-large/Offline/config.json,measurements/T4x8/gnmt/Offline/config.json
[2019-12-25 13:09:21,358 init.py:142 INFO] Parsing config file measurements/T4x8/mobilenet/Offline/config.json ...
[2019-12-25 13:09:21,359 init.py:142 INFO] Parsing config file measurements/T4x8/resnet/Offline/config.json ...
[2019-12-25 13:09:21,359 init.py:142 INFO] Parsing config file measurements/T4x8/ssd-small/Offline/config.json ...
[2019-12-25 13:09:21,359 init.py:142 INFO] Parsing config file measurements/T4x8/ssd-large/Offline/config.json ...
[2019-12-25 13:09:21,359 init.py:142 INFO] Parsing config file measurements/T4x8/gnmt/Offline/config.json ...
[2019-12-25 13:09:21,359 main.py:295 INFO] Processing config "T4x8_mobilenet_Offline"
[2019-12-25 13:09:21,422 main.py:83 INFO] Building engines for mobilenet benchmark in Offline scenario...
[2019-12-25 13:09:21,423 main.py:100 INFO] Building GPU engine for T4x8_mobilenet_Offline
[TensorRT] WARNING: Calling isShapeTensor before the entire network is constructed may result in an inaccurate result. (repeated several times)
[2019-12-25 13:09:22,671 builder.py:119 INFO] Building ./build/engines/T4x8/mobilenet/Offline/mobilenet-Offline-gpu-b128-int8.plan
[TensorRT] INFO: [EXPLICIT_PRECISION] Setting tensor scales of all tensors of explicit precision network to 1.0f
[TensorRT] ERROR: ../rtSafe/safeContext.cpp (110) - cuBLAS Error in initializeCommonContext: 1 (Could not initialize cublas, please check cuda installation.)
[TensorRT] ERROR: ../rtSafe/safeContext.cpp (110) - cuBLAS Error in initializeCommonContext: 1 (Could not initialize cublas, please check cuda installation.)
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/work/code/main.py", line 102, in handle_generate_engine
    b.build_engines()
  File "/work/code/common/builder.py", line 124, in build_engines
    buf = engine.serialize()
AttributeError: 'NoneType' object has no attribute 'serialize'

(The same warnings, cuBLAS errors, and traceback repeat for Process-2 and Process-3, after which the build aborts:)

Traceback (most recent call last):
  File "code/main.py", line 327, in <module>
    main()
  File "code/main.py", line 317, in main
    launch_handle_generate_engine(benchmark_name, benchmark_conf, need_gpu, need_dla)
  File "code/main.py", line 80, in launch_handle_generate_engine
    raise RuntimeError("Building engines failed!")
RuntimeError: Building engines failed!
make: *** [generate_engines] Error 1
Makefile:298: recipe for target 'generate_engines' failed
I guess it may be caused by the config file... How do I change the T4x8 config to a T4x1 one? I tried changing "gpu_offline_expected_qps" in config.json, but it didn't work.
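For reference, one way to provide a single-GPU config the harness can find is to mirror the T4x8 measurement configs into a T4x1 directory. This is only a sketch: the `measurements/T4x1/...` layout is inferred from the "Cannot find valid configs for 1x Tesla T4" warning in the log, and dividing the expected QPS by 8 is just a rough single-GPU starting point, not a tuned value.

```shell
# make_t4x1_config SRC DST -- copy one benchmark config.json into a
# hypothetical measurements/T4x1/... tree, scaling gpu_offline_expected_qps
# down by 8 (one GPU instead of eight). Assumes the key is followed by an
# integer and is not the last entry in the JSON object (trailing comma kept).
make_t4x1_config() {
  mkdir -p "$(dirname "$2")"
  awk '
    /"gpu_offline_expected_qps"/ {
      n = $0
      gsub(/[^0-9]/, "", n)                # extract the numeric QPS value
      sub(/:.*/, sprintf(": %d,", n / 8))  # rewrite it divided by 8
    }
    { print }
  ' "$1" > "$2"
}

# e.g.:
#   make_t4x1_config measurements/T4x8/resnet/Offline/config.json \
#                    measurements/T4x1/resnet/Offline/config.json
```

(As the thread goes on to show, the engine-build failure itself had a different cause, so treat this only as a way to get a 1x-T4 config picked up.)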
Please find the solution here: https://github.com/mlperf/inference_results_v0.5/issues/7#issuecomment-561804070
This is caused by the CUDA-10.2 release, which TRT6 does not support.
Still the same problem. My CUDA version is 10.1. I also tried https://github.com/mlperf/inference_results_v0.5/issues/7#issuecomment-559206390, but it didn't work.
@GaryYuyjl Could you make sure you re-build the docker image after making that change?
To verify that the change takes effect, run `ls /usr/lib/x86_64-linux-gnu/libcublas*` inside the container to check the cuBLAS version. The correct one is libcublas.so.10.2.1.243, whereas the original code installs libcublas.so.10.2.2.88, which is too new for TRT6.
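That check can be scripted as a small filename classifier. A sketch only: it matches the two version strings mentioned above (10.2.1.243 good for TRT6, 10.2.2.88 too new); `cublas_ok` is a hypothetical helper name, and any other version is simply reported as unrecognized.

```shell
# cublas_ok SONAME -- classify a cuBLAS library path against the versions
# discussed in this thread: 10.2.1.243 works with TRT6, 10.2.2.88 does not.
cublas_ok() {
  case "$1" in
    *libcublas.so.10.2.1.243) echo "ok: TRT6-compatible" ;;
    *libcublas.so.10.2.2.88)  echo "too new for TRT6: pin cuBLAS and rebuild the image" ;;
    *)                        echo "unrecognized cuBLAS version" ;;
  esac
}

# inside the container:
#   for f in /usr/lib/x86_64-linux-gnu/libcublas.so.*; do cublas_ok "$f"; done
```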
Thank you. I forgot to run make build_docker.
But I found another problem after I reran it:
$ docker exec mlperf-inference-ubuntu bash -c 'make generate_engines RUN_ARGS="--benchmarks=resnet --test_mode=PerformanceOnly --scenarios=Offline --config=/work/measurements/T4x8/resnet/Offline/config.json"'
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/work/code/main.py", line 101, in handle_generate_engine
    b = get_benchmark(benchmark_name, config)
  File "/work/code/main.py", line 33, in get_benchmark
    ResNet50 = import_module("code.resnet.tensorrt.ResNet50").ResNet50
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/work/code/resnet/tensorrt/ResNet50.py", line 23, in <module>
    RN50Calibrator = import_module("code.resnet.tensorrt.calibrator").RN50Calibrator
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/work/code/resnet/tensorrt/calibrator.py", line 19, in <module>
    import pycuda.driver as cuda
  File "/usr/local/lib/python3.6/dist-packages/pycuda/driver.py", line 5, in <module>
    from pycuda._driver import * # noqa
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
@GaryYuyjl Please make sure that you have https://github.com/NVIDIA/nvidia-docker properly installed. Please run with docker run --gpus=all ... (or nvidia-docker run ... if you installed nvidia-docker2 package) so that the docker container can "see" the GPUs.
Thank you. Here are the results:
ResNet Offline: "Accuracy = 76.034, Threshold = 75.695. Accuracy test PASSED." "Samples per second: 4247.88 and Result is : VALID"
ResNet Server: "Accuracy = 76.034, Threshold = 75.695. Accuracy test PASSED." "Scheduled samples per second : 41546.64 and Result is : INVALID"
@GaryYuyjl It does seem that your T4 runs more slowly. ResNet Offline should give ~5.5k infer/s.
Could you run nvidia-smi dmon -s pc in parallel to track the temperature and clock frequencies? If you see the temperature rise above, say, 70C, there might be an issue with the cooling. The T4 is passively cooled, so it is sensitive to cooling efficiency.
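That monitoring can be reduced to a one-line filter over the dmon output. A sketch with an assumption to verify: the column layout (gpu pwr gtemp mtemp mclk pclk, so GPU temperature in column 3) is what `nvidia-smi dmon -s pc` typically prints, but it can differ across driver versions, so check the `#` header line on your system; `flag_hot` is a hypothetical helper name, and 70C is the rough threshold mentioned above.

```shell
# flag_hot -- read dmon-style samples on stdin and echo any where the GPU
# temperature (assumed to be column 3) exceeds 70C; header lines starting
# with '#' are skipped.
flag_hot() {
  awk '!/^#/ && $3 > 70 { print "hot:", $0 }'
}

# live usage while the benchmark runs:
#   nvidia-smi dmon -s pc | flag_hot
```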
I think that's the problem. The temperature rises to about 80C and the pclk drops to about 720 MHz.
@nvpohanh Just curious, what kind of cooling do you use for the T4?
I think we were just using the same air cooling as normal server rooms.