Compiling options for tensorflow
Hi, I followed this article and reproduced the throughput it reports. However, when I compile TensorFlow myself, I cannot reach the throughput the article achieved.
Which compiler version and options does the prebuilt package use?
I tried passing "-march=native -O3" to bazel; throughput improved, but it is still lower than with your package.
I look forward to your reply.
Thanks
Here is what I used:
Compile: I do it manually, so I just answer the configure questions. I keep all defaults except the following:
- CUDA 10 and cuDNN 7.3.1 (I have seen some regressions with cuDNN 7.4 that are fixed at head; I am testing that today, and it could improve performance by maybe another 10%)
- XLA yes (default in TF 1.12)
- NCCL 2.3.5
- you can include TensorRT but it doesn't matter for the ResNet test
- compute capability 7.0 (or whatever you need/want)
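For a non-interactive build, the answers above can be sketched as environment variables that TensorFlow's `configure.py` reads in place of prompting. This is a minimal sketch and not what the prebuilt package necessarily used: the mapping of each answer to a variable is an assumption, `TF_NEED_TENSORRT=0` reflects the note that TensorRT does not matter for the ResNet test, and how much of a version string (major only or full patch level) is honored may vary between TF releases.

```shell
# Assumed mapping of the answers above to configure.py environment variables.
export TF_NEED_CUDA=1
export TF_CUDA_VERSION=10.0
export TF_CUDNN_VERSION=7.3.1
export TF_NCCL_VERSION=2.3.5
export TF_ENABLE_XLA=1
# TensorRT is optional for the ResNet test, so it is left off here.
export TF_NEED_TENSORRT=0
export TF_CUDA_COMPUTE_CAPABILITIES=7.0
# Then run ./configure from the TensorFlow source root; any question
# not covered by a variable above is still asked interactively.
```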
# I build with haswell, which gives AVX2 support, because I am
# too lazy to make sure I type out all of the individual flags I want.
# Use ivybridge (I think) if you only want AVX. If your GCC is older,
# it may not support the haswell alias.
bazel build -c opt --copt=-march="broadwell" //tensorflow/tools/pip_package:build_pip_package
# Make the .whl
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
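The build comments above choose a `-march` value by AVX level (haswell for AVX2, ivybridge for AVX). A minimal sketch of checking which level the build machine supports, assuming a Linux host with `/proc/cpuinfo`: `pick_march` is a hypothetical helper, not part of TensorFlow or bazel, and the `x86-64` fallback is my own assumption for CPUs without AVX.

```shell
# Hypothetical helper: map a space-separated CPU flag list to a -march value.
pick_march() {
  case " $1 " in
    *" avx2 "*) echo "haswell" ;;    # AVX2 present
    *" avx "*)  echo "ivybridge" ;;  # AVX only
    *)          echo "x86-64" ;;     # assumed baseline fallback
  esac
}

# On Linux, the first "flags" line of /proc/cpuinfo lists the CPU extensions.
if [ -r /proc/cpuinfo ]; then
  cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
  echo "suggested: --copt=-march=$(pick_march "$cpu_flags")"
fi
```

The suggestion only matters if the wheel will run on the same class of CPU it was built on; a binary built for haswell will not run on a machine without AVX2.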