swapping-autoencoder-pytorch
Mixed precision training?
I'm trying to add mixed precision training support. I'm a newbie at this~
What I have figured out so far is that the upfirdn2d & fused modules are compiled at runtime. The compiler is called by this line, which creates a build.ninja file like the one below:
cuda_cflags = -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\"
-DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/ceyda/.local/lib/python3.8/site-packages/torch/include -isystem /home/ceyda/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem
/home/ceyda/.local/lib/python3.8/site-packages/torch/include/TH -isystem /home/ceyda/.local/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.8
-D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -std=c++14
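For context, the runtime compilation is done through torch.utils.cpp_extension.load. A simplified sketch of how the fused op is built (the source file names follow the op/ directory of the stylegan2 ops used by this repo; adjust if your checkout differs):

```python
import os
from torch.utils.cpp_extension import load

module_path = os.path.dirname(__file__)

# JIT-compiles the CUDA extension the first time it is imported and
# caches the result; this is what generates the build.ninja file above.
fused = load(
    "fused",
    sources=[
        os.path.join(module_path, "fused_bias_act.cpp"),
        os.path.join(module_path, "fused_bias_act_kernel.cu"),
    ],
)
```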
More importantly, it adds these flags:
-D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__
which I assume disable mixed precision for these modules (not sure). The question is: how can I remove those flags and enable mixed precision? Would I also need to upgrade to CUDA 11?
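One way to strip these flags, in case it helps: they appear to come from torch.utils.cpp_extension.COMMON_NVCC_FLAGS, so filtering that list before the ops are loaded should drop them. This is only a sketch; it relies on an internal attribute that may change between PyTorch versions, and (per the follow-up below) it is probably not needed for mixed precision to work:

```python
# Sketch: filter the half-precision-related defines out of PyTorch's
# default NVCC flags before the extensions are JIT-compiled.
# COMMON_NVCC_FLAGS is an internal attribute of torch.utils.cpp_extension,
# so this may break on other PyTorch versions.
import torch.utils.cpp_extension as cpp_ext

cpp_ext.COMMON_NVCC_FLAGS = [
    flag for flag in cpp_ext.COMMON_NVCC_FLAGS
    if "HALF" not in flag and "BFLOAT16" not in flag
]

# Any load() call made after this point (e.g. importing the fused /
# upfirdn2d ops) will compile without those defines.
```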
Looks like those flags don't mean "no half precision"; rather, they mean "use torch's half ops instead of CUDA's built-in ones". ref: https://discuss.pytorch.org/t/cuda-no-half2-operators-for-cuda-9-2/18365/4
Anyway, I think I managed to get it to work. Not sure how much efficiency it adds; I haven't benchmarked yet. I will clean up and open a PR later.
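For anyone following along, the usual way to wire mixed precision into a PyTorch training loop is torch.cuda.amp. A minimal sketch; the model, optimizer, and data here are placeholders, not the repo's actual training code:

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Placeholders: swap in the actual model, optimizer, and data loader.
model = torch.nn.Linear(256, 256).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
loader = [(torch.randn(8, 256).cuda(), torch.randn(8, 256).cuda())]

scaler = GradScaler()  # scales the loss to avoid fp16 gradient underflow

for inputs, targets in loader:
    optimizer.zero_grad()
    with autocast():  # run the forward pass in mixed precision
        outputs = model(inputs)
        loss = torch.nn.functional.mse_loss(outputs, targets)
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales grads, then optimizer step
    scaler.update()                # adjust the scale for the next iteration
```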