returnn
Optimal dim order for convolution - NHWC for some newer GPUs?
The Nvidia blog post "Volta Tensor Core GPU Achieves New AI Performance Milestones" (2018) says that NHWC performs best.
This statement seems to be specific to the Tensor Cores built into Volta GPUs. I assume it also holds for successors, i.e. Ampere GPUs, which would include the GeForce 30 series (3080, 3090 etc.). The 20 series (2080 etc.) also has Tensor Cores (second generation), but I'm not sure whether it applies to them as well.
I'm also not sure whether this holds for TensorFlow as well, or whether we need anything special on the TensorFlow side, maybe a recent TensorFlow version together with a recent CUDA version.
In any case, the automatic selection of the optimal dim order in `ConvLayer`, `PoolLayer` and others, which currently just depends on GPU vs CPU, should probably be extended to take this into account.
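To make the idea concrete, here is a minimal sketch of what such an extended selection could look like. Everything in it is an assumption: `preferred_conv_layout` is a hypothetical helper and not existing RETURNN code, the rule "Tensor Cores means NHWC" is only the claim from the blog post and is not verified, and I assume the current rule amounts to NCHW on GPU and NHWC on CPU. The compute capability would have to come from the framework (recent TensorFlow versions seem to expose it via `tf.config.experimental.get_device_details`, but I have not checked which versions).

```python
from typing import Optional, Tuple

# Tensor Cores first appeared with Volta, i.e. CUDA compute capability 7.0.
TENSOR_CORE_MIN_CC = (7, 0)


def preferred_conv_layout(compute_capability: Optional[Tuple[int, int]]) -> str:
    """
    Hypothetical helper (not part of RETURNN).

    :param compute_capability: (major, minor) of the GPU, or None when running on CPU
    :return: "NHWC" or "NCHW"
    """
    if compute_capability is None:
        # CPU: keep channels-last (assumed to be the current behavior).
        return "NHWC"
    if tuple(compute_capability) >= TENSOR_CORE_MIN_CC:
        # Assumption taken from the Nvidia blog post, still to be benchmarked.
        return "NHWC"
    # Older GPUs: keep channels-first (assumed to be the current behavior).
    return "NCHW"
```

Note that the compute capability is only a proxy. Some cards with capability 7.5 (e.g. the GTX 16 series) have no Tensor Cores, so a real implementation would likely need a finer check or a config option to override the choice.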
@curufinwe maybe you know some more about this? Or @JackTemaki? Or who else could know more?
Some research should be done on this, along with some benchmarking.
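For the benchmarking part, a small timing harness along these lines might be enough to start with. This is only a sketch: `benchmark_layouts` is a hypothetical name, and the callables passed in are assumed to each run one full convolution step in the given layout and to block until the result is actually computed (on GPU that means fetching the result to the host, otherwise only the kernel launch gets timed).

```python
import statistics
import time
from typing import Callable, Dict


def benchmark_layouts(
    candidates: Dict[str, Callable[[], object]],
    warmup: int = 3,
    repeats: int = 10,
) -> Dict[str, float]:
    """
    Hypothetical helper: time each candidate and return the median seconds per call.

    :param candidates: maps a layout name (e.g. "NHWC") to a callable running one step
    :param warmup: untimed calls, to exclude graph building and autotuning
    :param repeats: timed calls per candidate
    """
    results = {}
    for name, fn in candidates.items():
        for _ in range(warmup):
            fn()
        times = []
        for _ in range(repeats):
            start = time.perf_counter()
            fn()
            times.append(time.perf_counter() - start)
        # Median is less sensitive to occasional outliers than the mean.
        results[name] = statistics.median(times)
    return results
```

With TensorFlow, the two candidates could wrap the same `tf.nn.conv2d` call with `data_format="NHWC"` and `data_format="NCHW"` on correspondingly shaped inputs. It would make sense to run this on Pascal, Volta, Turing and Ampere cards, and with both float32 and mixed precision, since Tensor Cores may only be used in the latter case.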