
Many examples use `HWCN` order instead of the `WHCN` order recommended by the docs

Open · yiyuezhuo opened this issue 4 years ago · 2 comments

According to the docs:

> Data should be stored in WHCN order (width, height, # channels, batch size). In other words, a 100×100 RGB image would be a 100×100×3×1 array, and a batch of 50 would be a 100×100×3×50 array.
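In Julia terms, that layout looks like this (a minimal sketch of the shapes only, with random data):

```julia
x = rand(Float32, 100, 100, 3, 50)   # width × height × channels × batch size (WHCN)
size(x)                              # (100, 100, 3, 50)
```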

But many examples use `HWCN`, for example:

```julia
# Function to convert the RGB image to Float64 Arrays
function getarray(X)
    Float32.(permutedims(channelview(X), (2, 3, 1)))
end
```

The correct transform should be `Float32.(permutedims(channelview(X), (3, 2, 1)))`, because `channelview(X)` returns a "CHW" array. Likewise, the MNIST example doesn't use any `permutedims`, so it just keeps its wrong "HW" order. Fortunately, some examples are correct, for example the one using MLDatasets. The three cases are shown in this gist.
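For reference, a minimal sketch of the corrected pipeline (the image and its sizes here are made up for illustration; assumes Images.jl is available):

```julia
using Images  # for channelview, RGB and N0f8

img = rand(RGB{N0f8}, 100, 150)              # a made-up 100 (H) × 150 (W) image
cv  = channelview(img)                        # 3×100×150, i.e. "CHW"
whc = Float32.(permutedims(cv, (3, 2, 1)))    # 150×100×3, i.e. "WHC"
x   = reshape(whc, size(whc)..., 1)           # 150×100×3×1, i.e. "WHCN" as the docs describe
```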

So basically, many examples in fact run their models on "transposed" image datasets. The good (bad?) part is that CNNs are robust enough to deal with this distortion, so we can't detect it from statistics such as accuracy, or even by eye. But those examples suggest a misleading preprocessing pipeline and should be fixed. (To be honest, I'm posting this issue because I was misled by it myself...)

yiyuezhuo · Jun 07 '20 03:06

Fixed in #306

aditkumar72 · Jun 18 '21 20:06

I think FluxML should also document why we use the WHCN order. The only explanation I have is that CUDA.jl uses cuDNN, and cuDNN supports the NCHW order (row-major). Since Julia arrays are column-major, the memory representation of a row-major NCHW array (W varying fastest) is exactly the same as the memory representation of a WHCN array in Julia.
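A quick way to see this from plain Julia (a minimal sketch using only Base): the strides of a WHCN array show that W varies fastest in memory, exactly like the innermost W of cuDNN's row-major NCHW layout.

```julia
x = rand(Float32, 100, 100, 3, 1)   # W × H × C × N
strides(x)                          # (1, 100, 10000, 30000): W varies fastest, then H, C, N
```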

cuDNN: https://docs.nvidia.com/deeplearning/cudnn/developer-guide/index.html#nchw-layout-x32 (figure: NCHW memory layout from the cuDNN developer guide)


MariusDrulea · Feb 18 '23 15:02