Convolution mega issue
A somewhat clear implementation using im2col in darknet: https://github.com/pjreddie/darknet/blob/master/src/convolutional_layer.c#L445
Another here https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/Im2Col.cpp
Since const generic expressions aren't stable yet, the obvious first way to implement this won't compile.
tracking issue: https://github.com/rust-lang/rust/issues/76560
#[derive(Clone, Debug, Default)]
pub struct Conv2D<
    const IN_CHANNELS: usize,
    const OUT_CHANNELS: usize,
    const KERNEL_SIZE: usize,
    const STRIDE: usize = 1,
    const PADDING: usize = 0,
> {
    weight: Tensor4D<OUT_CHANNELS, IN_CHANNELS, KERNEL_SIZE, KERNEL_SIZE>,
    bias: Tensor1D<OUT_CHANNELS>,
}

impl<
        const IN_CHANNELS: usize,
        const OUT_CHANNELS: usize,
        const KERNEL_SIZE: usize,
        const STRIDE: usize,
        const PADDING: usize,
        const WIDTH: usize,
        const HEIGHT: usize,
    > Module<Tensor3D<IN_CHANNELS, HEIGHT, WIDTH>>
    for Conv2D<IN_CHANNELS, OUT_CHANNELS, KERNEL_SIZE, STRIDE, PADDING>
{
    type Output = Tensor3D<
        OUT_CHANNELS,
        { (HEIGHT + 2 * PADDING - KERNEL_SIZE) / STRIDE + 1 }, // This doesn't compile
        { (WIDTH + 2 * PADDING - KERNEL_SIZE) / STRIDE + 1 },  // This doesn't compile
    >;

    fn forward(&self, input: Tensor3D<IN_CHANNELS, HEIGHT, WIDTH>) -> Self::Output {
        todo!();
    }
}
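For reference, on nightly the usual workaround is to opt into the feature and name each const expression in a where-bound. A minimal, self-contained sketch of the pattern using plain arrays instead of dfdx's tensor types (nightly-only and untested here; generic_const_exprs is still incomplete):

#![feature(generic_const_exprs)]

// Plain-array stand-in for the tensor types, just to show the shape math.
// The where-bounds repeat each const expression so the compiler accepts it.
fn forward<const K: usize, const S: usize, const P: usize, const H: usize, const W: usize>(
    _input: [[f32; W]; H],
) -> [[f32; (W + 2 * P - K) / S + 1]; (H + 2 * P - K) / S + 1]
where
    [(); (H + 2 * P - K) / S + 1]:,
    [(); (W + 2 * P - K) / S + 1]:,
{
    [[0.0; (W + 2 * P - K) / S + 1]; (H + 2 * P - K) / S + 1]
}

fn main() {
    // 28x28 input, 3x3 kernel, stride 1, no padding -> 26x26 output.
    let out = forward::<3, 1, 0, 28, 28>([[0.0f32; 28]; 28]);
    assert_eq!(out.len(), 26);
}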
Another option is to make the in/out height/width part of the Conv2D type itself. That's a ton of generic parameters, though... It's also unclear how to verify that all the parameters are consistent (i.e. that applying the kernel size/stride/padding to the in height/width actually gives the declared out height/width); a cheap partial check is sketched below.
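On stable Rust, one cheap spot-check is a const assertion against the usual output-size formula (the conv_out helper here is mine, not dfdx's):

// Hypothetical helper: the standard conv output-size formula.
const fn conv_out(size: usize, kernel: usize, stride: usize, padding: usize) -> usize {
    (size + 2 * padding - kernel) / stride + 1
}

// assert! in const context works on stable Rust (1.57+), so concrete
// parameter sets can at least be checked at compile time:
const _: () = assert!(conv_out(28, 3, 1, 0) == 26); // 28x28 input, 3x3 kernel -> 26x26
const _: () = assert!(conv_out(32, 5, 2, 2) == 16); // stride 2 halves 32 -> 16

This doesn't validate arbitrary generic parameters, but it catches typos in any conv stack whose dimensions are written out concretely.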
im2col is also going to require pre-specifying the sizes until const generic expressions become stable:
fn im2col<
    const IN_CHANNELS: usize,
    const IN_HEIGHT: usize,
    const IN_WIDTH: usize,
    const OUT_CHANNELS: usize,
    const OUT_HEIGHT: usize,
    const OUT_WIDTH: usize,
    const KERNEL_SIZE: usize,
    const STRIDE: usize,
    const PADDING: usize,
>(
    im: Tensor3D<IN_CHANNELS, IN_HEIGHT, IN_WIDTH>,
) -> Tensor2D<{ IN_CHANNELS * KERNEL_SIZE * KERNEL_SIZE }, { OUT_HEIGHT * OUT_WIDTH }> {
    // TODO: unroll each receptive field of `im` into a column of the output.
    let mut output = Tensor2D::zeros();
    output
}
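For concreteness (worked numbers, mine): with IN_CHANNELS = 3, a 28x28 input, KERNEL_SIZE = 3, STRIDE = 1, PADDING = 0, the col matrix is (3 * 3 * 3 = 27) rows by (26 * 26 = 676) columns. Each column is one flattened 3x3x3 receptive field, so the whole convolution collapses into a single (OUT_CHANNELS x 27) * (27 x 676) matmul.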
Am going to work on this for nightly compilers only in the 1-conv-nightly branch.
To read later: https://sahnimanas.github.io/post/anatomy-of-a-high-performance-convolution/
A Rust crate to check out: https://github.com/Conzel/convolutions-rs
A downside of convolutions-rs is that it uses ndarray and probably ends up allocating (which is why the reported benchmarks are so slow).
My current plan is to implement conv2d, im2col, and col2im using slices, so no const generics. Then the Conv2D layer will be a thin wrapper around that with just the input & output sizes using const generics.
Here is the initial version of the conv2d function. Still need to test it more:
pub fn conv2d(inp: &[f32], weight: &[f32], bias: &[f32], out: &mut [f32], cfg: ConvConfig) {
    let m = cfg.channels_out;
    let k = cfg.channels_in * cfg.kernel_size * cfg.kernel_size;
    let n = cfg.height_out() * cfg.width_out();

    // weight: (channels_out, channels_in * kernel_size * kernel_size)
    // col: (channels_in * kernel_size * kernel_size, height_out * width_out)
    // out: (channels_out, height_out * width_out)
    assert_eq!(inp.len(), cfg.channels_in * cfg.height_in * cfg.width_in);
    assert_eq!(weight.len(), cfg.channels_out * k);
    assert_eq!(bias.len(), cfg.channels_out);
    assert_eq!(out.len(), cfg.channels_out * n);

    let mut col = cfg.allocate_col::<f32>();
    im2col(inp, col.as_mut(), cfg);

    for i in 0..cfg.channels_out {
        out[i * n..(i + 1) * n].fill(bias[i]);
    }

    unsafe {
        matrixmultiply::sgemm(...)
    }
}
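For reference, matrixmultiply's sgemm computes C = alpha * A * B + beta * C and takes (m, k, n, alpha, then a pointer plus row/column stride per matrix, then beta). Assuming allocate_col yields something AsRef<[f32]>, the elided unsafe block would presumably be:

    unsafe {
        // Row-major strides throughout; beta = 1.0 accumulates on top of the
        // bias values already written into `out`.
        matrixmultiply::sgemm(
            m, k, n,
            1.0,
            weight.as_ptr(), k as isize, 1,       // weight: (m, k)
            col.as_ref().as_ptr(), n as isize, 1, // col:    (k, n)
            1.0,
            out.as_mut_ptr(), n as isize, 1,      // out:    (m, n)
        );
    }

And a slice-based im2col matching the layout in the comments above; a sketch assuming ConvConfig also carries stride and padding fields (names mine), with the padded region produced as zeros:

pub fn im2col(inp: &[f32], col: &mut [f32], cfg: ConvConfig) {
    let (h_out, w_out) = (cfg.height_out(), cfg.width_out());
    for c in 0..cfg.channels_in {
        for ky in 0..cfg.kernel_size {
            for kx in 0..cfg.kernel_size {
                // Row of `col` that this (channel, kernel-y, kernel-x) triple fills.
                let row = (c * cfg.kernel_size + ky) * cfg.kernel_size + kx;
                for oy in 0..h_out {
                    for ox in 0..w_out {
                        // Input coordinates; wrapping_sub makes padded
                        // (out-of-range) positions fail the bounds check below.
                        let iy = (oy * cfg.stride + ky).wrapping_sub(cfg.padding);
                        let ix = (ox * cfg.stride + kx).wrapping_sub(cfg.padding);
                        col[row * (h_out * w_out) + oy * w_out + ox] =
                            if iy < cfg.height_in && ix < cfg.width_in {
                                inp[(c * cfg.height_in + iy) * cfg.width_in + ix]
                            } else {
                                0.0 // zero padding
                            };
                    }
                }
            }
        }
    }
}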
I think it could also be nice to have Conv2D allocate the col buffer up front instead of reallocating on every forward call. Rough sketch below.
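A minimal sketch of that idea (names are mine; it assumes a conv2d variant that takes the scratch buffer as a parameter instead of calling allocate_col itself):

// Hypothetical owner of the reusable scratch space; the layer would hold one
// of these and pass `&mut scratch.col` into the conv2d internals each forward.
pub struct ConvScratch {
    col: Vec<f32>,
}

impl ConvScratch {
    pub fn new(cfg: &ConvConfig) -> Self {
        // (channels_in * kernel_size * kernel_size) x (height_out * width_out),
        // allocated once up front.
        let len = cfg.channels_in * cfg.kernel_size * cfg.kernel_size
            * cfg.height_out() * cfg.width_out();
        ConvScratch { col: vec![0.0; len] }
    }
}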
Would it be possible to create a proc_macro that takes the Conv2D generic parameters as input and emits a compile error if they're invalid (e.g. the kernel size/stride/padding don't take the in height/width to the out height/width), or nothing if they're OK? I'll try to create such a crate.
OK, that was a dumb idea, because all the proc macro receives is the generic parameter name, not its value :/
This sounds like a good solution to me, though I'd probably make this function private and call it from another function that does the assertions (since no checks are required with generic_const_exprs).
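i.e. something like (sketch; conv2d_unchecked is a hypothetical name):

// Public slice-based entry point keeps the runtime checks. The const-generic
// Conv2D layer could instead call the private unchecked version directly,
// since its types already guarantee the slice lengths line up.
pub fn conv2d(inp: &[f32], weight: &[f32], bias: &[f32], out: &mut [f32], cfg: ConvConfig) {
    let k = cfg.channels_in * cfg.kernel_size * cfg.kernel_size;
    let n = cfg.height_out() * cfg.width_out();
    assert_eq!(inp.len(), cfg.channels_in * cfg.height_in * cfg.width_in);
    assert_eq!(weight.len(), cfg.channels_out * k);
    assert_eq!(bias.len(), cfg.channels_out);
    assert_eq!(out.len(), cfg.channels_out * n);
    conv2d_unchecked(inp, weight, bias, out, cfg);
}

fn conv2d_unchecked(inp: &[f32], weight: &[f32], bias: &[f32], out: &mut [f32], cfg: ConvConfig) {
    // Body of the earlier conv2d, minus the length checks.
}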
Currently looking around for a good reference implementation of the backward operation. I've gathered that "conv transpose" is part of it (both from the pytorch ConvTranspose2d documentation, and from convolutions-rs, which also includes an impl for conv transpose that mentions backward).
Edit: I think the pytorch backward is this? https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/Convolution.cpp#L1520
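For what it's worth, with the im2col formulation the backward falls out of the GEMM by standard matrix calculus. Writing the forward as $\mathrm{out} = W \cdot \mathrm{col}(x) + b$, with $W$ of shape $(c_{out}, c_{in} k^2)$ and $\mathrm{col}(x)$ of shape $(c_{in} k^2, h_{out} w_{out})$:

$$
\frac{\partial L}{\partial W} = \frac{\partial L}{\partial \mathrm{out}} \cdot \mathrm{col}(x)^\top,
\qquad
\frac{\partial L}{\partial b_i} = \sum_j \frac{\partial L}{\partial \mathrm{out}_{ij}},
\qquad
\frac{\partial L}{\partial \mathrm{col}(x)} = W^\top \cdot \frac{\partial L}{\partial \mathrm{out}}
$$

and finally $\partial L / \partial x = \mathrm{col2im}(\partial L / \partial \mathrm{col}(x))$, where col2im is the scatter-add adjoint of im2col (overlapping windows accumulate rather than overwrite). That scatter-add is exactly where the conv-transpose connection comes from.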
you can have a look at this one: https://pavisj.medium.com/convolutions-and-backpropagations-46026a8f5d2c
Other resources:
- https://arxiv.org/abs/1603.07285
- https://gist.github.com/yxlao/ef50416011b9587835ac752aa3ce3530
- https://deeplearning.cs.cmu.edu/S21/document/slides/Lec12.CNN4.pdf