Convolution mega issue
A somewhat clear implementation using im2col in darknet: https://github.com/pjreddie/darknet/blob/master/src/convolutional_layer.c#L445
Another here https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/Im2Col.cpp
Since const generic expressions aren't stable yet, the obvious first way to implement this won't compile.
tracking issue: https://github.com/rust-lang/rust/issues/76560
#[derive(Clone, Debug, Default)]
pub struct Conv2D<
    const IN_CHANNELS: usize,
    const OUT_CHANNELS: usize,
    const KERNEL_SIZE: usize,
    const STRIDE: usize = 1,
    const PADDING: usize = 0,
> {
    weight: Tensor4D<OUT_CHANNELS, IN_CHANNELS, KERNEL_SIZE, KERNEL_SIZE>,
    bias: Tensor1D<OUT_CHANNELS>,
}

impl<
        const IN_CHANNELS: usize,
        const OUT_CHANNELS: usize,
        const KERNEL_SIZE: usize,
        const STRIDE: usize,
        const PADDING: usize,
        const WIDTH: usize,
        const HEIGHT: usize,
    > Module<Tensor3D<IN_CHANNELS, HEIGHT, WIDTH>>
    for Conv2D<IN_CHANNELS, OUT_CHANNELS, KERNEL_SIZE, STRIDE, PADDING>
{
    type Output = Tensor3D<
        OUT_CHANNELS,
        { (HEIGHT + 2 * PADDING - KERNEL_SIZE) / STRIDE + 1 }, // This doesn't compile
        { (WIDTH + 2 * PADDING - KERNEL_SIZE) / STRIDE + 1 },  // This doesn't compile
    >;

    fn forward(&self, input: Tensor3D<IN_CHANNELS, HEIGHT, WIDTH>) -> Self::Output {
        todo!();
    }
}
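For reference, on nightly the usual workaround is to opt into the feature and name each const expression in a where-bound. A minimal, self-contained sketch of the pattern using plain arrays instead of dfdx's tensor types (nightly-only and untested here; generic_const_exprs is still incomplete):

#![feature(generic_const_exprs)]

// Plain-array stand-in for the tensor types, just to show the shape math.
// The where-bounds repeat each const expression so the compiler accepts it.
fn forward<const K: usize, const S: usize, const P: usize, const H: usize, const W: usize>(
    _input: [[f32; W]; H],
) -> [[f32; (W + 2 * P - K) / S + 1]; (H + 2 * P - K) / S + 1]
where
    [(); (H + 2 * P - K) / S + 1]:,
    [(); (W + 2 * P - K) / S + 1]:,
{
    [[0.0; (W + 2 * P - K) / S + 1]; (H + 2 * P - K) / S + 1]
}

fn main() {
    // 28x28 input, 3x3 kernel, stride 1, no padding -> 26x26 output.
    let out = forward::<3, 1, 0, 28, 28>([[0.0f32; 28]; 28]);
    assert_eq!(out.len(), 26);
}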
Another option is to make the in/out height/width part of the Conv2D type itself. That's a ton of generic parameters, though... It's also unclear how to verify that all the parameters are consistent (i.e. that applying the kernel size/stride/padding to the in height/width actually gives the declared out height/width); a cheap partial check is sketched below.
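On stable Rust, one cheap spot-check is a const assertion against the usual output-size formula (the conv_out helper here is mine, not dfdx's):

// Hypothetical helper: the standard conv output-size formula.
const fn conv_out(size: usize, kernel: usize, stride: usize, padding: usize) -> usize {
    (size + 2 * padding - kernel) / stride + 1
}

// assert! in const context works on stable Rust (1.57+), so concrete
// parameter sets can at least be checked at compile time:
const _: () = assert!(conv_out(28, 3, 1, 0) == 26); // 28x28 input, 3x3 kernel -> 26x26
const _: () = assert!(conv_out(32, 5, 2, 2) == 16); // stride 2 halves 32 -> 16

This doesn't validate arbitrary generic parameters, but it catches typos in any conv stack whose dimensions are written out concretely.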
im2col is also going to require pre-specifying the sizes until const generic expressions become stable:
fn im2col<
    const IN_CHANNELS: usize,
    const IN_HEIGHT: usize,
    const IN_WIDTH: usize,
    const OUT_CHANNELS: usize,
    const OUT_HEIGHT: usize,
    const OUT_WIDTH: usize,
    const KERNEL_SIZE: usize,
    const STRIDE: usize,
    const PADDING: usize,
>(
    im: Tensor3D<IN_CHANNELS, IN_HEIGHT, IN_WIDTH>,
) -> Tensor2D<{ IN_CHANNELS * KERNEL_SIZE * KERNEL_SIZE }, { OUT_HEIGHT * OUT_WIDTH }> {
    // TODO: unroll each receptive field of `im` into a column of the output.
    let mut output = Tensor2D::zeros();
    output
}
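For concreteness (worked numbers, mine): with IN_CHANNELS = 3, a 28x28 input, KERNEL_SIZE = 3, STRIDE = 1, PADDING = 0, the col matrix is (3 * 3 * 3 = 27) rows by (26 * 26 = 676) columns. Each column is one flattened 3x3x3 receptive field, so the whole convolution collapses into a single (OUT_CHANNELS x 27) * (27 x 676) matmul.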
Am going to work on this for nightly compilers only in the 1-conv-nightly branch.
To read later: https://sahnimanas.github.io/post/anatomy-of-a-high-performance-convolution/
A Rust crate to check out: https://github.com/Conzel/convolutions-rs
A downside of convolutions-rs is that it uses ndarray and probably ends up allocating (which is why the reported benchmarks are so slow).
My current plan is to implement conv2d, im2col, and col2im using slices, so no const generics. Then the Conv2D layer will be a thin wrapper around that with just the input & output sizes using const generics.
Here is the initial version of the conv2d function. Still need to test it more:
pub fn conv2d(inp: &[f32], weight: &[f32], bias: &[f32], out: &mut [f32], cfg: ConvConfig) {
    let m = cfg.channels_out;
    let k = cfg.channels_in * cfg.kernel_size * cfg.kernel_size;
    let n = cfg.height_out() * cfg.width_out();

    // weight: (channels_out, channels_in * kernel_size * kernel_size)
    // col: (channels_in * kernel_size * kernel_size, height_out * width_out)
    // out: (channels_out, height_out * width_out)
    assert_eq!(inp.len(), cfg.channels_in * cfg.height_in * cfg.width_in);
    assert_eq!(weight.len(), cfg.channels_out * k);
    assert_eq!(bias.len(), cfg.channels_out);
    assert_eq!(out.len(), cfg.channels_out * n);

    let mut col = cfg.allocate_col::<f32>();
    im2col(inp, col.as_mut(), cfg);

    for i in 0..cfg.channels_out {
        out[i * n..(i + 1) * n].fill(bias[i]);
    }

    unsafe {
        matrixmultiply::sgemm(...)
    }
}
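For reference, matrixmultiply's sgemm computes C = alpha * A * B + beta * C and takes (m, k, n, alpha, then a pointer plus row/column stride per matrix, then beta). Assuming allocate_col yields something AsRef<[f32]>, the elided unsafe block would presumably be:

    unsafe {
        // Row-major strides throughout; beta = 1.0 accumulates on top of the
        // bias values already written into `out`.
        matrixmultiply::sgemm(
            m, k, n,
            1.0,
            weight.as_ptr(), k as isize, 1,       // weight: (m, k)
            col.as_ref().as_ptr(), n as isize, 1, // col:    (k, n)
            1.0,
            out.as_mut_ptr(), n as isize, 1,      // out:    (m, n)
        );
    }

And a slice-based im2col matching the layout in the comments above; a sketch assuming ConvConfig also carries stride and padding fields (names mine), with the padded region produced as zeros:

pub fn im2col(inp: &[f32], col: &mut [f32], cfg: ConvConfig) {
    let (h_out, w_out) = (cfg.height_out(), cfg.width_out());
    for c in 0..cfg.channels_in {
        for ky in 0..cfg.kernel_size {
            for kx in 0..cfg.kernel_size {
                // Row of `col` that this (channel, kernel-y, kernel-x) triple fills.
                let row = (c * cfg.kernel_size + ky) * cfg.kernel_size + kx;
                for oy in 0..h_out {
                    for ox in 0..w_out {
                        // Input coordinates; wrapping_sub makes padded
                        // (out-of-range) positions fail the bounds check below.
                        let iy = (oy * cfg.stride + ky).wrapping_sub(cfg.padding);
                        let ix = (ox * cfg.stride + kx).wrapping_sub(cfg.padding);
                        col[row * (h_out * w_out) + oy * w_out + ox] =
                            if iy < cfg.height_in && ix < cfg.width_in {
                                inp[(c * cfg.height_in + iy) * cfg.width_in + ix]
                            } else {
                                0.0 // zero padding
                            };
                    }
                }
            }
        }
    }
}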
I think it could also be nice to have Conv2D allocate the col buffer up front instead of reallocating on every forward call. Rough sketch below.
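A minimal sketch of that idea (names are mine; it assumes a conv2d variant that takes the scratch buffer as a parameter instead of calling allocate_col itself):

// Hypothetical owner of the reusable scratch space; the layer would hold one
// of these and pass `&mut scratch.col` into the conv2d internals each forward.
pub struct ConvScratch {
    col: Vec<f32>,
}

impl ConvScratch {
    pub fn new(cfg: &ConvConfig) -> Self {
        // (channels_in * kernel_size * kernel_size) x (height_out * width_out),
        // allocated once up front.
        let len = cfg.channels_in * cfg.kernel_size * cfg.kernel_size
            * cfg.height_out() * cfg.width_out();
        ConvScratch { col: vec![0.0; len] }
    }
}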
Would it be possible to create a proc_macro that takes the Conv2D generic parameters as input and emits a compile error if they're invalid (e.g. the kernel size/stride/padding don't take the in height/width to the out height/width), or nothing if they're OK? I'll try to create such a crate.
OK, that was a dumb idea, because all the proc macro receives is the generic parameter name, not its value :/
This sounds like a good solution to me, though I'd probably make this function private and call it from another function that does the assertions (since no checks are required with generic_const_exprs).
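i.e. something like (sketch; conv2d_unchecked is a hypothetical name):

// Public slice-based entry point keeps the runtime checks. The const-generic
// Conv2D layer could instead call the private unchecked version directly,
// since its types already guarantee the slice lengths line up.
pub fn conv2d(inp: &[f32], weight: &[f32], bias: &[f32], out: &mut [f32], cfg: ConvConfig) {
    let k = cfg.channels_in * cfg.kernel_size * cfg.kernel_size;
    let n = cfg.height_out() * cfg.width_out();
    assert_eq!(inp.len(), cfg.channels_in * cfg.height_in * cfg.width_in);
    assert_eq!(weight.len(), cfg.channels_out * k);
    assert_eq!(bias.len(), cfg.channels_out);
    assert_eq!(out.len(), cfg.channels_out * n);
    conv2d_unchecked(inp, weight, bias, out, cfg);
}

fn conv2d_unchecked(inp: &[f32], weight: &[f32], bias: &[f32], out: &mut [f32], cfg: ConvConfig) {
    // Body of the earlier conv2d, minus the length checks.
}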
Currently looking around for a good reference implementation of the backward operation. I've gathered that "conv transpose" is part of it (both from the pytorch ConvTranspose2d documentation, and from convolutions-rs, which also includes an impl for conv transpose that mentions backward).
Edit: I think the pytorch backward is this? https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/Convolution.cpp#L1520
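For what it's worth, with the im2col formulation the backward falls out of the GEMM by standard matrix calculus. Writing the forward as $\mathrm{out} = W \cdot \mathrm{col}(x) + b$, with $W$ of shape $(c_{out}, c_{in} k^2)$ and $\mathrm{col}(x)$ of shape $(c_{in} k^2, h_{out} w_{out})$:

$$
\frac{\partial L}{\partial W} = \frac{\partial L}{\partial \mathrm{out}} \cdot \mathrm{col}(x)^\top,
\qquad
\frac{\partial L}{\partial b_i} = \sum_j \frac{\partial L}{\partial \mathrm{out}_{ij}},
\qquad
\frac{\partial L}{\partial \mathrm{col}(x)} = W^\top \cdot \frac{\partial L}{\partial \mathrm{out}}
$$

and finally $\partial L / \partial x = \mathrm{col2im}(\partial L / \partial \mathrm{col}(x))$, where col2im is the scatter-add adjoint of im2col (overlapping windows accumulate rather than overwrite). That scatter-add is exactly where the conv-transpose connection comes from.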
you can have a look at this one: https://pavisj.medium.com/convolutions-and-backpropagations-46026a8f5d2c
Other resources:
- https://arxiv.org/abs/1603.07285
- https://gist.github.com/yxlao/ef50416011b9587835ac752aa3ce3530
- https://deeplearning.cs.cmu.edu/S21/document/slides/Lec12.CNN4.pdf