Conv3d
Proposed changes
- Implementation of naive `slow_conv_3D` on the CPU (see the sketch after this list).
- Implementation of `explicit_gemm_conv_ND_cpu`. However, using this seems to be considerably slower than the naive implementation. I guess that materializing the strided input view takes very long; the actual gemm is quite fast. Therefore, this is currently unused in `conv3d` routing.
- Usage of `explicit_gemm_conv_ND_gpu` for the GPU implementation of `conv3d`.
- Added tests for `conv3d` (a test sketch follows the checklist below).
- Fixed handling of negative padding. The old behavior led to incorrect shapes of the conv gradient w.r.t. the input for some specific kernel size/padding/dilation combinations (a worked shape example follows this list).
- I also fixed two typos in some error messages I spotted in the code.
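To make the two CPU strategies concrete, here is a minimal NumPy sketch (illustrative only, not the MLX C++ source) of a naive loop in the spirit of `slow_conv_3D` and an im2col-plus-GEMM version in the spirit of `explicit_gemm_conv_ND_cpu`. It assumes channels-last (NDHWC) inputs, `(C_out, kD, kH, kW, C_in)` weights, stride 1, no padding, and no dilation:

```python
import numpy as np

def conv3d_naive(x, w):
    # x: (N, D, H, W, Cin), w: (Cout, kD, kH, kW, Cin); stride 1, no padding
    n, D, H, W, cin = x.shape
    cout, kD, kH, kW, _ = w.shape
    oD, oH, oW = D - kD + 1, H - kH + 1, W - kW + 1
    out = np.zeros((n, oD, oH, oW, cout), dtype=x.dtype)
    for z in range(oD):
        for y in range(oH):
            for t in range(oW):
                # contract each receptive field with every filter
                patch = x[:, z:z + kD, y:y + kH, t:t + kW, :]
                out[:, z, y, t, :] = np.einsum("ndhwc,odhwc->no", patch, w)
    return out

def conv3d_gemm(x, w):
    # Same convolution as an explicit GEMM: materialize the im2col matrix
    # (the expensive copy described above), then do a single matmul.
    n, D, H, W, cin = x.shape
    cout, kD, kH, kW, _ = w.shape
    oD, oH, oW = D - kD + 1, H - kH + 1, W - kW + 1
    win = np.lib.stride_tricks.sliding_window_view(x, (kD, kH, kW), axis=(1, 2, 3))
    # win: (N, oD, oH, oW, Cin, kD, kH, kW), a zero-copy strided view
    cols = np.ascontiguousarray(win).reshape(n * oD * oH * oW, cin * kD * kH * kW)
    wmat = w.transpose(0, 4, 1, 2, 3).reshape(cout, cin * kD * kH * kW)
    return (cols @ wmat.T).reshape(n, oD, oH, oW, cout)

x = np.random.randn(2, 6, 6, 6, 3)
w = np.random.randn(4, 3, 3, 3, 3)
assert np.allclose(conv3d_naive(x, w), conv3d_gemm(x, w))
```

The `ascontiguousarray` call is the analogue of materializing the strided input view: the GEMM itself maps onto a highly optimized matmul, but only after that copy has been paid for.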
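To make the negative-padding fix concrete, here is the standard output-size arithmetic (a sketch for stride 1, not the MLX code). The gradient w.r.t. the input is itself a convolution with the flipped kernel whose padding is `dilation * (k - 1) - padding`, which goes negative whenever `padding > dilation * (k - 1)`:

```python
def conv_out_size(n, k, stride=1, padding=0, dilation=1):
    # standard convolution output-size formula; negative padding crops
    return (n + 2 * padding - dilation * (k - 1) - 1) // stride + 1

# Forward pass: length 8, kernel 3, padding 3 -> output length 12.
assert conv_out_size(8, 3, padding=3) == 12

# Backward pass w.r.t. the input uses padding = 1 * (3 - 1) - 3 = -1.
# Treating that as 0 would give a gradient of length 10; cropping
# correctly recovers the original input length of 8.
assert conv_out_size(12, 3, padding=-1) == 8
```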
Checklist
- [x] I have read the CONTRIBUTING document
- [x] I have run `pre-commit run --all-files` to format my code / installed pre-commit prior to committing changes
- [x] I have added tests that prove my fix is effective or that my feature works
- [x] I have updated the necessary documentation (if needed)
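A sketch of the kind of test added (the actual tests may differ), assuming the `mlx.core.conv3d` signature this PR introduces with channels-last input and `(C_out, kD, kH, kW, C_in)` weights, and reusing the `conv3d_naive` reference from the sketch above:

```python
import numpy as np
import mlx.core as mx

def test_conv3d_matches_reference():
    rng = np.random.default_rng(0)
    x = rng.standard_normal((2, 6, 6, 6, 3)).astype(np.float32)
    w = rng.standard_normal((4, 3, 3, 3, 3)).astype(np.float32)
    expected = conv3d_naive(x, w)              # reference from the sketch above
    out = mx.conv3d(mx.array(x), mx.array(w))  # defaults: stride 1, no padding
    assert np.allclose(np.array(out), expected, atol=1e-4)
```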
> Implementation of `explicit_gemm_conv_ND_cpu`. However, using this seems to be considerably slower than the naive implementation. I guess that materializing the strided input view takes very long; the actual gemm is quite fast. Therefore, this is currently unused in `conv3d` routing.

Should we delete it? If it's not fast and has no hope of being fast, I don't see much point in leaving it in.
@mlaves are you planning to update this?
> @mlaves are you planning to update this?

Yes, I will update my PR soon.
> Implementation of `explicit_gemm_conv_ND_cpu`. However, using this seems to be considerably slower than the naive implementation. I guess that materializing the strided input view takes very long; the actual gemm is quite fast. Therefore, this is currently unused in `conv3d` routing.
>
> Should we delete it? If it's not fast and has no hope of being fast, I don't see much point in leaving it in.

The gemm operation itself is fast, but the reshaping `copy(in_strided_view, in_strided, CopyType::General)` is slow. PyTorch's conv3d CPU implementation is 10x faster and also uses gemm-conv. Should I remove this now, or is there any chance the reshaping will be faster in the future?
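A quick NumPy stand-in (not the MLX copy kernel) to separate the two phases: creating the strided window view costs nothing, materializing it is the analogue of the `CopyType::General` copy above, and the matmul is the gemm.

```python
import time
import numpy as np

x = np.random.randn(1, 32, 32, 32, 16).astype(np.float32)
w = np.random.randn(32, 3, 3, 3, 16).astype(np.float32)
# zero-copy strided view of all 3x3x3 receptive fields
win = np.lib.stride_tricks.sliding_window_view(x, (3, 3, 3), axis=(1, 2, 3))

t0 = time.perf_counter()
cols = np.ascontiguousarray(win).reshape(-1, 16 * 27)      # materializing copy
t1 = time.perf_counter()
out = cols @ w.transpose(0, 4, 1, 2, 3).reshape(32, -1).T  # the actual gemm
t2 = time.perf_counter()
print(f"copy: {t1 - t0:.4f}s  gemm: {t2 - t1:.4f}s")
```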
> Should I remove this now, or is there any chance the reshaping will be faster in the future?

It's possible; we haven't looked at / optimized the CPU copies in a while.
> > Should I remove this now, or is there any chance the reshaping will be faster in the future?
>
> It's possible; we haven't looked at / optimized the CPU copies in a while.

In this case, I would keep `explicit_gemm_conv_ND_cpu`, even though it's unused now.
Sounds good to me!
I incorporated all reviewer suggestions.
I rebased onto main; CI should pass now.