Louis Fortier-Dubois
The dimension reduction is not fully parallelized, since a single thread reduces a whole dimension alone with a for loop. To improve the performance of these shaders, we should use a...
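As a rough CPU model of what the improved shader could do (the real fix would live in the WGSL kernel; the invocation count and chunking below are purely illustrative): each invocation reduces its own chunk into a partial result, and the partials are then combined in a log2 tree instead of one thread folding the whole dimension.

```rust
// CPU sketch of a workgroup-style sum reduction. `invocations` stands in for
// the workgroup size and is assumed to be a power of two; `partials` plays the
// role of the shared-memory buffer in the shader.
fn reduce_dim_parallel(values: &[f32], invocations: usize) -> f32 {
    let chunk = (values.len() + invocations - 1) / invocations;

    // Step 1: each invocation accumulates its own chunk (done in parallel on GPU).
    let mut partials: Vec<f32> = values.chunks(chunk).map(|c| c.iter().sum()).collect();
    partials.resize(invocations, 0.0);

    // Step 2: tree reduction over the partials, halving the stride each pass.
    let mut stride = invocations / 2;
    while stride > 0 {
        for i in 0..stride {
            partials[i] += partials[i + stride];
        }
        stride /= 2;
    }
    partials[0]
}

fn main() {
    let data: Vec<f32> = (1..=8).map(|x| x as f32).collect();
    assert_eq!(reduce_dim_parallel(&data, 4), 36.0);
    println!("sum = {}", reduce_dim_parallel(&data, 4));
}
```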
In autodiff, we should have a checkpointing strategy to reduce memory consumption (see for instance [https://www-sop.inria.fr/tropics/papers/DauvergneHascoet06.pdf](https://www-sop.inria.fr/tropics/papers/DauvergneHascoet06.pdf)). Currently, for most operations run in the forward pass, a state will be...
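As a toy illustration of the trade-off (not Burn's autodiff API; the chain of squaring ops is purely illustrative): a checkpointed segment saves only its input and recomputes the intermediates when the gradient is needed, instead of keeping every intermediate state alive until backward.

```rust
// y = f3(f2(f1(x))) with f(x) = x^2, so dy/dx = df(b) * df(a) * df(x).
fn f(x: f32) -> f32 {
    x * x
}

fn df(x: f32) -> f32 {
    2.0 * x
}

/// "Eager" state handling: every intermediate is kept alive for the backward pass.
fn backward_with_saved_states(x: f32) -> f32 {
    let a = f(x); // saved
    let b = f(a); // saved
    let _y = f(b); // output
    df(b) * df(a) * df(x)
}

/// Checkpointed segment: only `x` is kept; `a` and `b` are recomputed on backward,
/// trading compute for memory.
fn backward_with_checkpoint(x: f32) -> f32 {
    let _y = f(f(f(x))); // forward pass keeps nothing but the input
    let a = f(x); // recompute
    let b = f(a); // recompute
    df(b) * df(a) * df(x)
}

fn main() {
    assert_eq!(backward_with_saved_states(2.0), backward_with_checkpoint(2.0));
    println!("dy/dx at x = 2: {}", backward_with_checkpoint(2.0));
}
```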
Dirty PR here, not meant to be merged. @nathanielsimard @antimora I benchmarked linear without (Linear) and with (LinearT) the new weight order change. In general, it appears to be quite a bit slower...
In autotune, we create random tensors as inputs. Since random only works with floats, it is impossible to use autotune with int and bool tensors. Lazy solution: revert to creating...
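One possible workaround, sketched below with purely illustrative names (this is not the burn-wgpu autotune API, and not necessarily the "lazy solution" referred to above): generate a random-ish float buffer and cast it into the element type the tuned kernel expects, so int and bool inputs can still be produced.

```rust
// Dummy autotune inputs per element kind. The pseudo-random source and the
// string-based kind selector are placeholders for the real RNG and dtype enum.
#[derive(Debug)]
enum DummyInput {
    Float(Vec<f32>),
    Int(Vec<i32>),
    Bool(Vec<bool>),
}

fn dummy_input(kind: &str, len: usize) -> DummyInput {
    // Cheap deterministic pseudo-random floats in [0, 1).
    let floats: Vec<f32> = (0..len).map(|i| ((i * 37 + 11) % 100) as f32 / 100.0).collect();
    match kind {
        "int" => DummyInput::Int(floats.iter().map(|f| (f * 100.0) as i32).collect()),
        "bool" => DummyInput::Bool(floats.iter().map(|f| *f > 0.5).collect()),
        _ => DummyInput::Float(floats),
    }
}

fn main() {
    println!("{:?}", dummy_input("int", 4));
    println!("{:?}", dummy_input("bool", 4));
}
```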
In burn-train, several metrics can be used during training. It would be great to have more!
- [X] Accuracy
- [X] Loss (the one in use)
- [X] CUDA utilization...
The operation `slice_assign` is very important for us in the backward pass, and we still don't have it with the Candle backend. There is an [issue open on the Candle GitHub](https://github.com/huggingface/candle/issues/1351). If...
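For reference, the semantics we need, sketched on a flat 2D buffer (illustrative code, not any backend's actual implementation): copy a small tensor into the sub-region selected by the ranges and leave everything else untouched.

```rust
use std::ops::Range;

// Write `values` into the block of `tensor` (row-major, `cols` columns)
// selected by the row and column ranges.
fn slice_assign_2d(
    tensor: &mut [f32],
    cols: usize,
    rows_r: Range<usize>,
    cols_r: Range<usize>,
    values: &[f32],
) {
    let width = cols_r.len();
    for (i, row) in rows_r.enumerate() {
        let dst = &mut tensor[row * cols + cols_r.start..row * cols + cols_r.end];
        dst.copy_from_slice(&values[i * width..(i + 1) * width]);
    }
}

fn main() {
    // A 3x4 tensor of zeros; assign a 2x2 block of ones at rows 1..3, cols 1..3.
    let mut tensor = vec![0.0; 3 * 4];
    slice_assign_2d(&mut tensor, 4, 1..3, 1..3, &[1.0, 1.0, 1.0, 1.0]);
    for row in tensor.chunks(4) {
        println!("{:?}", row);
    }
}
```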
Surprisingly, convolutions are already rather fast (especially transposed convolutions), but they could probably be improved further using shared/local memory and better memory coalescing (a rough CPU sketch of the tiling idea follows the checklist).
- [ ] Convolution 2D
- [ ]...
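The sketch below models the shared-memory idea on a 1D convolution (sizes, names, and tiling are illustrative only; the real change would be in the WGSL kernels): a tile of consecutive outputs shares one cooperative load of the overlapping input window instead of each output re-reading global memory.

```rust
// CPU model of tiled convolution: `local` plays the role of shared memory,
// loaded once per tile and reused by every output in the tile.
const TILE: usize = 4;

fn conv1d_tiled(input: &[f32], kernel: &[f32]) -> Vec<f32> {
    let out_len = input.len() - kernel.len() + 1;
    let mut output = vec![0.0; out_len];

    for tile_start in (0..out_len).step_by(TILE) {
        let tile_end = (tile_start + TILE).min(out_len);
        // One cooperative load per tile: covers every input element the tile needs.
        let local: &[f32] = &input[tile_start..tile_end + kernel.len() - 1];

        for (i, out) in output[tile_start..tile_end].iter_mut().enumerate() {
            // Each "invocation" computes one output from the local buffer.
            *out = kernel.iter().zip(&local[i..]).map(|(k, x)| k * x).sum();
        }
    }
    output
}

fn main() {
    let input = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let kernel = [1.0, 0.0, -1.0];
    println!("{:?}", conv1d_tiled(&input, &kernel)); // [-2.0, -2.0, -2.0, -2.0]
}
```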
At the moment, the autotune mechanism cannot support no_std [because we use Instant](https://durch.github.io/rust-goauth/time/index.html). We must analyze whether we could use something else to...
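One option to analyze, sketched here with illustrative names (not the current burn-wgpu design): put timing behind a small trait so the std build keeps `std::time::Instant` while a no_std build can plug in whatever counter the target provides.

```rust
// Timing abstraction so the benchmark code never touches `Instant` directly.
pub trait BenchmarkTimer {
    /// Returns the elapsed duration of `f` in nanoseconds.
    fn time<F: FnOnce()>(&self, f: F) -> u64;
}

/// std implementation; a no_std target would provide its own timer type.
pub struct StdTimer;

impl BenchmarkTimer for StdTimer {
    fn time<F: FnOnce()>(&self, f: F) -> u64 {
        let start = std::time::Instant::now();
        f();
        start.elapsed().as_nanos() as u64
    }
}

fn main() {
    let timer = StdTimer;
    let nanos = timer.time(|| {
        let _sum: u64 = (0..1_000_000u64).sum();
    });
    println!("took {nanos} ns");
}
```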
For Matmul in WGPU, we use autotune with a key that implicitly tells us the size range of the matmul inputs. For instance, if we have [3, 2,...
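An illustrative sketch of such a key (the exact bucketing used in burn-wgpu may differ): round each dimension up to the next power of two, so shapes falling in the same bucket reuse the same tuned kernel instead of triggering a new benchmark run.

```rust
// Bucketed autotune key: one entry per dimension, rounded up to a power of two.
fn bucketed_key(shape: &[usize]) -> Vec<usize> {
    shape.iter().map(|d| d.next_power_of_two()).collect()
}

fn main() {
    // Both shapes land in the same bucket, so they share one tuned kernel.
    assert_eq!(bucketed_key(&[3, 2, 500, 600]), vec![4, 2, 512, 1024]);
    assert_eq!(bucketed_key(&[4, 2, 400, 900]), vec![4, 2, 512, 1024]);
    println!("{:?}", bucketed_key(&[3, 2, 500, 600]));
}
```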
Use the autotune mechanism in the WGPU backend to find the fastest kernel version of `binary_elemwise` and its in-place counterpart by varying the WORKGROUP argument. Take inspiration from the `burn-wgpu/src/kernel/matmul/tune`...
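A generic sketch of what the tuning loop amounts to (not the actual burn-wgpu autotune API; the closure stands in for launching the kernel): benchmark each candidate WORKGROUP a few times and keep the fastest, then cache that choice under the autotune key.

```rust
use std::time::Instant;

// Run each candidate workgroup size a fixed number of times and return the
// one with the lowest total elapsed time.
fn pick_fastest_workgroup<F: Fn(usize)>(candidates: &[usize], run_kernel: F) -> usize {
    let mut best = (candidates[0], u128::MAX);
    for &workgroup in candidates {
        let start = Instant::now();
        for _ in 0..10 {
            run_kernel(workgroup);
        }
        let elapsed = start.elapsed().as_nanos();
        if elapsed < best.1 {
            best = (workgroup, elapsed);
        }
    }
    best.0
}

fn main() {
    // Dummy "kernel": pretend larger workgroups mean less work for this input.
    let fastest = pick_fastest_workgroup(&[8, 16, 32], |wg| {
        let _work: usize = (0..1_000_000 / wg).sum();
    });
    println!("fastest WORKGROUP = {fastest}");
}
```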