Issues opened by Louis Fortier-Dubois (14 results)

Dimension reduction is not fully parallelized: a single thread reduces an entire dimension by itself in a for loop. To improve the performance of these shaders, we should use a...

performance
wgpu
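As context for the issue above, here is a minimal CPU sketch of the tree reduction a parallel shader could perform per workgroup. The names and the `WORKGROUP` size are illustrative, not the actual burn-wgpu kernel:

```rust
/// Illustrative tree reduction: instead of one thread summing `n` values in a
/// serial loop, `WORKGROUP` "threads" cooperatively halve the data each step,
/// which is O(log n) parallel steps in a real shader.
const WORKGROUP: usize = 256;

fn reduce_sum(values: &[f32]) -> f32 {
    // Each "thread" first accumulates a strided slice of the input.
    let mut shared = [0.0f32; WORKGROUP];
    for (i, v) in values.iter().enumerate() {
        shared[i % WORKGROUP] += *v;
    }
    // Tree reduction over the "shared memory": halve the active threads each step.
    let mut stride = WORKGROUP / 2;
    while stride > 0 {
        for i in 0..stride {
            shared[i] += shared[i + stride];
        }
        stride /= 2;
    }
    shared[0]
}

fn main() {
    let data: Vec<f32> = (1..=1000).map(|x| x as f32).collect();
    assert_eq!(reduce_sum(&data), 500_500.0);
}
```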

In autodiff, we should have a checkpointing strategy for better memory consumption (see for instance [https://www-sop.inria.fr/tropics/papers/DauvergneHascoet06.pdf](https://www-sop.inria.fr/tropics/papers/DauvergneHascoet06.pdf)). Currently, for most operations run in the forward pass, a state will be...

performance
very hard
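A toy sketch of the trade-off checkpointing makes, not Burn's actual autodiff API: instead of always storing the forward state, a checkpointed node keeps only its input and recomputes the state on demand during backward.

```rust
/// Toy illustration of checkpointing: either keep the forward output in
/// memory, or keep only the input and recompute the output when needed.
enum Saved {
    /// Classic mode: the forward result is stored (fast backward, more memory).
    Stored(Vec<f32>),
    /// Checkpointed mode: only the input is kept; the result is recomputed
    /// during backward (slower, but lower peak memory).
    Recompute { input: Vec<f32>, forward: fn(&[f32]) -> Vec<f32> },
}

impl Saved {
    fn state(&self) -> Vec<f32> {
        match self {
            Saved::Stored(out) => out.clone(),
            Saved::Recompute { input, forward } => forward(input),
        }
    }
}

fn relu(x: &[f32]) -> Vec<f32> {
    x.iter().map(|v| v.max(0.0)).collect()
}

fn main() {
    let input = vec![-1.0, 2.0, -3.0, 4.0];
    let stored = Saved::Stored(relu(&input));
    let checkpointed = Saved::Recompute { input: input.clone(), forward: relu };
    // Both strategies yield the same state for the backward pass.
    assert_eq!(stored.state(), checkpointed.state());
}
```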

Dirty PR here, not meant to be merged. @nathanielsimard @antimora I benchmarked linear without (Linear) and with (LinearT) the new weight order change. It appears to be quite a bit slower in general...

In autotune, we create random tensors as inputs. Since random generation only works with floats, it is impossible to use autotune with int and bool tensors. Lazy solution: revert to creating...

enhancement
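A hedged sketch of one direction for the issue above, with made-up helper names: fill the autotune inputs with a constant of the right element type, which works uniformly for float, int, and bool, unlike float-only random generation.

```rust
/// Hypothetical helper (the name is illustrative): build autotune inputs by
/// filling with a constant of the requested element type instead of sampling
/// random floats, so int and bool tensors can be tuned too.
fn autotune_input<E: Copy>(shape: &[usize], fill: E) -> Vec<E> {
    vec![fill; shape.iter().product()]
}

fn main() {
    let floats = autotune_input(&[2, 3], 1.0f32);
    let ints = autotune_input(&[2, 3], 1i32);
    let bools = autotune_input(&[2, 3], true);
    assert_eq!(floats.len(), 6);
    assert_eq!(ints.len(), 6);
    assert_eq!(bools.len(), 6);
}
```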

In burn-train, several metrics can be used during training. It would be great to have more! (A sketch of what a metric implementation could look like follows below.)
- [X] Accuracy
- [X] Loss (the one in use)
- [X] CUDA utilization...

good first issue
feature
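A minimal sketch of what adding a metric could look like, assuming a simplified `Metric` trait; the real burn-train trait and its types differ.

```rust
/// Simplified stand-in for a training metric; the real burn-train `Metric`
/// trait has a richer interface (state handling, formatting, etc.).
trait Metric {
    fn update(&mut self, predictions: &[usize], targets: &[usize]);
    fn value(&self) -> f64;
}

/// Running accuracy: correct predictions over total predictions.
#[derive(Default)]
struct Accuracy {
    correct: usize,
    total: usize,
}

impl Metric for Accuracy {
    fn update(&mut self, predictions: &[usize], targets: &[usize]) {
        self.correct += predictions
            .iter()
            .zip(targets)
            .filter(|(p, t)| p == t)
            .count();
        self.total += targets.len();
    }

    fn value(&self) -> f64 {
        self.correct as f64 / self.total as f64
    }
}

fn main() {
    let mut acc = Accuracy::default();
    acc.update(&[1, 2, 3], &[1, 2, 0]);
    assert!((acc.value() - 2.0 / 3.0).abs() < 1e-9);
}
```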

The operation `slice_assign` is very important for us in the backward pass, and we still don't have it with the Candle backend. There is an [issue open on the Candle GitHub](https://github.com/huggingface/candle/issues/1351). If...

enhancement
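For reference, a 1-D illustration of the `slice_assign` semantics (write `values` over `tensor[range]`, out of place); a backend lacking the op could fall back to a copy followed by an overwrite along these lines.

```rust
use std::ops::Range;

/// 1-D illustration of `slice_assign`: return a copy of `tensor` in which
/// the elements at `range` have been replaced by `values`.
fn slice_assign(tensor: &[f32], range: Range<usize>, values: &[f32]) -> Vec<f32> {
    assert_eq!(range.len(), values.len());
    let mut out = tensor.to_vec();
    out[range].copy_from_slice(values);
    out
}

fn main() {
    let t = vec![0.0, 0.0, 0.0, 0.0];
    assert_eq!(slice_assign(&t, 1..3, &[1.0, 2.0]), vec![0.0, 1.0, 2.0, 0.0]);
}
```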

Surprisingly, convolutions are already rather fast (especially transposed convolutions), but they can probably be improved further using shared/local memory and by leveraging memory coalescing.
- [ ] Convolution 2D
- [ ] ...

performance
wgpu
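A CPU sketch of the shared-memory idea for a 1-D case, with an illustrative tile size: each workgroup would stage its input tile (plus a halo) into fast local memory once, and all threads would then read from that buffer instead of global memory.

```rust
/// Illustrative 1-D tiled convolution: the input tile plus a halo of
/// `kernel.len() - 1` elements is staged in a local buffer, the CPU analogue
/// of workgroup shared memory, so each element is fetched from "global
/// memory" only once per tile.
const TILE: usize = 4;

fn conv1d_tiled(input: &[f32], kernel: &[f32]) -> Vec<f32> {
    let halo = kernel.len() - 1;
    let out_len = input.len() - halo;
    let mut output = vec![0.0f32; out_len];
    for tile_start in (0..out_len).step_by(TILE) {
        let tile_end = (tile_start + TILE).min(out_len);
        // Stage the tile + halo into the local buffer once.
        let shared: Vec<f32> = input[tile_start..tile_end + halo].to_vec();
        // Every "thread" of the tile now reads from the staged buffer only.
        for (i, out) in output[tile_start..tile_end].iter_mut().enumerate() {
            *out = kernel.iter().zip(&shared[i..]).map(|(k, x)| k * x).sum();
        }
    }
    output
}

fn main() {
    let input: Vec<f32> = (0..10).map(|x| x as f32).collect();
    let kernel = [1.0, 1.0, 1.0];
    // Moving sum of 3: the first output is 0 + 1 + 2 = 3.
    assert_eq!(conv1d_tiled(&input, &kernel)[0], 3.0);
}
```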

At the moment, the autotune mechanism cannot support no_std [because we use Instant](https://durch.github.io/rust-goauth/time/index.html). We must analyze whether we could use something else to...

no_std
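One possible direction, sketched with made-up names: abstract benchmark timing behind a trait, so that std builds keep using `Instant` while a no_std target plugs in its own cycle counter or hardware timer.

```rust
use std::time::{Duration, Instant};

/// Hypothetical abstraction (the names are illustrative): autotune benchmarks
/// would depend on this trait instead of `std::time::Instant` directly.
trait BenchTimer {
    fn measure(&self, f: &mut dyn FnMut()) -> Duration;
}

/// std implementation backed by `Instant`; a no_std target would provide its
/// own implementation based on a platform timer.
struct StdTimer;

impl BenchTimer for StdTimer {
    fn measure(&self, f: &mut dyn FnMut()) -> Duration {
        let start = Instant::now();
        f();
        start.elapsed()
    }
}

fn main() {
    let timer = StdTimer;
    let elapsed = timer.measure(&mut || {
        let _ = (0..1_000_000u64).sum::<u64>();
    });
    println!("elapsed: {elapsed:?}");
}
```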

For matmul in WGPU, we use autotune with a key that implicitly tells us the size range of the matmul inputs. For instance, if we have [3, 2,...

enhancement
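A sketch of one common keying scheme, rounding each dimension up to the next power of two so that shapes in the same range share a tuned kernel; the shapes are illustrative, and the actual burn-wgpu key may differ.

```rust
/// Round each dimension up to the next power of two so that all shapes in
/// the same range map to the same autotune key (and thus reuse the same
/// tuned kernel) instead of re-tuning for every exact shape.
fn autotune_key(shape: &[usize]) -> Vec<usize> {
    shape.iter().map(|d| d.next_power_of_two()).collect()
}

fn main() {
    // Illustrative shapes: [3, 2, 5] and [4, 2, 7] land in the same bucket.
    assert_eq!(autotune_key(&[3, 2, 5]), vec![4, 2, 8]);
    assert_eq!(autotune_key(&[4, 2, 7]), vec![4, 2, 8]);
}
```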

Use the autotune mechanism in the WGPU backend to find the fastest kernel version of binary_elemwise and its inplace counterpart by varying the WORKGROUP argument. Take inspiration from the `burn-wgpu/src/kernel/matmul/tune`...

performance
wgpu
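A CPU-side sketch of the tuning loop for the issue above, with illustrative candidate sizes and a stubbed benchmark; the real mechanism lives in burn-wgpu's autotune module.

```rust
use std::time::{Duration, Instant};

/// Candidate WORKGROUP sizes to try; the set is illustrative.
const CANDIDATES: [u32; 4] = [8, 16, 32, 64];

/// Pick the fastest candidate by benchmarking each one. `run_kernel` stands
/// in for launching the binary elementwise kernel with a given workgroup size.
fn tune(run_kernel: impl Fn(u32)) -> u32 {
    CANDIDATES
        .iter()
        .copied()
        .min_by_key(|&wg| {
            let start = Instant::now();
            run_kernel(wg);
            start.elapsed()
        })
        .expect("at least one candidate")
}

fn main() {
    // Stub "kernel": pretend larger workgroups are faster up to 32.
    let fastest = tune(|wg| {
        let work = u64::from(if wg <= 32 { 64 / wg } else { 4 });
        std::thread::sleep(Duration::from_millis(work));
    });
    println!("fastest workgroup: {fastest}");
}
```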