Crash on Chrome WebGPU for kernels that bind with aliasing.
Describe the bug
If an operation uses the same tensor more than once, Chrome can crash, complaining that binding aliased buffers violates the WebGPU specification.
To Reproduce
let tensor = Tensor::<Backend, 2>::zeros([2, 1], &device);
let test: Tensor<Backend, 2> = tensor.clone().mul(tensor).sum_dim(1); // fails!
// For some reason this works if tensor is 1D? Or if not summing / presumably with some other operation?
// It might be slightly more involved than I thought!
// The following does work:
// let test: Tensor<Backend, 2> = tensor.clone().powf_scalar(2.0).sum_dim(1);
Run this code on WebGPU in Chrome (Firefox Nightly seems fine with it, though I've had a host of other issues on Firefox, so it generally seems less complete).
The output looks like:
Writable storage buffer binding aliasing found between [BindGroup (unlabeled)] set at bind group index 0, binding index 0, and [BindGroup (unlabeled)] set at bind group index 0, binding index 1, with overlapping ranges (offset: 0, size: 8) and (offset: 0, size: 8) in [Buffer (unlabeled)].
- While encoding [ComputePassEncoder (unlabeled)].DispatchWorkgroups(1, 1, 1).
127.0.0.1/:1 [Invalid CommandBuffer "Command Encoder" from CommandEncoder "Command Encoder"] is invalid.
- While calling [Queue].Submit([[Invalid CommandBuffer "Command Encoder" from CommandEncoder "Command Encoder"]])
Expected behavior
Don't crash :) How to achieve that seems trickier. You'd need different WGSL versions of the kernel depending on whether inputs/outputs are the same or not. CubeCL might be able to handle that, but I'm not entirely sure what's best here!
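To make the condition concrete, here is a minimal sketch of the kind of check Chrome appears to run, going only by the error message above: two bindings alias if they point into the same buffer with overlapping ranges and at least one of them is writable. The `Binding` struct and both functions are hypothetical names for illustration, not burn, CubeCL, or wgpu types.

```rust
/// Hypothetical description of one storage buffer binding.
/// This is NOT a burn / CubeCL / wgpu type, just an illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Binding {
    buffer_id: u64,
    offset: u64,
    size: u64,
    writable: bool,
}

/// True if both bindings refer to the same buffer and their byte ranges overlap.
fn ranges_overlap(a: &Binding, b: &Binding) -> bool {
    a.buffer_id == b.buffer_id
        && a.offset < b.offset + b.size
        && b.offset < a.offset + a.size
}

/// Returns every pair (i, j) with i < j whose ranges overlap while at least
/// one side is writable (assumed reading of the rule from the error message).
fn find_writable_aliases(bindings: &[Binding]) -> Vec<(usize, usize)> {
    let mut aliases = Vec::new();
    for i in 0..bindings.len() {
        for j in (i + 1)..bindings.len() {
            let (a, b) = (&bindings[i], &bindings[j]);
            if (a.writable || b.writable) && ranges_overlap(a, b) {
                aliases.push((i, j));
            }
        }
    }
    aliases
}
```

With the bindings from the log (the same buffer at offset 0, size 8, bound at indices 0 and 1), this would report the pair `(0, 1)`. A backend could run something like it before dispatch to decide whether an aliased kernel variant is needed.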
CC @louisfd @nathanielsimard
Not sure if there is an option to disable that check. If not, we will have to compile more kernels automatically based on handle ids, I guess.
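One way the "based on handle ids" idea could look, as a rough sketch only: derive an alias pattern from the handle ids of a launch and make it part of the kernel cache key, so a launch where two arguments share a handle compiles a separate variant that binds the buffer once. `alias_pattern`, `KernelKey`, and `kernel_key` are hypothetical names, not existing CubeCL APIs, and the sketch assumes each handle id maps to one non-overlapping buffer range.

```rust
use std::collections::HashMap;

/// For each argument, the slot of the first argument that shares its handle id.
/// Arguments with the same slot would be bound once in the generated kernel.
fn alias_pattern(handle_ids: &[u64]) -> Vec<usize> {
    let mut slot_of: HashMap<u64, usize> = HashMap::new();
    let mut pattern = Vec::with_capacity(handle_ids.len());
    for &id in handle_ids {
        // A handle seen for the first time gets the next free slot.
        let next = slot_of.len();
        let slot = *slot_of.entry(id).or_insert(next);
        pattern.push(slot);
    }
    pattern
}

/// Hypothetical cache key: same kernel, different aliasing, different variant.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct KernelKey {
    kernel_name: &'static str,
    alias_pattern: Vec<usize>,
}

fn kernel_key(kernel_name: &'static str, handle_ids: &[u64]) -> KernelKey {
    KernelKey {
        kernel_name,
        alias_pattern: alias_pattern(handle_ids),
    }
}
```

For `tensor.clone().mul(tensor)`, the two inputs would share a handle id, giving a pattern like `[0, 0, 1]` rather than `[0, 1, 2]`. Because the key depends on the pattern and not on the raw ids, any two launches with the same aliasing shape would reuse the same compiled variant.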
@ArthurBrussee checking back. Do we still have this issue?
No, I think this is fixed!