Anton Smirnov

213 comments by Anton Smirnov

The kernel in NerfUtils.jl fuses several operations into a single kernel, while Optimisers.jl splits them across 4 kernels (counting the actual parameter update). For smaller arrays the benefit is negligible,...
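A minimal sketch of the kind of fusion being described, assuming a KernelAbstractions.jl-style kernel (the kernel name and the momentum-SGD rule here are illustrative, not NerfUtils.jl's actual code): the two update steps run in one pass over memory, where an unfused path would launch a separate broadcast kernel per step.

```julia
using KernelAbstractions

# Illustrative fused update: momentum accumulation and parameter update
# happen in a single kernel launch and a single pass over memory.
@kernel function fused_sgd!(θ, m, ∇, η, ρ)
    i = @index(Global)
    @inbounds begin
        m[i] = ρ * m[i] + ∇[i]   # momentum accumulation
        θ[i] -= η * m[i]         # parameter update
    end
end

θ = ones(Float32, 1024)
m = zeros(Float32, 1024)
∇ = ones(Float32, 1024)

kernel = fused_sgd!(CPU())       # same code runs on GPU backends
kernel(θ, m, ∇, 0.1f0, 0.9f0; ndrange=length(θ))
KernelAbstractions.synchronize(CPU())
```

An unfused version would instead be something like `@. m = ρ * m + ∇` followed by `@. θ -= η * m`, i.e. two kernel launches and two full reads/writes of the arrays, which is where the overhead shows up for larger arrays.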

> Generally we won't be able to read LLVM IR being produced by a newer version of LLVM. I've tried a simpler case with a kernel without any arguments to...

Just for context, @Alexander-Barth said that with Lux it works fine without manually synchronizing, so it looks like something is missing here in Flux.
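For readers following along, the manual synchronization under discussion looks roughly like this (an illustrative KernelAbstractions.jl sketch, not Flux's or Lux's actual code): after an asynchronous kernel launch, you synchronize the backend before reading the results on the host.

```julia
using KernelAbstractions

@kernel function axpy!(y, x, a)
    i = @index(Global)
    @inbounds y[i] += a * x[i]
end

x = ones(Float32, 256)
y = zeros(Float32, 256)

backend = get_backend(y)               # CPU() here; a GPU backend on device arrays
axpy!(backend)(y, x, 2f0; ndrange=length(y))
KernelAbstractions.synchronize(backend)  # explicit sync before using the results
```

If a library issues the synchronize for you (as Lux apparently does here), user code never sees the partially-written state; if it doesn't, reading `y` too early can race with the kernel.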

> The 285% performance regression we experienced with the 0.9.34 semantics change was significant, and it would be great if we could avoid similar impacts in the future. Haven't looked...

I also saw that the kernel now uses the `malloc` intrinsic (thus spawning a hostcall). Do you know why that is? And why doesn't every kernel now do this (that...

This seems to appear only when using the `NTuple` index type. Changing to linear or cartesian indexing works fine:

```julia
@kernel function kernel_xx!(tensor, Nx::Int64, Ny::Int64, Nz::Int64)
    i = @index(Global)
    s = zero(eltype(tensor))...
```

Or more generally, when passing `size(x)` to `ndrange` instead of `length(x)`.

Here's the optimized LLVM IR for:

```julia
@kernel function kernel_xx!(tensor, Nx::Int64, Ny::Int64, Nz::Int64)
    idx = @index(Global)
    res = zero(eltype(tensor))
    for p in (-Nx):Nx
        for q in (-Ny):Ny
            res += 2.0
        end...
```

> In my computer, the linear or cartesian also give the wrong results:

Did you change how you launch the kernel? You need to specify `ndrange=length(x)` instead of `ndrange=size(x)`. Because...
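To make the launch/index pairing concrete, here is a minimal sketch (illustrative kernel names, assuming the KernelAbstractions.jl API): a kernel using a linear `@index(Global)` should be launched with `ndrange=length(x)`, while one using `@index(Global, Cartesian)` pairs with `ndrange=size(x)`.

```julia
using KernelAbstractions

# Linear global index: pair with ndrange=length(x).
@kernel function fill_linear!(x)
    i = @index(Global)
    @inbounds x[i] = 1f0
end

# Cartesian global index: pair with ndrange=size(x).
@kernel function fill_cartesian!(x)
    I = @index(Global, Cartesian)
    @inbounds x[I] = 1f0
end

x = zeros(Float32, 4, 4)
fill_linear!(CPU())(x; ndrange=length(x))
KernelAbstractions.synchronize(CPU())

y = zeros(Float32, 4, 4)
fill_cartesian!(CPU())(y; ndrange=size(y))
KernelAbstractions.synchronize(CPU())
```

Mixing the two conventions (e.g. a linear index with `ndrange=size(x)`) changes how many work-items are launched and how indices map onto the array, which is one way to get wrong results like those described above.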