Anton Smirnov

213 comments by Anton Smirnov

The kernel in NerfUtils.jl fuses several operations into a single kernel, while Optimisers.jl splits them across 4 kernels (counting the actual parameter update). For smaller arrays the benefit is negligible,...
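A minimal sketch of the kind of fusion being described, assuming a KernelAbstractions.jl-style kernel (the kernel name and the momentum-SGD rule here are illustrative, not NerfUtils.jl's actual code): the two update steps run in one pass over memory, where an unfused path would launch a separate broadcast kernel per step.

```julia
using KernelAbstractions

# Illustrative fused update: momentum accumulation and parameter update
# happen in a single kernel launch and a single pass over memory.
@kernel function fused_sgd!(θ, m, ∇, η, ρ)
    i = @index(Global)
    @inbounds begin
        m[i] = ρ * m[i] + ∇[i]   # momentum accumulation
        θ[i] -= η * m[i]         # parameter update
    end
end

θ = ones(Float32, 1024)
m = zeros(Float32, 1024)
∇ = ones(Float32, 1024)

kernel = fused_sgd!(CPU())       # same code runs on GPU backends
kernel(θ, m, ∇, 0.1f0, 0.9f0; ndrange=length(θ))
KernelAbstractions.synchronize(CPU())
```

An unfused version would instead be something like `@. m = ρ * m + ∇` followed by `@. θ -= η * m`, i.e. two kernel launches and two full reads/writes of the arrays, which is where the overhead shows up for larger arrays.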

> Generally we won't be able to read LLVM IR being produced by a newer version of LLVM. I've tried a simpler case with a kernel without any arguments to...

Just for context, @Alexander-Barth said that with Lux it works fine without manually synchronizing, so it looks like something is missing here in Flux.
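For readers following along, the manual synchronization under discussion looks roughly like this (an illustrative KernelAbstractions.jl sketch, not Flux's or Lux's actual code): after an asynchronous kernel launch, you synchronize the backend before reading the results on the host.

```julia
using KernelAbstractions

@kernel function axpy!(y, x, a)
    i = @index(Global)
    @inbounds y[i] += a * x[i]
end

x = ones(Float32, 256)
y = zeros(Float32, 256)

backend = get_backend(y)               # CPU() here; a GPU backend on device arrays
axpy!(backend)(y, x, 2f0; ndrange=length(y))
KernelAbstractions.synchronize(backend)  # explicit sync before using the results
```

If a library issues the synchronize for you (as Lux apparently does here), user code never sees the partially-written state; if it doesn't, reading `y` too early can race with the kernel.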

> The 285% performance regression we experienced with the 0.9.34 semantics change was significant, and it would be great if we could avoid similar impacts in the future. Haven't looked...

I also saw that the kernel now uses the `malloc` intrinsic (thus spawning a hostcall). Do you know why that is? And why doesn't every kernel now do this (that...

This seems to appear only when using the `NTuple` index type. Changing to linear or cartesian indexing works fine:

```julia
@kernel function kernel_xx!(tensor, Nx::Int64, Ny::Int64, Nz::Int64)
    i = @index(Global)
    s = zero(eltype(tensor))...
```

Or more generally, when passing `size(x)` to `ndrange` instead of `length(x)`.

Here's the optimized LLVM IR for:

```julia
@kernel function kernel_xx!(tensor, Nx::Int64, Ny::Int64, Nz::Int64)
    idx = @index(Global)
    res = zero(eltype(tensor))
    for p in (-Nx):Nx
        for q in (-Ny):Ny
            res += 2.0
        end...
```

> In my computer, the linear or cartesian also give the wrong results:

Did you change how you launch the kernel? You need to specify `ndrange=length(x)` instead of `ndrange=size(x)`. Because...
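To make the launch/index pairing concrete, here is a minimal sketch (illustrative kernel names, assuming the KernelAbstractions.jl API): a kernel using a linear `@index(Global)` should be launched with `ndrange=length(x)`, while one using `@index(Global, Cartesian)` pairs with `ndrange=size(x)`.

```julia
using KernelAbstractions

# Linear global index: pair with ndrange=length(x).
@kernel function fill_linear!(x)
    i = @index(Global)
    @inbounds x[i] = 1f0
end

# Cartesian global index: pair with ndrange=size(x).
@kernel function fill_cartesian!(x)
    I = @index(Global, Cartesian)
    @inbounds x[I] = 1f0
end

x = zeros(Float32, 4, 4)
fill_linear!(CPU())(x; ndrange=length(x))
KernelAbstractions.synchronize(CPU())

y = zeros(Float32, 4, 4)
fill_cartesian!(CPU())(y; ndrange=size(y))
KernelAbstractions.synchronize(CPU())
```

Mixing the two conventions (e.g. a linear index with `ndrange=size(x)`) changes how many work-items are launched and how indices map onto the array, which is one way to get wrong results like those described above.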