Valentin Churavy

Results 1415 comments of Valentin Churavy

In contrast, with a constant ndrange: ``` │┌ @ /srv/scratch/lraess/julia_depot/packages/KernelAbstractions/zPAn3/src/nditeration.jl:84 within `expand` ; ││┌ @ abstractarray.jl:1291 within `getindex` ; │││┌ @ abstractarray.jl:1336 within `_getindex` ; ││││┌ @ abstractarray.jl:1343 within `_to_subscript_indices`...

I am not sure right now.
1. We could special-case 1D/2D/3D NDRanges.
2. Maybe https://github.com/maleadt/StaticCartesian.jl would help, but in this case we don't have a static set of cartesian...

`@synchronize` sadly doesn't work on the CPU within arbitrary control flow; see #330.

``` for i in [1, 2, 4, 8, 16, 32, 64, 128, 256, 512] if li + i

> (call to ijl_alloc_array_1d) ``` for i in [1, 2, 4, 8, 16, 32, 64, 128, 256, 512] ``` Use a tuple `()` instead of an array literal; the array literal allocates.
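
A minimal sketch of the tuple suggestion, in plain Julia so that it runs without a GPU or KernelAbstractions. The helper names `count_in_bounds_tuple` and `count_in_bounds_array` and the bound check `li + i <= n` are hypothetical and only illustrate the loop shape; they are not the condition from the original kernel, which is truncated above.

```julia
# Iterating over a tuple literal: the strides are a fixed-size immutable value,
# so the loop does not need to construct an Array.
function count_in_bounds_tuple(li, n)
    hits = 0
    for i in (1, 2, 4, 8, 16, 32, 64, 128, 256, 512)
        # Hypothetical bound check, for illustration only.
        if li + i <= n
            hits += 1
        end
    end
    return hits
end

# Same loop over an array literal: `[...]` constructs a Vector{Int},
# which is the kind of allocation the quoted message complains about.
function count_in_bounds_array(li, n)
    hits = 0
    for i in [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
        if li + i <= n
            hits += 1
        end
    end
    return hits
end

# Both variants compute the same result; only the container differs.
println(count_in_bounds_tuple(1, 10) == count_in_bounds_array(1, 10))
```

Inside a GPU kernel the array variant is the one to avoid, since the array literal requires a runtime allocation there.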

Hm, on the GPU you shouldn't get inconsistent results; on the CPU, yes.

So the only thing that seems fishy to me is: ``` while (li + s)

You are trying to do something similar to https://github.com/JuliaGPU/CUDA.jl/blob/cb14a637e0b7b7be9ae01005ea9bdcf79b320189/src/mapreduce.jl#L62, IIUC? My long-term goal is to provide primitives for these kinds of block reductions (#234). So the difference I see...

Are you back to testing on the CPU? I should add a pass that errors when `@synchronize` is used inside control flow; KA's current design simply can't support that.

Are you not missing a `+1` in `lhsIndex = 2 * d * (li - 1)`? For `li = 1`, `lhsIndex` will be 0, which is OOB.
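
The off-by-one can be checked with a small sketch in plain Julia. The function names `lhs_index_original` and `lhs_index_shifted`, the 8-element `data` array, and the choice `d = 1` are assumptions made for illustration; the shifted form is one possible fix, not necessarily the one the kernel's author intended.

```julia
# Index formula as quoted in the comment: yields 0 for li = 1.
lhs_index_original(li, d) = 2 * d * (li - 1)

# Hypothetical fix: shift by one so the first work-item maps to index 1.
lhs_index_shifted(li, d) = 2 * d * (li - 1) + 1

data = collect(1:8)  # Julia arrays are 1-based, so valid indices are 1:8
d = 1

for li in 1:4
    orig = lhs_index_original(li, d)
    fixed = lhs_index_shifted(li, d)
    # checkbounds(Bool, ...) returns false for an out-of-bounds index instead of throwing.
    println("li = ", li,
            "  original = ", orig, " (in bounds: ", checkbounds(Bool, data, orig), ")",
            "  shifted = ", fixed, " (in bounds: ", checkbounds(Bool, data, fixed), ")")
end
```

With these assumed values only `li = 1` is out of bounds in the original formula, which is why the bug can be easy to miss when checking a few larger indices by hand.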