Valentin Churavy
Hm yeah, I was thinking `current_backend()` but Simon wanted the ability to not use it at the top level. We could pre-empt some other work I am thinking about and expose...
We should also finish #320
I am unsure what KernelAbstractions could do here. This sounds like a fault in the debugger.
Hm I need to think through the semantics of while loops on the CPU... #262 You can use `@macroexpand` to debug the lowering of the kernel. You should see two...
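As a generic illustration of the `@macroexpand` suggestion, here is a minimal sketch using only a macro from Base (`@elapsed`), since KernelAbstractions is not assumed to be loaded here. Applying the same idea to a kernel would presumably mean putting `@macroexpand` in front of the `@kernel function ... end` definition; the output of that is not reproduced here.

```julia
# Expand a macro call without running it; the result is an ordinary `Expr`.
# `@elapsed` is only a stand-in for the macro whose lowering is under inspection.
expanded = @macroexpand @elapsed sum(1:10)

# Line-number nodes clutter the printout, so strip them before reading it.
Base.remove_linenums!(expanded)

println(expanded)
```

Reading the expanded expression makes it possible to see what code the macro actually generates before it is compiled.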
So one thing to think through is what a `while` loop with a `@synchronize` inside should look like.
Yeah the `@synchronize` makes while loops hard...

```
s = MVector(Int, length(wkgrp))
mask = map(s->s>0, s)
while any(mask)
    for tid in wkgrp
        mask[tid] || continue
        if tid < s[tid]
            cache[ti]...
```
We could solve break through introducing a mask... I like this direction, but it is something that the current architecture doesn't easily support.
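To make the mask idea concrete, here is a minimal sketch in plain Julia of how one workgroup's `while` loop with a synchronization point might be emulated on the CPU. It is not how KernelAbstractions lowers kernels; the function `masked_while` and its variables are hypothetical names for illustration only. Each work-item gets a mask entry that stays `true` while its loop condition holds, the loop body is split into a phase before and a phase after the synchronization point, and a `break` would correspond to clearing that work-item's mask entry.

```julia
# Hypothetical sketch: emulate one workgroup on the CPU with a per-work-item mask.
function masked_while(trip_counts::Vector{Int})
    n = length(trip_counts)
    remaining = copy(trip_counts)       # per-work-item loop state
    iterations = zeros(Int, n)          # how often each work-item ran the body
    mask = map(r -> r > 0, remaining)   # true while the `while` condition holds

    while any(mask)
        # Phase 1: code before the synchronization point.
        for tid in 1:n
            mask[tid] || continue       # this work-item has already left the loop
            iterations[tid] += 1
        end

        # The synchronization point sits between the two inner loops:
        # every active work-item has finished phase 1 before phase 2 starts.

        # Phase 2: code after the synchronization point, then re-check the condition.
        for tid in 1:n
            mask[tid] || continue
            remaining[tid] -= 1
            mask[tid] = remaining[tid] > 0   # a `break` would also set this to false
        end
    end
    return iterations
end

result = masked_while([3, 0, 1, 2])
```

The outer loop keeps running as long as any work-item is still active, so work-items with fewer iterations simply sit out the remaining rounds.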
I often say: The choice is up to the user. Experience has shown that having GPU backends as dependencies can cause issues, when one backend is quicker to update than...
@luraess also mentioned that it would make sense to configure the hardware dimension index into the Kernel struct.
The maximum linear index with `UInt32` is 4,294,967,295 so an array of about 4GB. With GPUs having upwards of 40GB or more memory in the data center, it's not unlikely...
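The arithmetic behind that limit can be checked in a few lines of plain Julia. The helper `max_array_bytes` is a hypothetical name; the roughly 4 GB figure corresponds to one-byte elements, and wider element types reach the index limit at proportionally larger sizes.

```julia
# Largest linear index representable in a UInt32.
max_index = typemax(UInt32)

# Size in bytes of an array with `max_index` elements of type T (hypothetical helper).
max_array_bytes(::Type{T}) where {T} = Int64(max_index) * sizeof(T)

# Convert bytes to GiB for readability.
to_gib(bytes) = bytes / 2^30

println(to_gib(max_array_bytes(UInt8)))    # just under 4 GiB
println(to_gib(max_array_bytes(Float32)))  # just under 16 GiB

# UInt32 arithmetic wraps around silently, so one index past the maximum becomes 0.
wrapped = max_index + UInt32(1)
```

Even with `Float32` elements the index limit is reached well below 40 GB.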