Valentin Churavy
Yeah we would need a scratch space that doesn't capture the array on a bounds-error.
@MasonProtter probably has the cleaner implementation: https://github.com/tshort/StaticCompiler.jl/blob/master/src/pointer_patching.jl; Enzyme does it for a very different purpose. We would need to save the runtime alongside the relocation table and then when we...
Can you post a Project.toml + Manifest.toml?
Grrr, of course there is now rocprofv3, which does look like an improvement (https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/index.html), but now the generation fails, for me at least.
👍 these functions are post-hoc additions to attempt to allow these queries without breaking compatibility. So we could add a more fine-grained interface that uses types as well.
Yeah, I think the granularity of `Datatype` ought to suffice, and indeed I would say that Metal does not support 64-bit semantics. We could, of course, make it much more fine-grained...
After speaking with @maleadt in Lausanne, we decided that we might be able to permit Symbols being passed to the GPU: https://github.com/JuliaGPU/GPUCompiler.jl/pull/650
The `fastmath` macro is sadly not region based like inbounds, but rather statement based...
```
@macroexpand @fastmath 1 + 1
```
```
:(Base.FastMath.add_fast(1, 1))
```
So you would need to...
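A minimal sketch of what "statement based" means in practice, assuming a plain Julia session with no GPU packages loaded. The function `f` and the variables `a`, `b`, `c`, `y` are hypothetical placeholders, not taken from the discussion. The macro rewrites only the operators that appear literally in the expression it is handed; a call to a function defined elsewhere is left as it is, and that function's body is not touched.

```julia
# Hypothetical helper defined outside the macro; @fastmath never sees its body.
f(x) = sqrt(x) + 1.0

# Expand without evaluating, so a, b and c do not need to exist.
ex = @macroexpand @fastmath begin
    y = a * b + c   # `*` and `+` appear literally here, so they are rewritten
    f(y)            # the call to `f` is kept as an ordinary call
end

println(ex)
```

The printed expression should show `Base.FastMath.mul_fast` and `Base.FastMath.add_fast` in place of `*` and `+`, while `f(y)` remains a regular call.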
So `@cuda fastmath=true` is a weird beast.

> - `fastmath`: use less precise square roots and flush denormals

It actually doesn't set fastmath on the IR level, but as a...
`nvcc --use_fast_math` is ill-defined. It is both a language-level semantic change and a backend compiler change. I think we can straightforwardly expose the compiler semantics, but matching the language...