taichi
Support warp-level primitives for `f64` and `f16`
Currently, warp-level primitives are only available for 32-bit data, for example `ti.simt.warp.shfl_sync_f32`, `ti.simt.warp.shfl_up_f32`, and `ti.simt.warp.shfl_down_f32`. These primitives cannot be applied to `f64` values. We would like to see equivalent primitives for float64.
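To illustrate what a 64-bit variant would need to do, here is a minimal pure-Python sketch of the usual approach: split each `f64` into two 32-bit words, shuffle each word with a 32-bit primitive, and reassemble. This is a CPU simulation only, not Taichi code. The helper names (`split_f64`, `join_f64`, `shfl_down_u32`, `shfl_down_f64`) are hypothetical, the warp is modelled as a plain list of 32 lane values, and out-of-range lanes are assumed to keep their own value.

```python
import struct

WARP_SIZE = 32  # assumed warp width for this simulation


def split_f64(x):
    """Return the (lo, hi) 32-bit words of the IEEE-754 bit pattern of x."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits & 0xFFFFFFFF, bits >> 32


def join_f64(lo, hi):
    """Rebuild a float64 from its (lo, hi) 32-bit words."""
    return struct.unpack("<d", struct.pack("<Q", (hi << 32) | lo))[0]


def shfl_down_u32(lanes, offset):
    """Simulated 32-bit shuffle-down: lane i reads lane i + offset.

    Lanes whose source would fall outside the warp keep their own value.
    """
    n = len(lanes)
    return [lanes[i + offset] if i + offset < n else lanes[i] for i in range(n)]


def shfl_down_f64(lanes, offset):
    """Shuffle-down for float64 built from two 32-bit shuffles."""
    halves = [split_f64(v) for v in lanes]
    lo = shfl_down_u32([h[0] for h in halves], offset)
    hi = shfl_down_u32([h[1] for h in halves], offset)
    return [join_f64(l, h) for l, h in zip(lo, hi)]


values = [i + 0.1 for i in range(WARP_SIZE)]
shifted = shfl_down_f64(values, 4)
```

Because the split and join operate on the raw bit pattern, the round trip is exact. The cost is two 32-bit shuffles per `f64` value.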