[QUESTION] How to load a data tile according to the given index tile?
Bug Description
Could you help me compile this kernel successfully? Thanks!
import numpy as np
import warp

TILE_SIZE = warp.constant(128)


@warp.func
def div_kernel(x: float, y: float) -> float:
    return x / y


@warp.kernel
def kernel(
    diags_ref: warp.array2d(dtype=float),
    solves_ref: warp.array2d(dtype=float),
    lowers_ref: warp.array1d(dtype=float),
    indices_ref: warp.array2d(dtype=int),
    out_solve_ref: warp.array2d(dtype=float),
):
    i_neuron = warp.tid()

    diags = warp.tile_load(diags_ref[i_neuron], TILE_SIZE, storage='shared')
    solves = warp.tile_load(solves_ref[i_neuron], TILE_SIZE, storage='shared')
    lowers = warp.tile_load(lowers_ref, TILE_SIZE, storage='shared')
    lowers[0] = 0.0

    lower_effect = warp.tile_map(div_kernel, -lowers, diags)
    solve_effect = warp.tile_map(div_kernel, solves, diags)

    for i in range(indices_ref.shape[0]):
        k_step_parent = warp.tile_load(indices_ref[i], TILE_SIZE)
        solve_effect = lower_effect * solve_effect[k_step_parent] + solve_effect
        lower_effect = lower_effect * lower_effect[k_step_parent]

    warp.tile_store(out_solve_ref[i_neuron], solve_effect, TILE_SIZE)


diags = np.random.randn(1024, 128).astype(np.float32)
solves = np.random.randn(1024, 128).astype(np.float32)
lowers = np.random.randn(128).astype(np.float32)
indices = np.random.randint(0, 128, size=(7, 128)).astype(np.int32)
out_solve = np.zeros((1024, 128), dtype=np.float32)

diags_wp = warp.array2d(diags)
solves_wp = warp.array2d(solves)
lowers_wp = warp.array1d(lowers)
indices_wp = warp.array2d(indices)
out_solve_wp = warp.array2d(out_solve)

warp.launch_tiled(
    kernel,
    dim=diags.shape,
    inputs=[diags_wp, solves_wp, lowers_wp, indices_wp, out_solve_wp],
    block_dim=diags.shape[1]
)
System Information
No response
Did you see these?
https://nvidia.github.io/warp/modules/functions.html#warp.tile_load_indexed
https://nvidia.github.io/warp/modules/functions.html#warp.tile_store_indexed
https://nvidia.github.io/warp/modules/functions.html#warp.tile_atomic_add_indexed
Could you please demonstrate how to correctly write this kernel using warp.tile_load_indexed? Thanks.
@chaoming0625 I see what you're trying to do now. wp.tile_load_indexed() is meant for loading a tile from global memory, whereas you would like to be able to write solve_effect[k_step_parent], which swizzles an existing tile using a tile of indices. We don't have this capability yet, but I can look into adding it.
Changing this to a feature request.
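To make the requested behavior concrete, here is a minimal NumPy sketch of the computation the kernel in the question appears to be aiming for. It is not Warp code and uses no Warp tile API. `reference_solve` is a hypothetical helper name, and the gather `solve_effect[:, parents]` is assumed to be the per-row equivalent of the `solve_effect[k_step_parent]` swizzle discussed above.

```python
import numpy as np


def reference_solve(diags, solves, lowers, indices):
    """CPU reference (hypothetical helper) for the update loop in the question.

    diags, solves: (rows, n) float arrays
    lowers:        (n,) float array
    indices:       (steps, n) integer array of parent indices in [0, n)
    """
    lowers = lowers.copy()  # avoid mutating the caller's array
    lowers[0] = 0.0

    # Element-wise division; lowers broadcasts across rows.
    lower_effect = -lowers / diags
    solve_effect = solves / diags

    for parents in indices:
        # Gather along the last axis: element j of each row reads element parents[j].
        solve_effect = lower_effect * solve_effect[:, parents] + solve_effect
        lower_effect = lower_effect * lower_effect[:, parents]

    return solve_effect
```

A function like this could serve as a reference for checking results if an in-tile gather becomes available in Warp.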