warp [QUESTION] How to load a data tile according to the given index tile?

Bug Description

Could you help me compile this kernel successfully? Thanks!


import numpy as np
import warp

TILE_SIZE = warp.constant(128)


@warp.func
def div_kernel(x: float, y: float) -> float:
    return x / y


@warp.kernel
def kernel(
    diags_ref: warp.array2d(dtype=float),
    solves_ref: warp.array2d(dtype=float),
    lowers_ref: warp.array1d(dtype=float),
    indices_ref: warp.array2d(dtype=int),
    out_solve_ref: warp.array2d(dtype=float),
):
    i_neuron = warp.tid()

    diags = warp.tile_load(diags_ref[i_neuron], TILE_SIZE, storage='shared')
    solves = warp.tile_load(solves_ref[i_neuron], TILE_SIZE, storage='shared')
    lowers = warp.tile_load(lowers_ref, TILE_SIZE, storage='shared')

    lowers[0] = 0.0
    lower_effect = warp.tile_map(div_kernel, -lowers, diags)
    solve_effect = warp.tile_map(div_kernel, solves, diags)

    for i in range(indices_ref.shape[0]):
        k_step_parent = warp.tile_load(indices_ref[i], TILE_SIZE)
        solve_effect = lower_effect * solve_effect[k_step_parent] + solve_effect
        lower_effect = lower_effect * lower_effect[k_step_parent]

    warp.tile_store(out_solve_ref[i_neuron], solve_effect, TILE_SIZE)


diags = np.random.randn(1024, 128).astype(np.float32)
solves = np.random.randn(1024, 128).astype(np.float32)
lowers = np.random.randn(128).astype(np.float32)
indices = np.random.randint(0, 128, size=(7, 128)).astype(np.int32)
out_solve = np.zeros((1024, 128), dtype=np.float32)

diags_wp = warp.array2d(diags)
solves_wp = warp.array2d(solves)
lowers_wp = warp.array1d(lowers)
indices_wp = warp.array2d(indices)
out_solve_wp = warp.array2d(out_solve)

warp.launch_tiled(
    kernel,
    dim=diags.shape,
    inputs=[diags_wp, solves_wp, lowers_wp, indices_wp, out_solve_wp],
    block_dim=diags.shape[1]
)

System Information

No response

Sep 17 '25 13:09 chaoming0625

Did you see these?

https://nvidia.github.io/warp/modules/functions.html#warp.tile_load_indexed
https://nvidia.github.io/warp/modules/functions.html#warp.tile_store_indexed
https://nvidia.github.io/warp/modules/functions.html#warp.tile_atomic_add_indexed

Sep 17 '25 19:09 daedalus5

Could you please demonstrate how to correctly write this kernel using warp.tile_load_indexed? Thanks.

Sep 18 '25 16:09 chaoming0625

@chaoming0625 I see what you're trying to do now. wp.tile_load_indexed() is meant for loading a tile from an array in global memory, whereas you want to write solve_effect[k_step_parent], which gathers (swizzles) the elements of an existing tile using a tile of indices. We don't have this capability yet, but I can look into adding it.
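
To make the distinction concrete, here is a minimal NumPy sketch of what the kernel in the question appears to compute, with `solve_effect[:, parent]` playing the role of the in-tile gather `solve_effect[k_step_parent]`. It is only a CPU reference for checking results, not a Warp implementation. The function name `reference_solve` and the batched layout (one row per `i_neuron`) are assumptions made for illustration.

```python
import numpy as np


def reference_solve(diags, solves, lowers, indices):
    """CPU reference for the intended computation (hypothetical helper).

    diags, solves: (n_neurons, tile_size) float arrays
    lowers:        (tile_size,) float array
    indices:       (n_steps, tile_size) integer array of parent indices
    """
    lowers = lowers.copy()  # do not mutate the caller's array
    lowers[0] = 0.0

    # Equivalent of the two tile_map(div_kernel, ...) calls;
    # lowers broadcasts across the neuron axis.
    lower_effect = -lowers / diags
    solve_effect = solves / diags

    for parent in indices:
        # Gather along the tile axis: element j reads element parent[j]
        # of the same row. This is the "swizzle" of an existing tile.
        solve_effect = lower_effect * solve_effect[:, parent] + solve_effect
        lower_effect = lower_effect * lower_effect[:, parent]

    return solve_effect
```

Note that the gather reads from tiles that already hold intermediate results, which is why loading from global memory with indices does not cover this case. A reference like this could be used to validate a Warp kernel once in-tile indexing is available.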

Sep 26 '25 21:09 daedalus5

Changing this to a feature request.

Oct 06 '25 13:10 daedalus5