punica
sgmv_cutlass calculates wrong output
I'm running the following code and the answer comes out wrong. I initialize x and w to all ones, so every value in the output y should be h1=4096.
But that is not what I get. Half of the output is 4096 and the other half is 2528. Weird! My observation is that the wrong answer appears when h2>=32 in the shrink case.
The following code is adapted from benchmarks/bench_sgmv_cutlass.py:
import torch
import punica.ops
bs = 4
h1 = 4096
h2 = 32
num_layers = 1
dtype = torch.float16
device = torch.device("cuda:0")
problem_sizes = [2, 2]
w = [
    torch.ones((num_layers, h1, h2), dtype=dtype, device=device)
    for _ in range(len(problem_sizes))
]
w_ptr = torch.tensor([t.data_ptr() for t in w],
                     dtype=torch.int64,
                     device=device)
s = torch.cumsum(
    torch.tensor([0] + problem_sizes, device=device),
    dim=0,
    dtype=torch.int32)
x = torch.ones((s[-1], h1), dtype=dtype, device=device)
y = torch.zeros((s[-1], h2), dtype=dtype, device=device)
punica.ops.sgmv_cutlass(y, x, w_ptr, s, layer_idx=0)
print(y)
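To make the expected result explicit, here is a minimal sketch of a CPU reference written in plain PyTorch. It is not part of punica: sgmv_reference is a hypothetical helper, and it assumes that rows s[i]:s[i+1] of x are multiplied by w[i][layer_idx], which matches how the script above builds its inputs. The accumulation is done in float32, so with all-ones inputs every entry should come out as exactly h1.

```python
import torch


def sgmv_reference(x, w, s, layer_idx=0):
    """Segmented matmul in float32 on the CPU (hypothetical helper).

    Assumption: segment i covers rows s[i]:s[i+1] of x and uses the
    weight matrix w[i][layer_idx] of shape (h1, h2).
    """
    h2 = w[0].shape[-1]
    y = torch.zeros((x.shape[0], h2), dtype=torch.float32)
    bounds = s.tolist()  # segment boundaries as plain Python ints
    for i in range(len(w)):
        lo, hi = bounds[i], bounds[i + 1]
        y[lo:hi] = x[lo:hi].float().cpu() @ w[i][layer_idx].float().cpu()
    return y


# Same shapes as the report, but built on the CPU so no GPU is needed.
h1, h2 = 4096, 32
problem_sizes = [2, 2]
w = [torch.ones((1, h1, h2), dtype=torch.float16) for _ in problem_sizes]
s = torch.cumsum(torch.tensor([0] + problem_sizes), dim=0, dtype=torch.int32)
x = torch.ones((int(s[-1]), h1), dtype=torch.float16)

expected = sgmv_reference(x, w, s)
print(expected.min().item(), expected.max().item())
```

With the tensors from the script above, the kernel output can be compared using torch.allclose(y.float().cpu(), sgmv_reference(x, w, s)), which should also show which rows or columns hold the wrong values.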