ndrange is limited in size
Describe the bug
When using nd-indexing (with ti.ndrange), not all indices are computed. For instance, ti.ndrange(M, M) with M = 100000 only yields i indices between 1 and 14100.
To Reproduce
import torch
import taichi as ti
import taichi.math as tm

ti.init(arch=ti.gpu)

@ti.kernel
def sum_exp(supp_x: ti.types.ndarray(), supp_y: ti.types.ndarray(), out: ti.types.ndarray()):  # A kernel
    for i, j in ti.ndrange(supp_x.shape[0], supp_y.shape[0]):
        out[i] += tm.exp(-ti.pow(supp_y[j] - supp_x[i], 2) / 2.)

def sum_exp_torch(x, y):
    out = torch.zeros(x.shape[0]).to(x.device)
    sum_exp(x, y, out)
    return out

x = torch.linspace(0., 1., 100000).cuda()
y = torch.linspace(0., 1., 100000).cuda()
out = sum_exp_torch(x, y)
In this example, out[14101:] is zero.
Additional comments
The example I provided is on GPU, but I experience the same problem on CPU, and across multiple machines with different hardware. Also, using two nested loops resolves the problem (see the sketch below), but it loses performance, at least on small matrices, and I guess it also forfeits some optimisations like TLS or shared memory?
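For reference, here is a minimal sketch of that nested-loop workaround (the kernel name sum_exp_nested and the local accumulator are my additions, not from the original report). Only the outermost loop is parallelized by Taichi, so no single loop range approaches M*M:

@ti.kernel
def sum_exp_nested(supp_x: ti.types.ndarray(), supp_y: ti.types.ndarray(), out: ti.types.ndarray()):
    for i in range(supp_x.shape[0]):       # outermost loop: parallelized by Taichi
        acc = 0.0
        for j in range(supp_y.shape[0]):   # inner loop: runs serially per thread
            acc += tm.exp(-ti.pow(supp_y[j] - supp_x[i], 2) / 2.)
        out[i] = acc

Accumulating into a local variable also replaces the atomic out[i] += of the ndrange version with a single store per i.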
It seems that M*M has overflowed int32. It may be difficult to resolve, since some backends do not support int64. We will investigate it later.
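A back-of-the-envelope check (assuming the flattened loop length simply wraps around the 32-bit counter) is consistent with the observed cutoff at i == 14100:

# Plain Python, no Taichi needed: M*M wraps around the int32 counter.
M = 100000
wrapped = (M * M) % 2**32    # 10_000_000_000 iterations requested,
                             # 1_410_065_408 actually run after wrap-around
print(wrapped // M)          # 14100 -> largest i index that gets executed,
                             # matching the observed out[14101:] == 0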
Hello, I think a warning should pop up when this happens on a backend that cannot support the range. There are actually many people who use Taichi for numerical calculations on CUDA or x86 CPUs, so I hope this restriction can be relaxed.