
Multiple Hashgrids

steinraf opened this issue 1 year ago · 1 comment

I am trying to use different hash grids for spatially separated sets of points, but I am encountering strange values from the hash_grid_point_id function.

I have created an MRE below that shows the issue.

import warp as wp
import numpy as np

# wp.config.mode = "debug"

wp.init()

n_envs = 2
n_points = 2
offset = np.array([10.0, 0.0, 0.0])

r = 0.1

pos = np.random.rand(n_envs, n_points, 3)


grids = []
for i in range(n_envs):
    grid = wp.HashGrid(20, 20, 20)  # one hash grid per environment
    pos[i, ...] += i * offset  # spatially separate each environment's points
    grid.build(wp.array(pos[i, ...], dtype=wp.vec3), radius=r)
    grids.append(grid.id)

p = wp.array2d(pos, dtype=wp.vec3)
g = wp.array(grids, dtype=wp.uint64)

print("grids", g)


@wp.kernel
def hashgrid_tester(
    grids: wp.array(dtype=wp.uint64),
    pos: wp.array2d(dtype=wp.vec3),
):
    for env in range(n_envs):
        for tid in range(n_points):

            grid = grids[env]

            i = wp.hash_grid_point_id(grid, tid)
            if i == -1:
                return

            wp.printf(
                "env %d, tid %d, i %d, grid %lld \n",
                env,
                tid,
                i,
                grid,
            )


wp.launch(hashgrid_tester, dim=[1], inputs=[g, p])
wp.synchronize()

Print output:

grids [12918456320 12918521344]
env 0, tid 0, i 117, grid 12918456320 
env 0, tid 1, i 479, grid 12918456320 
env 1, tid 0, i 1, grid 12918521344 
env 1, tid 1, i 0, grid 12918521344 

The indices for the second grid get reordered correctly, but the values for the first grid seem to be random.

When using more than 2 points, a Warp CUDA error 716: misaligned address is thrown, suggesting that the randomness might come from some sort of out-of-bounds, unaligned read.

In the meantime I am using a single HashGrid for the whole domain, roughly as in the sketch below (reusing pos and r from the MRE above), but am wondering whether multiple HashGrid instances are supported and whether this is the correct way to use them.
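# Single-grid fallback: flatten all environments into one point set
all_points = wp.array(pos.reshape(-1, 3), dtype=wp.vec3)
single_grid = wp.HashGrid(20, 20, 20)
single_grid.build(all_points, radius=r)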

Thanks

steinraf · May 03 '24

Thanks a lot for the repro @steinraf. I've not seen this before; we will investigate and get back to you.

mmacklin · Jun 11 '24

@steinraf Sorry for the extremely late reply. I examined your reproducer and believe the issue comes down to the wp.HashGrid from the first iteration being garbage-collected before the hashgrid_tester kernel runs. This causes problems because wp.HashGrid makes some additional memory allocations, which are freed when the object is destroyed:

https://github.com/NVIDIA/warp/blob/17f166e36e60c731f3978ecc3876ccf02c0d7ccb/warp/native/hashgrid.cpp#L206-L222

In the original code, only the grid.id was stored, which wasn't enough to keep the wp.HashGrid from being garbage collected. The workaround is to make sure we keep the hash grid objects from previous iterations alive:

grids = []
for i in range(n_envs):
    grid = wp.HashGrid(20, 20, 20)
    pos[i, ...] += i * offset
    grid.build(wp.array(pos[i, ...], dtype=wp.vec3), radius=r)
    grids.append(grid)  # Changed

p = wp.array2d(pos, dtype=wp.vec3)
g = wp.array([grid.id for grid in grids], dtype=wp.uint64)  # Changed

Then we get:

env 0, tid 0, i 0, grid 17213423616 
env 0, tid 1, i 1, grid 17213423616 
env 1, tid 0, i 1, grid 17213488640 
env 1, tid 1, i 0, grid 17213488640
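More generally, any Warp object whose raw id is passed into a kernel must be kept alive by the caller for as long as that id is in use. One way to package this up (a hypothetical sketch, not code from this thread; the EnvGrids name and its parameters are illustrative) is to store the HashGrid objects alongside the uint64 array of their ids:

import warp as wp

class EnvGrids:
    """Keeps the wp.HashGrid objects alive alongside the array of their ids."""

    def __init__(self, positions, radius, dim=20):
        # positions: NumPy array of shape (n_envs, n_points, 3)
        self.grids = []
        for env_pos in positions:
            grid = wp.HashGrid(dim, dim, dim)
            grid.build(wp.array(env_pos, dtype=wp.vec3), radius=radius)
            self.grids.append(grid)  # keep the Python object, not just its id
        # uint64 array of grid ids, ready to pass as a kernel input
        self.ids = wp.array([g.id for g in self.grids], dtype=wp.uint64)

As long as an EnvGrids instance stays referenced, the grids' device allocations remain valid for any kernel reading from ids.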

Debugging strategy:

  1. Ran compute-sanitizer --tool memcheck python test_0.py: no errors
  2. Ran compute-sanitizer --tool initcheck python test_0.py: got Uninitialized __global__ memory read of size 8 bytes at hashgrid_tester_5e2ff1ed_cuda_kernel_forward+0x1b0
  3. Added wp.config.mode = "debug", wp.config.lineinfo = True, and wp.config.line_directives = False to the top of the script (see the snippet after this list)
  4. Ran compute-sanitizer again and now see Uninitialized __global__ memory read of size 8 bytes at wp::hash_grid_point_id(unsigned long long, int &)+0x15d0 in hashgrid.h:230
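For reference, step 3's debug configuration looks like this at the top of the reproducer (a minimal sketch; the config flags are set before wp.init() so they apply when the kernel modules are built):

import warp as wp

# Compile in debug mode and emit line info so that compute-sanitizer can
# map device errors back to source locations.
wp.config.mode = "debug"
wp.config.lineinfo = True
wp.config.line_directives = False

wp.init()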

I think I'll clarify our docs to warn about this.

shi-eric · Aug 08 '25