pytorch_cluster Radius CPU computation with KDTree.query() when number of neighbors is higher than max_num

Hello,

In the radius search CPU, you are using:

tree = scipy.spatial.cKDTree(x.detach().numpy())
_, col = tree.query( y.detach().numpy(), k=max_num_neighbors, distance_upper_bound=r + 1e-8)

It seems that there is no guaranty that returned indices are randomly selected if there are more points within range r than max_num_neighbors.

Here is a sample code:

data = np.random.rand(100000, 3)
tree = KDTree(data)
distances, ids = tree.query([0.5,0.5,0.5], k=32, distance_upper_bound=0.2)
print(distances.max())
---> 0.044963456313535384 (reproducible if launched several times)

It seems that it searches for the knn and return only the neighbors within r. The right function may be query_ball_point?

Apr 07 '22 08:04 aboulch

The GPU version does as expected.

Apr 07 '22 09:04 aboulch

Are you using an older version of torch-cluster? In the current version, we no longer make use of scipy for this, see https://github.com/rusty1s/pytorch_cluster/blob/master/torch_cluster/radius.py.

Apr 07 '22 09:04 rusty1s

Thanks for qucik response. I was not on the right version of the code documentation. I use the 1.5.9 version of the code see bellow

In an IPython session:

In [5]: import torch_cluster

In [6]: torch_cluster.__version__
Out[6]: '1.5.9'

In [7]: from torch_cluster import radius

In [9]: import torch

In [10]: data = torch.rand(1000000, 3)

In [11]: query = torch.tensor([[0.5,0.5,0.5]])

# Compute distance of the further neighbor

# CPU
In [12]: (data[radius(data, query, r=0.2, max_num_neighbors=32)[1]] - 0.5).norm(dim=1).max()
Out[12]: tensor(0.0607)

In [13]: (data[radius(data, query, r=0.2, max_num_neighbors=64)[1]] - 0.5).norm(dim=1).max()
Out[13]: tensor(0.0701)

In [14]: (data[radius(data, query, r=0.2, max_num_neighbors=128)[1]] - 0.5).norm(dim=1).max()
Out[14]: tensor(0.0824)

# GPU
In [15]: (data[radius(data.cuda(), query.cuda(), r=0.2, max_num_neighbors=32)[1]] - 0.5).norm(dim=1).max()
Out[15]: tensor(0.1985)

In [16]: (data[radius(data.cuda(), query.cuda(), r=0.2, max_num_neighbors=64)[1]] - 0.5).norm(dim=1).max()
Out[16]: tensor(0.2000)

In [17]: (data[radius(data.cuda(), query.cuda(), r=0.2, max_num_neighbors=128)[1]] - 0.5).norm(dim=1).max()
Out[17]: tensor(0.2000)

CPU and GPU do not give the same results.

Apr 07 '22 09:04 aboulch

Hello, Problem could be that nanoflann does not produce results in a random order when lookning for the neighbors in a given radius. This is due the tree traversal process. A solution could be to shuffle the index list?

radius_cpu.cpp l94

std::vector<std::pair<size_t, scalar_t>> ret_matches;
size_t num_matches = mat_index.index->radiusSearch(
    y_data + i * y.size(1), r * r, ret_matches, params);

// proposition
std::random_shuffle(ret_matches.begin(), ret_matches.end());

for (size_t j = 0; j < std::min(num_matches, (size_t)max_num_neighbors); j++) {
            out_vec.push_back(x_start + ret_matches[j].first);
            out_vec.push_back(i);
}

Apr 08 '22 08:04 aboulch

Yes, our GPU implementation is non-deterministic while the CPU one is deterministic (it will pick the closest points). Do you think this is a problem? I am happy to add some form of random shuffling to the CPU implementation (at best controllable via an input argument?). Let me know if you have interest in contributing this feature!

Apr 08 '22 12:04 rusty1s

pytorch_cluster
pytorch_cluster copied to clipboard

Radius CPU computation with KDTree.query() when number of neighbors is higher than max_num_neighbors

pytorch_cluster pytorch_cluster copied to clipboard

Radius CPU computation with KDTree.query() when number of neighbors is higher than max_num_neighbors

pytorch_cluster
pytorch_cluster copied to clipboard