pytorch_cluster
Unknown builtin op: torch_cluster::fps
I'd like to load a TorchScript model that was trained with Python and uses four libraries: pytorch-geometric, pytorch-scatter, pytorch-sparse and pytorch-cluster. The export part is OK. Here is my minimal Python code to reproduce the error.
Python
import torch
from torch_geometric.nn import MLP, PointConv, fps, global_max_pool, radius, knn_interpolate


class SAModule(torch.nn.Module):
    def __init__(self, ratio, r, nn):
        super().__init__()
        self.ratio = ratio
        self.r = r
        self.conv = PointConv(nn, add_self_loops=False).jittable("(Tuple[OptTensor, OptTensor], Tuple[Tensor, Tensor], Tensor) -> Tensor")

    def forward(self, x: torch.Tensor, pos: torch.Tensor, batch: torch.Tensor):
        idx = fps(pos, batch, ratio=self.ratio)
        test_out = radius(pos, pos[idx], self.r, batch, batch[idx], max_num_neighbors=64)
        edge_index = test_out[[1, 0]]
        x_dst = None if x is None else x[idx]
        x = self.conv((x, x_dst), (pos, pos[idx]), edge_index)
        pos, batch = pos[idx], batch[idx]
        return x, pos, batch


class GlobalSAModule(torch.nn.Module):
    def __init__(self, nn):
        super().__init__()
        self.nn = nn

    def forward(self, x, pos, batch):
        x = self.nn(torch.cat([x, pos], dim=1))
        x = global_max_pool(x, batch)
        pos = pos.new_zeros((x.size(0), 3))
        batch = torch.arange(x.size(0), device=batch.device)
        return x, pos, batch


class FPModule(torch.nn.Module):
    def __init__(self, k, nn):
        super().__init__()
        self.k = k
        self.nn = nn

    def forward(self, x, pos, batch, x_skip, pos_skip, batch_skip):
        x = knn_interpolate(x, pos, pos_skip, batch, batch_skip, k=self.k)
        if x_skip is not None:
            x = torch.cat([x, x_skip], dim=1)
        x = self.nn(x)
        return x, pos_skip, batch_skip


class Net(torch.nn.Module):
    def __init__(self, num_classes):
        super().__init__()

        # Input channels account for both `pos` and node features.
        self.sa1_module = SAModule(0.2, 0.2, MLP([3 + 3, 64, 64, 128]))
        self.sa2_module = SAModule(0.25, 0.4, MLP([128 + 3, 128, 128, 256]))
        self.sa3_module = GlobalSAModule(MLP([256 + 3, 256, 512, 1024]))

        self.fp3_module = FPModule(1, MLP([1024 + 256, 256, 256]))
        self.fp2_module = FPModule(3, MLP([256 + 128, 256, 128]))
        self.fp1_module = FPModule(3, MLP([128 + 3, 128, 128, 128]))

        self.mlp = MLP([128, 128, 128, num_classes], dropout=0.5,
                       batch_norm=False)

        self.lin1 = torch.nn.Linear(128, 128)
        self.lin2 = torch.nn.Linear(128, 128)
        self.lin3 = torch.nn.Linear(128, num_classes)

    def forward(self, x, pos, batch):
        sa0_out = (x, pos, batch)
        sa1_out = self.sa1_module(*sa0_out)
        sa2_out = self.sa2_module(*sa1_out)
        sa3_out = self.sa3_module(*sa2_out)

        fp3_out = self.fp3_module(*sa3_out, *sa2_out)
        fp2_out = self.fp2_module(*fp3_out, *sa1_out)
        x, _, _ = self.fp1_module(*fp2_out, *sa0_out)

        return self.mlp(x).log_softmax(dim=-1)


device = "cpu"
net = torch.jit.script(Net(num_classes=3)).to(device)
net.save('repro_bug.pt')
Library versions
pytorch 1.11.0
torch-geometric 2.0.5
torch-cluster 1.6.0
Now I want to load the repro_bug.pt model in C++. The problem is that some operators are unknown. As indicated in https://github.com/pyg-team/pytorch_geometric/tree/master/examples/cpp, I've managed to install the C++ APIs of pytorch-scatter, pytorch-sparse and pytorch-cluster, and I want to link these libraries to my C++ project. I succeeded for pytorch-scatter and pytorch-sparse, and I naively thought that I should proceed similarly for pytorch-cluster. But this doesn't work and I don't know how to proceed.
Here are the CMakeLists.txt and my application code.
CMakeLists.txt
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(custom_ops)
find_package(TorchSparse REQUIRED)
find_package(TorchScatter REQUIRED)
find_package(TorchCluster REQUIRED)
add_executable(example-app example-app.cpp)
target_compile_features(example-app PUBLIC cxx_range_for)
target_link_libraries(example-app TorchSparse::TorchSparse)
target_link_libraries(example-app TorchScatter::TorchScatter)
target_link_libraries(example-app TorchCluster::TorchCluster)
set_property(TARGET example-app PROPERTY CXX_STANDARD 14)
example-app.cpp
#include <torch/script.h> // One-stop header.
#include <torchsparse/sparse.h>
#include <torchscatter/scatter.h>
#include <torchcluster/cluster.h>

#include <iostream>
#include <memory>

int main(int argc, const char* argv[]) {
  if (argc != 2) {
    std::cerr << "usage: example-app <path-to-exported-script-module>\n";
    return -1;
  }

  torch::jit::script::Module module;
  try {
    // Deserialize the ScriptModule from a file using torch::jit::load().
    module = torch::jit::load(argv[1]);
  }
  catch (const c10::Error& e) {
    std::cerr << e.what();
    std::cerr << "error loading the model\n";
    return -1;
  }

  std::cout << "ok\n";
}
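To narrow down where the op goes missing, a small diagnostic can be dropped into example-app.cpp before the torch::jit::load() call. This is only a sketch (it is not part of my original app) and assumes the standard TorchScript operator registry API from libtorch:

// Diagnostic sketch (not in my original app): ask the TorchScript operator
// registry whether torch_cluster::fps is known before loading the model.
#include <torch/script.h>
#include <torch/csrc/jit/runtime/operator.h>  // getAllOperatorsFor
#include <iostream>

void check_fps_registration() {
  auto ops = torch::jit::getAllOperatorsFor(
      c10::Symbol::fromQualString("torch_cluster::fps"));
  if (ops.empty()) {
    // The torch-cluster library was not linked in, or its registration
    // code was stripped or never exported.
    std::cerr << "torch_cluster::fps is NOT registered\n";
  } else {
    std::cout << "torch_cluster::fps is registered ("
              << ops.size() << " overload(s))\n";
  }
}

If this reports the op as unregistered even though TorchCluster::TorchCluster is linked, the registration inside the library is the likely culprit rather than the application code.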
Here is how I compiled my app using CMake:
mkdir build && cd build
cmake -DCMAKE_PREFIX_PATH="<path_to_libtorch_linux>" ..
cmake --build .
When I execute the application, I get the following error:
terminate called after throwing an instance of 'torch::jit::ErrorReport'
what():
Unknown builtin op: torch_cluster::fps.
Could not find any similar ops to torch_cluster::fps. This op may not exist or may not be currently supported in TorchScript.
:
File "xxxxxxxx\anaconda3\envs\torch_latest_cuda113\lib\site-packages\torch_cluster\fps.py", line 70
ptr = torch.tensor([0, src.size(0)], device=src.device)
return torch.ops.torch_cluster.fps(src, ptr, r, random_start)
~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
Serialized File "code/__torch__/torch_cluster/fps.py", line 29
ptr1 = torch.tensor([0, torch.size(src, 0)], dtype=None, device=ops.prim.device(src))
ptr = ptr1
_3 = ops.torch_cluster.fps(src, ptr, r, random_start)
~~~~~~~~~~~~~~~~~~~~~ <--- HERE
return _3
'fps' is being compiled since it was called from 'fps'
File "xxxxxxxxxx\anaconda3\envs\torch_latest_cuda113\lib\site-packages\torch_geometric\nn\pool\__init__.py", line 49
index = fps(x, batch, ratio=0.5)
"""
return torch_cluster.fps(x, batch, ratio, random_start)
~~~~~~~~~~~~~~~~~ <--- HERE
Serialized File "code/__torch__/torch_geometric/nn/pool.py", line 5
ratio: float=0.5,
random_start: bool=True) -> Tensor:
_0 = __torch__.torch_cluster.fps.fps(x, batch, ratio, random_start, )
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
return _0
def radius(x: Tensor,
'fps' is being compiled since it was called from 'SAModule.forward'
Serialized File "code/__torch__.py", line 55
pos: Tensor,
batch: Tensor) -> Tuple[Tensor, Tensor, Tensor]:
_25 = __torch__.torch_geometric.nn.pool.fps
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
_26 = __torch__.torch_geometric.nn.pool.radius
ratio = self.ratio
Aborted (core dumped)
Do you have any ideas about how I can resolve this problem, please?
EDIT
Note that I had to make some minor changes to knn_interpolate.py (in torch_geometric/nn/unpool) for the conversion to TorchScript. Here is the modified file:
import torch
from typing import Optional

from torch_scatter import scatter_add

from torch_geometric.nn import knn


def knn_interpolate(x: torch.Tensor, pos_x: torch.Tensor, pos_y: torch.Tensor,
                    batch_x: Optional[torch.Tensor] = None,
                    batch_y: Optional[torch.Tensor] = None, k: int = 3,
                    num_workers: int = 1):
    r"""The k-NN interpolation from the `"PointNet++: Deep Hierarchical
    Feature Learning on Point Sets in a Metric Space"
    <https://arxiv.org/abs/1706.02413>`_ paper.

    For each point :math:`y` with position :math:`\mathbf{p}(y)`, its
    interpolated features :math:`\mathbf{f}(y)` are given by

    .. math::
        \mathbf{f}(y) = \frac{\sum_{i=1}^k w(x_i) \mathbf{f}(x_i)}{\sum_{i=1}^k
        w(x_i)} \textrm{, where } w(x_i) = \frac{1}{d(\mathbf{p}(y),
        \mathbf{p}(x_i))^2}

    and :math:`\{ x_1, \ldots, x_k \}` denoting the :math:`k` nearest points
    to :math:`y`.

    Args:
        x (Tensor): Node feature matrix
            :math:`\mathbf{X} \in \mathbb{R}^{N \times F}`.
        pos_x (Tensor): Node position matrix
            :math:`\in \mathbb{R}^{N \times d}`.
        pos_y (Tensor): Upsampled node position matrix
            :math:`\in \mathbb{R}^{M \times d}`.
        batch_x (LongTensor, optional): Batch vector
            :math:`\mathbf{b_x} \in {\{ 0, \ldots, B-1\}}^N`, which assigns
            each node from :math:`\mathbf{X}` to a specific example.
            (default: :obj:`None`)
        batch_y (LongTensor, optional): Batch vector
            :math:`\mathbf{b_y} \in {\{ 0, \ldots, B-1\}}^N`, which assigns
            each node from :math:`\mathbf{Y}` to a specific example.
            (default: :obj:`None`)
        k (int, optional): Number of neighbors. (default: :obj:`3`)
        num_workers (int): Number of workers to use for computation. Has no
            effect in case :obj:`batch_x` or :obj:`batch_y` is not
            :obj:`None`, or the input lies on the GPU. (default: :obj:`1`)
    """
    with torch.no_grad():
        assign_index = knn(pos_x, pos_y, k, batch_x=batch_x, batch_y=batch_y,
                           num_workers=num_workers)
        # y_idx, x_idx = assign_index  # commented for conversion to torchscript
        y_idx, x_idx = assign_index[0], assign_index[1]
        diff = pos_x[x_idx] - pos_y[y_idx]
        squared_distance = (diff * diff).sum(dim=-1, keepdim=True)
        weights = 1.0 / torch.clamp(squared_distance, min=1e-16)

    y = scatter_add(x[x_idx] * weights, y_idx, dim=0, dim_size=pos_y.size(0))
    y = y / scatter_add(weights, y_idx, dim=0, dim_size=pos_y.size(0))

    return y
Thanks a lot for reporting. While I haven't had time to reproduce this myself, my first guess is that this is related to Windows. See here for more information. We brought support for this to torch-scatter and torch-sparse, but not to torch-cluster yet.
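For context on why the error says "Unknown builtin op": these extensions register their operators with the TorchScript dispatcher when their C++ library is built and linked. The sketch below only illustrates the mechanism; the schema, the stub kernel and the layout are illustrative and do not reproduce torch-cluster's actual source:

// Illustrative only -- not torch-cluster's real code. A TorchScript extension
// declares its op schema and binds an implementation; torch::jit::load() can
// resolve ops.torch_cluster.fps only if this registration ends up in (and is
// exported from) the linked library. On Windows the symbols additionally need
// dllexport-style visibility macros, which is the support referred to above.
#include <torch/script.h>

// Hypothetical stand-in for the real farthest-point-sampling kernel.
torch::Tensor fps_cpu(torch::Tensor src, torch::Tensor ptr, double ratio,
                      bool random_start) {
  return torch::empty({0}, src.options().dtype(torch::kLong));
}

TORCH_LIBRARY_FRAGMENT(torch_cluster, m) {
  // The schema string is illustrative; the real signature may differ.
  m.def("fps(Tensor src, Tensor ptr, float ratio, bool random_start) -> Tensor");
}

TORCH_LIBRARY_IMPL(torch_cluster, CPU, m) {
  m.impl("fps", &fps_cpu);
}

If that registration never makes it into the final binary (not compiled, not exported, or dropped by the linker), the serialized call to ops.torch_cluster.fps has nothing to bind to and torch::jit::load() fails with exactly the error shown.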
I'm not sure this is related to Windows, since I switched to Ubuntu and got the very same error.
Thanks @rusty1s, I applied the same PR as in torch-sparse and this resolves my issue.
Amazing. Can you send a PR for torch-cluster as well? This would be great!
PR has just been sent. Happy to contribute to such a great project!
This issue had no activity for 6 months. It will be closed in 2 weeks unless there is some new activity. Is this issue already resolved?