torchsparse
conv3d with empty kernel_map
Applying a transposed convolution with conv3d sometimes raises an error due to an empty kernel_map. The reason is that sparseconv_op is called without checking whether kernel_map is empty.
Thanks for pointing this out! I have created a PR for this issue.
@cslxiao, could you please install the latest version to see whether the issue has been resolved?
@zhijian-liu, good job! But I think the bug is still there. In the code:
# do upsample
original_stride = int(cur_stride / stride)
kernel_map = inputs.kernel_maps.get(
    'k%s_os%d_s%d_d%d' % (ks, original_stride, stride, dilation), None)
output_features = sparseconv_op(features, kernel, kernel_map[0],
                                kernel_map[1], kernel_map[2],
                                transpose)
the error occurs when sparseconv_op is called with kernel_map = None, because the kernel_maps of the input tensor does not necessarily contain the kernel_map required at this step when the network's downsampling and upsampling paths are not symmetric with respect to kernel_size, stride, and dilation.
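For illustration, a guard along these lines (a sketch written against the snippet above, not the actual library code) would at least turn the crash into a readable error:

# Sketch only: fail with a clear message instead of indexing into None.
kernel_map = inputs.kernel_maps.get(
    'k%s_os%d_s%d_d%d' % (ks, original_stride, stride, dilation), None)
if kernel_map is None:
    raise RuntimeError(
        'missing kernel map k%s_os%d_s%d_d%d: no matching map was built '
        'on the downsampling path' % (ks, original_stride, stride, dilation))
output_features = sparseconv_op(features, kernel, kernel_map[0],
                                kernel_map[1], kernel_map[2], transpose)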
Hi @cslxiao, thanks for the information. Could you please provide me with a minimal example to reproduce this? Thanks!
Hi @zhijian-liu, take the modified MinkUNet in the e3d repo as an example:
class DilatedMinkUNet(MinkUNet):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        cr = kwargs.get('cr', 1.0)
        dilation = kwargs.get('dilation', [1, 1, 1, 2, 2, 1, 1, 1])
        cs = [32, 32, 64, 128, 256, 256, 128, 96, 96]
        cs = [int(cr * x) for x in cs]
        self.cs = cs
        self.run_up = kwargs.get('run_up', True)

        self.stem = nn.Sequential(
            spnn.Conv3d(4, cs[0], kernel_size=3, stride=1),
            spnn.BatchNorm(cs[0]), spnn.ReLU(True),
            spnn.Conv3d(cs[0], cs[0], kernel_size=3, stride=1),
            spnn.BatchNorm(cs[0]), spnn.ReLU(True))

        # encoder: strided convolutions, using dilation[0..3]
        self.stage1 = nn.Sequential(
            BasicConvolutionBlock(cs[0], cs[0], ks=2, stride=2, dilation=dilation[0]),
            ResidualBlock(cs[0], cs[1], ks=3, stride=1, dilation=dilation[0]),
            ResidualBlock(cs[1], cs[1], ks=3, stride=1, dilation=dilation[0]),
        )
        self.stage2 = nn.Sequential(
            BasicConvolutionBlock(cs[1], cs[1], ks=2, stride=2, dilation=dilation[1]),
            ResidualBlock(cs[1], cs[2], ks=3, stride=1, dilation=dilation[1]),
            ResidualBlock(cs[2], cs[2], ks=3, stride=1, dilation=dilation[1]),
        )
        self.stage3 = nn.Sequential(
            BasicConvolutionBlock(cs[2], cs[2], ks=2, stride=2, dilation=dilation[2]),
            ResidualBlock(cs[2], cs[3], ks=3, stride=1, dilation=dilation[2]),
            ResidualBlock(cs[3], cs[3], ks=3, stride=1, dilation=dilation[2]),
        )
        self.stage4 = nn.Sequential(
            BasicConvolutionBlock(cs[3], cs[3], ks=2, stride=2, dilation=dilation[3]),
            ResidualBlock(cs[3], cs[4], ks=3, stride=1, dilation=dilation[3]),
            ResidualBlock(cs[4], cs[4], ks=3, stride=1, dilation=dilation[3]),
        )

        # decoder: transposed convolutions, using dilation[4..7]
        self.up1 = nn.ModuleList([
            BasicDeconvolutionBlock(cs[4], cs[5], ks=2, stride=2, dilation=dilation[4]),
            nn.Sequential(
                ResidualBlock(cs[5] + cs[3], cs[5], ks=3, stride=1, dilation=dilation[4]),
                ResidualBlock(cs[5], cs[5], ks=3, stride=1, dilation=dilation[4]),
            )
        ])
        self.up2 = nn.ModuleList([
            BasicDeconvolutionBlock(cs[5], cs[6], ks=2, stride=2, dilation=dilation[5]),
            nn.Sequential(
                ResidualBlock(cs[6] + cs[2], cs[6], ks=3, stride=1, dilation=dilation[5]),
                ResidualBlock(cs[6], cs[6], ks=3, stride=1, dilation=dilation[5]),
            )
        ])
        self.up3 = nn.ModuleList([
            BasicDeconvolutionBlock(cs[6], cs[7], ks=2, stride=2, dilation=dilation[6]),
            nn.Sequential(
                ResidualBlock(cs[7] + cs[1], cs[7], ks=3, stride=1, dilation=dilation[6]),
                ResidualBlock(cs[7], cs[7], ks=3, stride=1, dilation=dilation[6]),
            )
        ])
        self.up4 = nn.ModuleList([
            BasicDeconvolutionBlock(cs[7], cs[8], ks=2, stride=2, dilation=dilation[7]),
            nn.Sequential(
                ResidualBlock(cs[8] + cs[0], cs[8], ks=3, stride=1, dilation=dilation[7]),
                ResidualBlock(cs[8], cs[8], ks=3, stride=1, dilation=dilation[7]),
            )
        ])
Symmetric dilation parameters such as dilation=[1, 1, 1, 2, 2, 1, 1, 1] or dilation=[1, 2, 2, 2, 2, 2, 2, 1] work fine, but asymmetric dilation parameters such as dilation=[1, 1, 1, 1, 1, 1, 1, 2] cause the aforementioned error.
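To make the mismatch concrete, here is a small illustration (the key format is copied from the snippet above; the stride values correspond to stage1 and up4, and the dictionary contents are placeholders): the downsampling stage stores its kernel map under a dilation-1 key, while the mirrored upsampling stage queries a dilation-2 key, so the lookup returns None.

# Hypothetical illustration of the key mismatch, reusing the
# 'k%s_os%d_s%d_d%d' key format from the snippet above.
ks, original_stride, stride = 2, 1, 2
stored = 'k%s_os%d_s%d_d%d' % (ks, original_stride, stride, 1)   # built by stage1 (dilation[0] = 1)
queried = 'k%s_os%d_s%d_d%d' % (ks, original_stride, stride, 2)  # looked up by up4 (dilation[7] = 2)

kernel_maps = {stored: ('nbmaps', 'nbsizes', 'sizes')}  # placeholder entry
print(stored, queried)           # k2_os1_s2_d1  k2_os1_s2_d2
print(kernel_maps.get(queried))  # None -> sparseconv_op receives kernel_map = None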
Same question here. Do you have plans to support an inverse conv3d directly?
I think what you need here is a generative sparse deconvolution. This is a bit different from what we have implemented now. We will investigate this in more detail.
I am still having this issue with v1.4.0:
import torch
import torchsparse
import torchsparse.nn

feat_depth = 64
# a SparseTensor with zero points triggers the crash
coords = torch.zeros((0, 4), dtype=torch.int32, device='cuda')
feats = torch.zeros((0, feat_depth), dtype=torch.float32, device='cuda')
t = torchsparse.SparseTensor(feats, coords)

conv = torchsparse.nn.Conv3d(feat_depth, feat_depth, kernel_size=3, bias=False)
conv(t)
RuntimeError: CUDA error: invalid configuration argument
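For now, a minimal caller-side workaround (a sketch using the variables from the reproduction above; I check the raw feats tensor rather than an attribute of the SparseTensor, since attribute names differ across torchsparse versions):

# Sketch of a guard for empty inputs: skip the convolution entirely
# instead of launching a CUDA kernel over zero points.
if feats.shape[0] == 0:
    out = t  # zero points in, zero points out
else:
    out = conv(t)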
I think I got the same issue here. Is this related to the generalized SparseTensor?
/tmp/ipykernel_27426/3796938854.py in forward(self, x)
198 x = to_sparse(x)
199
--> 200 x0 = self.stem(x)
201 x1 = self.stage1(x0)
202 x2 = self.stage2(x1)
~/.conda/envs/torchsparse/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1049 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1050 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051 return forward_call(*input, **kwargs)
1052 # Do not call functions when jit is used
1053 full_backward_hooks, non_full_backward_hooks = [], []
~/.conda/envs/torchsparse/lib/python3.8/site-packages/torch/nn/modules/container.py in forward(self, input)
137 def forward(self, input):
138 for module in self:
--> 139 input = module(input)
140 return input
141
~/.conda/envs/torchsparse/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1049 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1050 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051 return forward_call(*input, **kwargs)
1052 # Do not call functions when jit is used
1053 full_backward_hooks, non_full_backward_hooks = [], []
~/.conda/envs/torchsparse/lib/python3.8/site-packages/torchsparse-1.4.0-py3.8-linux-x86_64.egg/torchsparse/nn/modules/conv.py in forward(self, input)
64
65 def forward(self, input: SparseTensor) -> SparseTensor:
---> 66 return F.conv3d(input,
67 self.kernel,
68 kernel_size=self.kernel_size,
~/.conda/envs/torchsparse/lib/python3.8/site-packages/torchsparse-1.4.0-py3.8-linux-x86_64.egg/torchsparse/nn/functional/conv.py in conv3d(input, weight, kernel_size, bias, stride, dilation, transposed)
121 input.stride)
122 queries = F.sphash(coords, offsets)
--> 123 results = F.sphashquery(queries, references)
124
125 nbsizes = torch.sum(results != -1, dim=1)
~/.conda/envs/torchsparse/lib/python3.8/site-packages/torchsparse-1.4.0-py3.8-linux-x86_64.egg/torchsparse/nn/functional/query.py in sphashquery(queries, references)
19
20 if queries.device.type == 'cuda':
---> 21 output = torchsparse.backend.hash_query_cuda(queries, references,
22 indices)
23 elif queries.device.type == 'cpu':
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
@resuly, could you please check the shape of x (both coords and feats) before the line x0 = self.stem(x)?
Sorry for the late reply. It should be the same zero-input issue that @noahstier mentioned above.
@resuly, in this case, you should check the input size before sending it into the model. You need to make sure that the input has more than one point.
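For example, something like the following check before the forward pass (a sketch only; to_sparse and self.stem come from the traceback above, and the coords attribute name is an assumption that may differ by torchsparse version):

x = to_sparse(x)
# Sketch: refuse to run the network on an empty point cloud.
assert x.coords.shape[0] > 0, 'model input must contain at least one point'
x0 = self.stem(x)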
Since TorchSparse has been upgraded to v2.1.0, could you please attempt to install the latest version? I will now close this issue, but please don't hesitate to reopen it if the problem persists.