structured-nets
hadamard_transform_cuda seems incorrect
Hi, I've tried your repo and found that, given the same input, `hadamard_transform_cuda` and `hadamard_transform_torch` produce different outputs.
Here is an example:
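```python
import torch

# hadamard_transform_cuda and hadamard_transform_torch are the repo's two implementations
batch_size = 10
n = 64
device = 'cuda:0'
u = torch.eye(n, requires_grad=True, device=device)
u2 = u.to('cpu')
result_cuda = hadamard_transform_cuda(u)     # CUDA extension path
result_torch = hadamard_transform_torch(u2)  # pure-PyTorch path
```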
The CUDA version returns the input unchanged (the identity matrix):

```
tensor([[1., 0., 0.,  ..., 0., 0., 0.],
        [0., 1., 0.,  ..., 0., 0., 0.],
        [0., 0., 1.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 1., 0., 0.],
        [0., 0., 0.,  ..., 0., 1., 0.],
        [0., 0., 0.,  ..., 0., 0., 1.]])
```
whereas the PyTorch version returns the unnormalized 64×64 Hadamard matrix, as expected:

```
tensor([[ 1.,  1.,  1.,  ...,  1.,  1.,  1.],
        [ 1., -1.,  1.,  ..., -1.,  1., -1.],
        [ 1.,  1., -1.,  ...,  1., -1., -1.],
        ...,
        [ 1., -1.,  1.,  ..., -1.,  1., -1.],
        [ 1.,  1., -1.,  ...,  1., -1., -1.],
        [ 1., -1., -1.,  ..., -1., -1.,  1.]])
```
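For reference, the expected result can be reproduced without CUDA. The following is a minimal, dependency-free sketch (not the repo's implementation) of the unnormalized fast Walsh-Hadamard transform in Sylvester ordering; `fwht` and `transform_rows` are hypothetical names. Applied row-wise to the 64×64 identity, it yields a matrix with entries (-1)^popcount(i & j), which agrees with the PyTorch output above rather than with the identity returned by the CUDA path.

```python
def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform (Sylvester ordering).

    `x` must have a power-of-two length. Returns a new list; `x` is not modified.
    """
    y = list(x)
    n = len(y)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly stage: combine entries j and j + h within blocks of size 2h.
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    return y


def transform_rows(rows):
    """Apply the transform independently to each row (one row per batch element)."""
    return [fwht(r) for r in rows]


n = 64
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
H = transform_rows(identity)

# Row i is the transform of the i-th basis vector, so H[i][j] == (-1) ** popcount(i & j).
assert all(H[i][j] == (-1) ** bin(i & j).count("1") for i in range(n) for j in range(n))
print(H[1][:4], H[1][-3:])  # [1.0, -1.0, 1.0, -1.0] [-1.0, 1.0, -1.0]
```

The printed slices match the second row of the PyTorch output. An in-place butterfly like this one runs in O(n log n) per row, which is why a correct implementation should never hand back its input unchanged for n > 1.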
This issue occurs only when the extension is loaded with `import hadamard_cuda`.
When the extension is instead compiled on the fly with `torch.utils.cpp_extension.load`:
```python
import torch.utils.cpp_extension

# JIT-compile the extension from source rather than using `import hadamard_cuda`
hadamard_cuda = torch.utils.cpp_extension.load(
    name='hadamard_cuda',
    sources=[
        'hadamard_cuda.cpp',
        'hadamard_cuda_kernel.cu',
    ],
    extra_cuda_cflags=['-O2'],
    verbose=False,
)
```
The output is correct.
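A silent no-op like the one reported is easy to miss, so a regression check that feeds the identity through a transform and compares against the closed-form Hadamard entries may help. This is a hypothetical helper sketched in plain Python; `sylvester_hadamard`, `matches_hadamard`, `no_op` and `dense` are illustrative names, not part of the repo.

```python
def sylvester_hadamard(n):
    """Closed-form n x n Sylvester Hadamard matrix: H[i][j] = (-1) ** popcount(i & j)."""
    return [[(-1) ** bin(i & j).count("1") for j in range(n)] for i in range(n)]


def matches_hadamard(transform, n, tol=1e-5):
    """Return True if `transform`, applied to the n x n identity (one row per
    batch element), reproduces the unnormalized Hadamard matrix within `tol`.

    `transform` is any callable that maps a list of float rows to an indexable
    result of the same shape whose entries can be passed to float().
    """
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    out = transform(identity)
    expected = sylvester_hadamard(n)
    return all(
        abs(float(out[i][j]) - expected[i][j]) <= tol
        for i in range(n)
        for j in range(n)
    )


def no_op(rows):
    """Returns its input unchanged, mimicking the reported CUDA output."""
    return [list(r) for r in rows]


def dense(rows):
    """Reference transform: multiply each row by H directly (O(n^2) per row)."""
    H = sylvester_hadamard(len(rows[0]))
    return [[sum(r[k] * H[k][j] for k in range(len(r))) for j in range(len(r))] for r in rows]


print(matches_hadamard(no_op, 64))  # False
print(matches_hadamard(dense, 64))  # True
```

Assuming `hadamard_transform_cuda` accepts a 2-D float tensor as in the report, it could be wrapped as `lambda rows: hadamard_transform_cuda(torch.tensor(rows, device='cuda:0')).cpu()` and passed to `matches_hadamard`, once with the `import hadamard_cuda` build and once with the `cpp_extension.load` build.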