Strange error in test: Diff too big
This is on an OpenCL CPU device. After updating the OpenCL runtime I see a different error, similar to the error in the other test script:
Mean 1d
Accessing device #0:AMD EPYC 7542 32-Core Processor on Intel(R) CPU Runtime for OpenCL(TM) Applications
torch.Size([1, 3, 4])
torch.Size([1, 3, 4])
y 0.000000
x0 0.000000
Mean 2d
torch.Size([2, 1, 1])
torch.Size([2, 1, 1])
y 0.000000
x0 0.000000
Mean 1d squeeze
torch.Size([3, 4])
torch.Size([3, 4])
y 0.000000
x0 0.000000
Mean 2d squeeze
torch.Size([3])
torch.Size([3])
y 0.000000
x0 0.000000
Mean all squeeze
torch.Size([])
torch.Size([])
y 0.000000
x0 0.000000
Sum 1d
torch.Size([1, 3, 4])
torch.Size([1, 3, 4])
y 0.000000
x0 0.000000
Sum 2d
torch.Size([2, 1, 1])
torch.Size([2, 1, 1])
y 0.000000
x0 0.000000
Sum 1d squeeze
torch.Size([3, 4])
torch.Size([3, 4])
y 0.000000
x0 0.000000
Sum 2d squeeze
torch.Size([3])
torch.Size([3])
y 0.000000
x0 0.000000
LogSoftmax
torch.Size([4, 3])
torch.Size([4, 3])
x0 0.000000
y 0.000000
LogSoftmax
torch.Size([4, 3, 5])
torch.Size([4, 3, 5])
y 0.000000
x0 0.000000
Softmax
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
NLLLoss
torch.Size([])
torch.Size([])
tensor(0.0413, grad_fn=<NllLossBackward0>)
tensor(0.0418)
y 0.000469
x0 0.000000
AAPool2d
torch.Size([4, 8, 1, 1])
torch.Size([4, 8, 1, 1])
y 0.000000
x0 0.000000
Abs
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
Abs_
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
Hardtanh
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
Hardtanh_
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
Sigmoid
torch.Size([4, 3])
torch.Size([4, 3])
x0 0.000000
y 0.000000
Sigmoid_
torch.Size([4, 3])
torch.Size([4, 3])
x0 0.000000
y 0.000000
Hardsigmoid
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
Hardsigmoid_
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
ReLU
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
ReLU_
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
LReLu
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
LReLU_
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
Tanh
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
Tanh_
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
SiLU
torch.Size([4, 3])
torch.Size([4, 3])
x0 0.000000
y 0.000000
SiLU_
torch.Size([4, 3])
torch.Size([4, 3])
x0 0.000000
y 0.000000
GELU
torch.Size([4, 3])
torch.Size([4, 3])
x0 0.000000
y 0.000000
GELU tanh
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
BCE Loss
torch.Size([])
torch.Size([])
x0 0.000001
x1 0.000000
y 0.000000
BCE Loss no reduction
torch.Size([4, 3, 5])
torch.Size([4, 3, 5])
x0 0.000001
y 0.000000
x1 0.000000
MSE Loss
torch.Size([])
torch.Size([])
y 0.000000
x0 0.000000
x1 0.000000
MSE Loss no reduction
torch.Size([4, 3, 5])
torch.Size([4, 3, 5])
y 0.000000
x0 0.000000
x1 0.000000
Min
Ok
Max
Ok
Dot
Ok
Clamp 1
Ok
Clamp 2
Ok
Clamp 3
Ok
Linear 2d
p_bias 0.000000
y 0.000000
x0 0.000000
p_weight 0.000000
Linear 3d
p_bias 0.000000
y 0.000000
x0 0.000000
p_weight 0.000000
Conv
Traceback (most recent call last):
File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim_orig/tests/test_op.py", line 282, in <module>
test_all(r.device)
File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim_orig/tests/test_op.py", line 254, in test_all
test_fwd_bwd_op([([2,6,10,20],-1)],torch.nn.Conv2d(6,8,[3,5],stride=[1,2],padding=[1,2],dilation=1,groups=2),device)
File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim_orig/tests/test_op.py", line 74, in test_fwd_bwd_op
y_cpu.backward(dy_cpu,retain_graph=True)
File "/home/inetstar/Kamenev/programming/ZenDnn/lib/python3.10/site-packages/torch/_tensor.py", line 488, in backward
torch.autograd.backward(
File "/home/inetstar/Kamenev/programming/ZenDnn/lib/python3.10/site-packages/torch/autograd/__init__.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: could not create a primitive descriptor iterator
With AMD OpenCL (AMDAPPSDK-3.0) I get a different error:
python tests/test_op.py --device privateuseone:2
Mean 1d
Accessing device #2:AMD EPYC 7542 32-Core Processor on AMD Accelerated Parallel Processing
torch.Size([1, 3, 4])
torch.Size([1, 3, 4])
tensor([[[-0.2863, -0.1444, 1.4827, -0.2142],
[ 0.9526, -1.2787, 0.7404, -0.3989],
[ 0.8163, 0.2142, 0.2852, 0.8597]]], grad_fn=<MeanBackward1>)
tensor([[[1.4019, 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, 0.0000]]])
y 1.688240
x0 0.000000
Traceback (most recent call last):
File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim_orig/tests/test_op.py", line 282, in <module>
test_all(r.device)
File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim_orig/tests/test_op.py", line 158, in test_all
test_fwd_bwd([([2,3,4],-1)],lambda x:torch.mean(x,dim=0,keepdim=True),device)
File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim_orig/tests/test_op.py", line 153, in test_fwd_bwd
raise Exception("Diff too big")
Exception: Diff too big
max_diff = 1.9810690879821777
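For reference, the reported `y 1.688240` is consistent with a maximum absolute element-wise difference between the two tensors printed above. The sketch below is not the code from `test_op.py`; it only recomputes that number from the printed values. It assumes the first tensor is the CPU reference and the second is the device result, and `max_abs_diff` is a hypothetical helper name.

```python
# Values copied from the log above (printed with 4 decimal places).
# Assumption: the first tensor is the CPU reference, the second the device result.
cpu_y = [
    [-0.2863, -0.1444, 1.4827, -0.2142],
    [0.9526, -1.2787, 0.7404, -0.3989],
    [0.8163, 0.2142, 0.2852, 0.8597],
]
dev_y = [
    [1.4019, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]


def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two 2D lists (hypothetical helper)."""
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


diff = max_abs_diff(cpu_y, dev_y)
print(f"y {diff:.4f}")  # prints "y 1.6882"
```

The largest difference comes from the first element (-0.2863 vs 1.4019), and the device result holds only one non-zero value.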
With AMD OpenCL (from amdgpu-pro) the test also fails with an error at the end:
Mean 1d
Accessing device #3:Fiji on AMD Accelerated Parallel Processing
torch.Size([1, 3, 4])
torch.Size([1, 3, 4])
y 0.000000
x0 0.000000
Mean 2d
torch.Size([2, 1, 1])
torch.Size([2, 1, 1])
x0 0.000000
y 0.000000
Mean 1d squeeze
torch.Size([3, 4])
torch.Size([3, 4])
y 0.000000
x0 0.000000
Mean 2d squeeze
torch.Size([3])
torch.Size([3])
y 0.000000
x0 0.000000
Mean all squeeze
torch.Size([])
torch.Size([])
y 0.000000
x0 0.000000
Sum 1d
torch.Size([1, 3, 4])
torch.Size([1, 3, 4])
y 0.000000
x0 0.000000
Sum 2d
torch.Size([2, 1, 1])
torch.Size([2, 1, 1])
y 0.000000
x0 0.000000
Sum 1d squeeze
torch.Size([3, 4])
torch.Size([3, 4])
y 0.000000
x0 0.000000
Sum 2d squeeze
torch.Size([3])
torch.Size([3])
y 0.000000
x0 0.000000
LogSoftmax
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
LogSoftmax
torch.Size([4, 3, 5])
torch.Size([4, 3, 5])
x0 0.000000
y 0.000000
Softmax
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
NLLLoss
torch.Size([])
torch.Size([])
y 0.000000
x0 0.000000
AAPool2d
torch.Size([4, 8, 1, 1])
torch.Size([4, 8, 1, 1])
y 0.000000
x0 0.000000
Abs
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
Abs_
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
Hardtanh
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
Hardtanh_
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
Sigmoid
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
Sigmoid_
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
Hardsigmoid
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
Hardsigmoid_
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
ReLU
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
ReLU_
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
LReLu
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
LReLU_
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
Tanh
torch.Size([4, 3])
torch.Size([4, 3])
x0 0.000000
y 0.000000
Tanh_
torch.Size([4, 3])
torch.Size([4, 3])
x0 0.000000
y 0.000000
SiLU
torch.Size([4, 3])
torch.Size([4, 3])
x0 0.000000
y 0.000000
SiLU_
torch.Size([4, 3])
torch.Size([4, 3])
x0 0.000000
y 0.000000
GELU
torch.Size([4, 3])
torch.Size([4, 3])
y 0.000000
x0 0.000000
GELU tanh
torch.Size([4, 3])
torch.Size([4, 3])
x0 0.000000
y 0.000000
BCE Loss
torch.Size([])
torch.Size([])
x0 0.000058
y 0.000000
x1 0.000000
BCE Loss no reduction
torch.Size([4, 3, 5])
torch.Size([4, 3, 5])
x0 0.000008
y 0.000000
x1 0.000000
MSE Loss
torch.Size([])
torch.Size([])
y 0.000000
x0 0.000000
x1 0.000000
MSE Loss no reduction
torch.Size([4, 3, 5])
torch.Size([4, 3, 5])
y 0.000000
x0 0.000000
x1 0.000000
Min
Ok
Max
Ok
Dot
Ok
Clamp 1
Ok
Clamp 2
Ok
Clamp 3
Ok
Linear 2d
p_weight 0.000000
p_bias 0.000000
y 0.000000
x0 0.000000
Linear 3d
p_weight 0.000002
p_bias 0.000000
y 0.000000
x0 0.000000
Conv
Traceback (most recent call last):
File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim/tests/test_op.py", line 282, in <module>
test_all(r.device)
File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim/tests/test_op.py", line 254, in test_all
test_fwd_bwd_op([([2,6,10,20],-1)],torch.nn.Conv2d(6,8,[3,5],stride=[1,2],padding=[1,2],dilation=1,groups=2),device)
File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim/tests/test_op.py", line 74, in test_fwd_bwd_op
y_cpu.backward(dy_cpu,retain_graph=True)
File "/home/inetstar/Kamenev/programming/ZenDnn/lib/python3.10/site-packages/torch/_tensor.py", line 488, in backward
torch.autograd.backward(
File "/home/inetstar/Kamenev/programming/ZenDnn/lib/python3.10/site-packages/torch/autograd/__init__.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: could not create a primitive descriptor iterator
Sorry for the late reply. For some reason I missed this.
Which PyTorch version and which GPU are you using?
I'm using PyTorch version 1.13.1 and an AMD Fury.
Mean 1d
Accessing device #1:AMD Radeon R9 Fury Series (radeonsi, fiji, LLVM 17.0.6, DRM 3.57, 6.8.9-calculate) on rusticl
.......
Sum 2d squeeze
torch.Size([3])
torch.Size([3])
y 0.000000
x0 0.000000
LogSoftmax
LLVM ERROR: Cannot select: 0x7feb044c5610: f32 = and 0x7feb044c54c0, Constant:i32<2147483647>
0x7feb044c54c0: f32 = bitcast 0x7feb040d5410
0x7feb040d5410: i32,ch = CopyFromReg 0x5562cb846890, Register:i32 %14
0x7feb044a2570: i32 = Register %14
0x7feb044c3120: i32 = Constant<2147483647>
In function: main
Aborted
First, CPU devices are not supported.
on rusticl
In my experience rusticl is horribly buggy. It crashes for me on an RX 560. Try the AMD ROCm OpenCL driver or the Mesa driver.