pytorch-gpu-benchmark
RuntimeError: miopenStatusUnknownError
I am running Ubuntu 20.04.2 with all updates:
$ uname -a
Linux bengt-desktop 5.11.0-37-generic #41~20.04.2-Ubuntu SMP Fri Sep 24 09:06:38 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
I am running a Vega 64 on a Threadripper 1950X with ROCm 4.3.1:
$ rocminfo
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen Threadripper 1950X 16-Core Processor
Uuid: CPU-XX
Marketing Name: AMD Ryzen Threadripper 1950X 16-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 3900
BDFID: 0
Internal Node ID: 0
Compute Unit: 32
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 65711880(0x3eaaf08) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 65711880(0x3eaaf08) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 65711880(0x3eaaf08) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: gfx900
Uuid: GPU-02151de3936c4944
Marketing Name: Vega 10 XL/XT [Radeon RX Vega 56/64]
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 4096(0x1000)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 4096(0x1000) KB
Chip ID: 26751(0x687f)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1630
BDFID: 17664
Internal Node ID: 1
Compute Unit: 64
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: FALSE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 8372224(0x7fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx900:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
I set up a virtual environment roughly like this:
$ python3.8 -m venv venv
$ venv/bin/python -m pip install --upgrade torch torchvision==0.10.1 -f https://download.pytorch.org/whl/rocm4.2/torch_stable.html
$ venv/bin/python -m pip install --upgrade pandas psutil
This left me with an environment like so:
$ venv/bin/python -m pip freeze --all
numpy==1.21.2
pandas==1.3.3
Pillow==8.3.2
pip==20.0.2
pkg-resources==0.0.0
psutil==5.8.0
python-dateutil==2.8.2
pytz==2021.3
setuptools==44.0.0
six==1.16.0
torch==1.9.1+rocm4.2
torchvision==0.10.1+rocm4.2
typing-extensions==3.10.0.2
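One detail in this listing is that both wheels carry a `+rocm4.2` local version tag, while the system has ROCm 4.3.1 installed. Below is a minimal, stdlib-only sketch that pulls that tag out of `pip freeze` output and compares it, at major.minor granularity, with the installed ROCm version. The helper names (`rocm_tags`, `major_minor`, `mismatches`) are hypothetical, and the assumption that a major.minor mismatch is relevant to this error is mine, not a confirmed diagnosis.

```python
import re
from typing import Dict

# Matches freeze lines such as "torch==1.9.1+rocm4.2" and captures the ROCm tag.
ROCM_TAG = re.compile(
    r"^(?P<name>[A-Za-z0-9_.-]+)==(?P<version>[^+\s]+)\+rocm(?P<rocm>\d+(?:\.\d+)*)$"
)


def rocm_tags(freeze_output: str) -> Dict[str, str]:
    """Map package name -> ROCm version its wheel was built for (hypothetical helper)."""
    tags = {}
    for line in freeze_output.splitlines():
        match = ROCM_TAG.match(line.strip())
        if match:
            tags[match.group("name")] = match.group("rocm")
    return tags


def major_minor(version: str) -> str:
    """Reduce e.g. '4.3.1' to '4.3' so wheel tags and system versions compare alike."""
    return ".".join(version.split(".")[:2])


def mismatches(freeze_output: str, system_rocm: str) -> Dict[str, str]:
    """Return packages whose wheel ROCm tag differs from the system ROCm major.minor."""
    system = major_minor(system_rocm)
    return {
        name: tag
        for name, tag in rocm_tags(freeze_output).items()
        if major_minor(tag) != system
    }


if __name__ == "__main__":
    # Excerpt of the freeze listing above; 4.3.1 is the installed ROCm version.
    freeze = "numpy==1.21.2\ntorch==1.9.1+rocm4.2\ntorchvision==0.10.1+rocm4.2\n"
    print(mismatches(freeze, "4.3.1"))
```

On the listing above this flags both `torch` and `torchvision` as built for ROCm 4.2. It only makes the version gap explicit; whether the gap is what breaks MIOpen's kernel compilation is an open question.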
Now, the benchmark gives me these errors:
$ venv/bin/python benchmark_models.py -g 1
benchmark start : 2021/10/12 21:01:33
Number of GPUs on current device : 1
CUDA Version : None
Cudnn Version : 2011000
Device Name : Vega 10 XL/XT [Radeon RX Vega 56/64]
uname_result(system='Linux', node='bengt-desktop', release='5.11.0-37-generic', version='#41~20.04.2-Ubuntu SMP Fri Sep 24 09:06:38 UTC 2021', machine='x86_64', processor='x86_64')
scpufreq(current=2320.7297500000004, min=2200.0, max=3900.0)
cpu_count: 32
memory_available: 55991275520
Benchmarking Training float precision type mnasnet0_5
MIOpen(HIP): Warning [SQLiteBase] Unable to read system database file:/opt/rocm/miopen/share/miopen/db/gfx900_64.kdb Performance may degrade
MIOpen(HIP): Error [SetIsaName] 'amd_comgr_action_info_set_isa_name(handle, isa.c_str())' amdgcn-amd-amdhsa--gfx900:sramecc-:xnack-: INVALID_ARGUMENT (2)
MIOpen(HIP): Error [BuildOcl] comgr status = INVALID_ARGUMENT (2)
MIOpen(HIP): Warning [BuildOcl] amdgcn-amd-amdhsa--gfx900:sramecc-:xnack-
MIOpen Error: /MIOpen/src/hipoc/hipoc_program.cpp:286: Code object build failed. Source: MIOpenIm2d2Col.cl
Traceback (most recent call last):
File "benchmark_models.py", line 183, in <module>
train_result = train(precision)
File "benchmark_models.py", line 93, in train
prediction = model(img.to("cuda"))
File "/home/bengt/Downloads/Projekte/github.com/ryujaehun/pytorch-gpu-benchmark/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/bengt/Downloads/Projekte/github.com/ryujaehun/pytorch-gpu-benchmark/venv/lib/python3.8/site-packages/torchvision/models/mnasnet.py", line 148, in forward
x = self.layers(x)
File "/home/bengt/Downloads/Projekte/github.com/ryujaehun/pytorch-gpu-benchmark/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/bengt/Downloads/Projekte/github.com/ryujaehun/pytorch-gpu-benchmark/venv/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
input = module(input)
File "/home/bengt/Downloads/Projekte/github.com/ryujaehun/pytorch-gpu-benchmark/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/bengt/Downloads/Projekte/github.com/ryujaehun/pytorch-gpu-benchmark/venv/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 443, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/bengt/Downloads/Projekte/github.com/ryujaehun/pytorch-gpu-benchmark/venv/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 439, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: miopenStatusUnknownError
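For comparison: in the `SetIsaName` error, comgr returned `INVALID_ARGUMENT` for the ISA name `amdgcn-amd-amdhsa--gfx900:sramecc-:xnack-`, whereas `rocminfo` reports the card's ISA as `amdgcn-amd-amdhsa--gfx900:xnack-`. Below is a minimal sketch, assuming the LLVM AMDGPU target ID layout of `<triple>--<processor>` followed by `:<feature>+` or `:<feature>-` entries, that splits both strings and lists the features whose settings differ. The helpers `parse_target_id` and `feature_diff` are hypothetical names for illustration.

```python
from typing import Dict, Optional, Tuple


def parse_target_id(target_id: str) -> Tuple[str, str, Dict[str, bool]]:
    """Split 'triple--processor:feat+:feat-' into (triple, processor, features)."""
    triple, _, rest = target_id.partition("--")
    processor, *feature_parts = rest.split(":")
    features: Dict[str, bool] = {}
    for part in feature_parts:
        # Each feature must end in '+' (enabled) or '-' (disabled).
        if not part or part[-1] not in "+-":
            raise ValueError(f"malformed feature {part!r} in {target_id!r}")
        features[part[:-1]] = part[-1] == "+"
    return triple, processor, features


def feature_diff(
    requested: str, reported: str
) -> Dict[str, Tuple[Optional[bool], Optional[bool]]]:
    """Features whose setting differs; None means the ID does not mention it."""
    _, _, req = parse_target_id(requested)
    _, _, rep = parse_target_id(reported)
    return {
        name: (req.get(name), rep.get(name))
        for name in sorted(set(req) | set(rep))
        if req.get(name) != rep.get(name)
    }


if __name__ == "__main__":
    requested = "amdgcn-amd-amdhsa--gfx900:sramecc-:xnack-"  # from the MIOpen error
    reported = "amdgcn-amd-amdhsa--gfx900:xnack-"  # from rocminfo's ISA Info
    print(feature_diff(requested, reported))
```

Run on the two strings from this report, the only difference is `sramecc`: explicitly disabled in the requested ID, absent from the reported one. If I read LLVM's AMDGPU processor table correctly, `sramecc` is a target feature of gfx906 and gfx908 but not of gfx900, which would be consistent with comgr rejecting the ID. That is a guess, not a confirmed diagnosis.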
Any idea what to do about that?