
rocminfo: static void amd::MemoryRegion::FreeKfdMemory(void*, size_t): Assertion `status == HSAKMT_STATUS_SUCCESS' failed.

Open devurandom opened this issue 5 years ago • 0 comments

System information

❯ inxi -GSC -xx
System:    Host: ernie Kernel: 5.7.7 x86_64 bits: 64 compiler: gcc v: 10.1.0 Desktop: N/A wm: kwin_x11 dm: SDDM 
           Distro: Gentoo Base System release 2.7 
CPU:       Topology: Quad Core model: AMD Ryzen 5 2400G with Radeon Vega Graphics bits: 64 type: MT MCP arch: Zen 
           L2 cache: 2048 KiB 
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 57518 
           Speed: 1352 MHz min/max: 1600/3600 MHz Core speeds (MHz): 1: 1352 2: 1352 3: 2973 4: 1351 5: 1352 6: 1352 7: 2974 
           8: 1352 
Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Baffin [Radeon RX 550 640SP / RX 560/560X] vendor: ASUSTeK 
           driver: amdgpu v: kernel bus ID: 01:00.0 chip ID: 1002:67ff 
           Device-2: AMD Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] vendor: ASUSTeK driver: amdgpu v: kernel 
           bus ID: 0a:00.0 chip ID: 1002:15dd 
           Display: server: X.Org 1.20.8 driver: amdgpu compositor: kwin_x11 resolution: 2560x1080~60Hz 
           OpenGL: renderer: AMD RAVEN (DRM 3.37.0 5.7.7 LLVM 10.0.0) v: 4.6 Mesa 20.1.3 direct render: Yes 

rocminfo is at version 3.5.0.

Problem

When running gdb rocminfo and typing run, I see:

ROCk module is loaded                                             
Able to open /dev/kfd read-write                                
[New Thread 0x7ffff779f700 (LWP 106384)]                        
LoadLib(libhsa-ext-image64.so.1) failed: libhsa-ext-image64.so.1: cannot open shared object file: No such file or directory
=====================                                               
HSA System Attributes                                             
=====================                                                                                                                                                                 
Runtime Version:         1.1                                          
System Timestamp Freq.:  1000.000000MHz                                                                                                                                               
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                                                                                                                                                        
System Endianness:       LITTLE                                     
                                                                                                                                                                                      
==========                                                          
HSA Agents                                                                                                                                                                                                                                                                                                                                                                  
==========                                                        
*******
Agent 1
*******                                                             
  Name:                    AMD Ryzen 5 2400G with Radeon Vega Graphics                                                                                                                                                                                                                                                                                                      
  Uuid:                    CPU-XX                                 
  Marketing Name:          AMD Ryzen 5 2400G with Radeon Vega Graphics                                                                                                                
  Vendor Name:             CPU                                    
  Feature:                 None specified                                                                                                                                             
  Profile:                 FULL_PROFILE                           
  Float Round Mode:        NEAR                                                                                                                                                       
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                                                                                                                                                     
  Queue Max Size:          0(0x0)                                 
  Queue Type:              MULTI                                                                                                                                                      
  Node:                    0                                      
  Device Type:             CPU
  Cache Info:
    L1:                      32(0x20) KB                          
  Chip ID:                 5597(0x15dd)                                                   
  Cacheline Size:          64(0x40)                             
  Max Clock Freq. (MHz):   3600                                                            
  BDFID:                   2560                                     
  Internal Node ID:        0                                                           
  Compute Unit:            8                                      
  SIMDs per CU:            4                                                    
  Shader Engines:          1                                        
  Shader Arrs. per Eng.:   1                                                                                                                                                          
  WatchPts on Addr. Ranges:4                                        
  Features:                None                                                                                                                                                                                                                                                                                                                                             
  Pool Info:                                                  
    Pool 1                                                                                                                                                                                                                                                                                                                                                                  
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    16776832(0xfffe80) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
    N/A                      
*******                  
Agent 2                  
*******                  
  Name:                    gfx902                             
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Ryzen 5 2400G with Radeon Vega Graphics
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          4096(0x1000)                       
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 5597(0x15dd)                       
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1250                               
  BDFID:                   2560                               
  Internal Node ID:        0                                  
  Compute Unit:            11                                 
  SIMDs per CU:            4                                  
  Shader Engines:          1                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      FALSE                              
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        160(0xa0)                          
  Max Work-item Per CU:    10240(0x2800)                      
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx902+xnack    
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*******                  
Agent 3                  
*******                  
  Name:                    gfx803                             
  Uuid:                    GPU-XX                             
  Marketing Name:          Baffin [Radeon RX 550 640SP / RX 560/560X]
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          4096(0x1000)                       
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 26623(0x67ff)                      
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1210                               
  BDFID:                   256                                
  Internal Node ID:        1                                  
  Compute Unit:            16                                 
  SIMDs per CU:            4                                  
  Shader Engines:          2                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      FALSE                              
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    2097152(0x200000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx803          
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***             
rocminfo: /tmp/portage/dev-libs/rocr-runtime-3.5.0/work/ROCR-Runtime-rocm-3.5.0/src/core/runtime/amd_memory_region.cpp:72: static void amd::MemoryRegion::FreeKfdMemory(void*, size_t): Assertion `status == HSAKMT_STATUS_SUCCESS' failed.

Note the failed assertion.
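
The assertion message suggests a common shape: a `void` function that checks a driver-call status only with `assert()`, so a failed free during teardown aborts the process instead of being reported to the caller. The following is a minimal sketch of that shape, not the ROCR-Runtime source. `Status`, `fake_free_memory`, `free_or_abort`, and `free_or_report` are hypothetical names invented for illustration, and the failure condition in the stand-in is arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for the thunk status type (not the real HSAKMT enum).
enum class Status { Success, Error };

// Hypothetical stand-in for the driver call that releases memory.
// It rejects a null pointer or a zero size to simulate a failing free.
Status fake_free_memory(void* ptr, std::size_t size) {
  return (ptr != nullptr && size != 0) ? Status::Success : Status::Error;
}

// Shape suggested by the assertion message: the status is only asserted,
// so a failure aborts the process unless the build defines NDEBUG.
void free_or_abort(void* ptr, std::size_t size) {
  Status status = fake_free_memory(ptr, size);
  assert(status == Status::Success);
  (void)status;  // avoids an unused-variable warning when NDEBUG is defined
}

// Alternative shape: log the failure and return it to the caller.
bool free_or_report(void* ptr, std::size_t size) {
  Status status = fake_free_memory(ptr, size);
  if (status != Status::Success) {
    std::fprintf(stderr, "free failed for %p (%zu bytes)\n", ptr, size);
    return false;
  }
  return true;
}
```

With the first shape, a build that defines `NDEBUG` would silently ignore the failed free, while an assert-enabled build aborts as seen above. The second shape lets a shutdown path continue and surface the error.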

The backtrace at that point is:

#0  0x00007ffff79abf91 in raise () from /usr/lib64/libc.so.6
No symbol table info available.
#1  0x00007ffff7995537 in abort () from /usr/lib64/libc.so.6
No symbol table info available.
#2  0x00007ffff799540f in ?? () from /usr/lib64/libc.so.6
No symbol table info available.
#3  0x00007ffff79a43e2 in __assert_fail () from /usr/lib64/libc.so.6
No symbol table info available.
#4  0x00007ffff7e257d9 in amd::MemoryRegion::FreeKfdMemory(void*, unsigned long) () from /usr/lib64/libhsa-runtime64.so.1
No symbol table info available.
#5  0x00007ffff7e2660d in amd::MemoryRegion::Free(void*, unsigned long) const () from /usr/lib64/libhsa-runtime64.so.1
No symbol table info available.
#6  0x00007ffff7e68f19 in core::Runtime::FreeMemory(void*) () from /usr/lib64/libhsa-runtime64.so.1
No symbol table info available.
#7  0x00007ffff7e68568 in core::Runtime::RegisterAgent(core::Agent*)::{lambda(void*)#2}::operator()(void*) const () from /usr/lib64/libhsa-runtime64.so.1
No symbol table info available.
#8  0x00007ffff7e70546 in void std::__invoke_impl<void, core::Runtime::RegisterAgent(core::Agent*)::{lambda(void*)#2}&, void*>(std::__invoke_other, core::Runtime::RegisterAgent(core::Agent*)::{lambda(void*)#2}&, void*&&) () from /usr/lib64/libhsa-runtime64.so.1
No symbol table info available.
#9  0x00007ffff7e7025c in std::enable_if<std::__and_<std::is_void<void>, std::__is_invocable<core::Runtime::RegisterAgent(core::Agent*)::{lambda(void*)#2}&, void*> >::value, std::is_void>::type std::__invoke_r<void, core::Runtime::RegisterAgent(core::Agent*)::{lambda(void*)#2}&, void*>(std::__is_invocable&&, (core::Runtime::RegisterAgent(core::Agent*)::{lambda(void*)#2}&)...) () from /usr/lib64/libhsa-runtime64.so.1
No symbol table info available.
#10 0x00007ffff7e6fd65 in std::_Function_handler<void (void*), core::Runtime::RegisterAgent(core::Agent*)::{lambda(void*)#2}>::_M_invoke(std::_Any_data const&, void*&&) () from /usr/lib64/libhsa-runtime64.so.1
No symbol table info available.
#11 0x00007ffff7df9087 in std::function<void (void*)>::operator()(void*) const () from /usr/lib64/libhsa-runtime64.so.1
No symbol table info available.
#12 0x00007ffff7e0e454 in amd::GpuAgent::ReleaseShader(void*, unsigned long) const () from /usr/lib64/libhsa-runtime64.so.1
No symbol table info available.
#13 0x00007ffff7e0d7cb in amd::GpuAgent::~GpuAgent() () from /usr/lib64/libhsa-runtime64.so.1
No symbol table info available.
#14 0x00007ffff7e0d960 in amd::GpuAgent::~GpuAgent() () from /usr/lib64/libhsa-runtime64.so.1
No symbol table info available.
#15 0x00007ffff7e76764 in void DeleteObject::operator()<core::Agent>(core::Agent const*) const () from /usr/lib64/libhsa-runtime64.so.1
No symbol table info available.
#16 0x00007ffff7e736a4 in DeleteObject std::for_each<__gnu_cxx::__normal_iterator<core::Agent**, std::vector<core::Agent*, std::allocator<core::Agent*> > >, DeleteObject>(__gnu_cxx::__normal_iterator<core::Agent**, std::vector<core::Agent*, std::allocator<core::Agent*> > >, __gnu_cxx::__normal_iterator<core::Agent**, std::vector<core::Agent*, std::allocator<core::Agent*> > >, DeleteObject) () from /usr/lib64/libhsa-runtime64.so.1
No symbol table info available.
#17 0x00007ffff7e6dc83 in core::Runtime::Unload() () from /usr/lib64/libhsa-runtime64.so.1
No symbol table info available.
#18 0x00007ffff7e683a3 in core::Runtime::Release() () from /usr/lib64/libhsa-runtime64.so.1
No symbol table info available.
#19 0x00007ffff7e40452 in HSA::hsa_shut_down() () from /usr/lib64/libhsa-runtime64.so.1
No symbol table info available.
#20 0x00007ffff7e8af92 in hsa_shut_down () from /usr/lib64/libhsa-runtime64.so.1
No symbol table info available.
#21 0x000055555555c931 in main (argc=1, argv=0x7fffffffd9f8) at /tmp/portage/dev-util/rocminfo-3.5.0/work/rocminfo-rocm-3.5.0/rocminfo.cc:1167
        err = HSA_STATUS_SUCCESS
        sys_info = {major = 1, minor = 1, timestamp_frequency = 1000000000, max_wait = 18446744073709551615, endianness = HSA_ENDIANNESS_LITTLE, machine_model = HSA_MACHINE_MODEL_LARGE}
        agent_ind = 3

Unfortunately, debug symbols are missing for libhsa-runtime64.so.1, because ROCR-Runtime's build system seems to override CXXFLAGS and LDFLAGS when building shared libraries; cf. https://bugs.gentoo.org/729898

This is reproducible every time I run rocminfo.

Regression

I never got ROCm to work on this system. Still working on it. :)

Logs

dmesg prints the following while rocminfo runs:

[Sat Jul 11 22:49:59 2020] Alloc host visible vram on small bar is not allowed
[Sat Jul 11 22:49:59 2020] Evicting PASID 0x8021 queues
[Sat Jul 11 22:49:59 2020] Evicting PASID 0x8021 queues

Other information

I also see exceptions and segfaults in Clover and ROCm's OpenCL implementation when executing clinfo:

  • https://gitlab.freedesktop.org/mesa/mesa/-/issues/3255
  • https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/32

I assume rocminfo is the lower-level tool, so getting it to run cleanly first might help with debugging the OpenCL problems.

devurandom, Jul 11 '20 20:07