ROCm icon indicating copy to clipboard operation
ROCm copied to clipboard

[Issue]: Bug with Triton Kernels in ROCm 6.1

Open Hprairie opened this issue 1 year ago • 1 comments

Problem Description

After compiling a triton kernel it will run and work. However, if I create a kernel with the same name in another python file, then I will get this error.

Traceback (most recent call last):
  File "/home/prairie/Projects/triton-notebooks/basic_mvm.py", line 42, in <module>
    triton_add_s[(1,)](x, y, z, 8, 4)
  File "<string>", line 65, in triton_add_s
  File "/home/prairie/anaconda3/envs/mambaX/lib/python3.11/site-packages/triton/compiler/compiler.py", line 613, in __getattribute__
    self._init_handles()
  File "/home/prairie/anaconda3/envs/mambaX/lib/python3.11/site-packages/triton/compiler/compiler.py", line 604, in _init_handles
    mod, func, n_regs, n_spills = fn_load_binary(self.metadata["name"], self.asm[bin_path], self.shared, device)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SystemError: <built-in function load_binary> returned NULL without setting an exception

Luckily just changing the name of the kernel works, i.e. changing the name from triton_add_s to triton_add_sq will fix the issue. Nonetheless, it's a little annoying to change the name of the kernel whenever this happens.

I could be wrong, but the problem also seems to stem from using a .ipynb file to run triton kernels.

Operating System

Ubuntu 22.04.4 LTS (Jammy Jellyfish)

CPU

AMD Ryzen Threadripper 3970X 32-Core Processor

GPU

AMD Radeon RX 7900 XTX

ROCm Version

ROCm 6.1.0

ROCm Component

No response

Steps to Reproduce

It doesn't happen every time, however, it does happen a lot.

  1. Create a triton kernel and run it
  2. Create a new .py file with the same kernel and attempt to run it

Try with .ipynb if not appearing.

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

ROCk module version 6.7.0 is loaded

HSA System Attributes

Runtime Version: 1.13 Runtime Ext Version: 1.4 System Timestamp Freq.: 1000.000000MHz Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count) Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED DMAbuf Support: YES

==========
HSA Agents


Agent 1


Name: AMD Ryzen Threadripper 3970X 32-Core Processor Uuid: CPU-XX
Marketing Name: AMD Ryzen Threadripper 3970X 32-Core Processor Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 3700
BDFID: 0
Internal Node ID: 0
Compute Unit: 64
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 49175364(0x2ee5b44) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED Size: 49175364(0x2ee5b44) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 49175364(0x2ee5b44) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:


Agent 2


Name: gfx1100
Uuid: GPU-50b29228c5e4eb8b
Marketing Name: Radeon RX 7900 XTX
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 6144(0x1800) KB
L3: 98304(0x18000) KB
Chip ID: 29772(0x744c)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2304
BDFID: 8960
Internal Node ID: 1
Compute Unit: 96
SIMDs per CU: 2
Shader Engines: 6
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Features: KERNEL_DISPATCH Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension: x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension: x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 92
SDMA engine uCode:: 20
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1100
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension: x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension: x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***

Additional Information

No response

Hprairie avatar May 15 '24 14:05 Hprairie

rm -r ~/.triton/cache should do the trick.

evshiron avatar Jul 06 '24 13:07 evshiron

@Hprairie Has this issue been resolved for you? If so, please close the ticket. Thanks!

ppanchad-amd avatar Jul 17 '24 18:07 ppanchad-amd

Yes this worked, thanks!

Hprairie avatar Jul 17 '24 20:07 Hprairie