composable_kernel
[Issue]: Flash Attention Failure on AMD Mi50
Problem Description
I was able to build flash-attention for ROCm for both my MI100 and MI50 cards, but only got flash attention working on the MI100 (very impressive performance, I might add).
Trying to run flash attention on the Mi50 delivered the following error: RuntimeError: DeviceGroupedMultiheadAttentionForward_Xdl_CShuffle_V2<256, 128, 128, 32, 8, 8, 128, 128, 32, 2, Default, ASpecDefault, B0SpecDefault, B1SpecDefault, CSpecDefault, MaskUpperTriangleFromTopLeft> does not support this problem
How hard would it be to ensure CK works on the MI50? Happy to pay/hire for support on this, as I have a rather large stockpile of MI50s.
Note - I was directed to CK from here: https://github.com/ROCmSoftwarePlatform/flash-attention/issues/29
Operating System
Ubuntu 22.04
CPU
AMD EPYC 7351 16-Core Processor
GPU
AMD Instinct MI50
Other
No response
ROCm Version
ROCm 5.7.1
ROCm Component
No response
Steps to Reproduce
Follow the steps in this Dockerfile, but change the AMD GPU targets in the various makefiles to gfx906 for the AMD MI50.
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
ROCk module is loaded
HSA System Attributes
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
==========
HSA Agents
Agent 1
Name: AMD EPYC 7351 16-Core Processor
Uuid: CPU-XX
Marketing Name: AMD EPYC 7351 16-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2400
BDFID: 0
Internal Node ID: 0
Compute Unit: 8
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 8068804(0x7b1ec4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 8068804(0x7b1ec4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 8068804(0x7b1ec4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
Agent 2
Name: AMD EPYC 7351 16-Core Processor
Uuid: CPU-XX
Marketing Name: AMD EPYC 7351 16-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 1
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2400
BDFID: 0
Internal Node ID: 1
Compute Unit: 8
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 16512396(0xfbf58c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 16512396(0xfbf58c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16512396(0xfbf58c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
Agent 3
Name: AMD EPYC 7351 16-Core Processor
Uuid: CPU-XX
Marketing Name: AMD EPYC 7351 16-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 2
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2400
BDFID: 0
Internal Node ID: 2
Compute Unit: 8
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 8254860(0x7df58c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 8254860(0x7df58c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 8254860(0x7df58c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
Agent 4
Name: AMD EPYC 7351 16-Core Processor
Uuid: CPU-XX
Marketing Name: AMD EPYC 7351 16-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 3
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2400
BDFID: 0
Internal Node ID: 3
Compute Unit: 8
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 16458272(0xfb2220) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 16458272(0xfb2220) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16458272(0xfb2220) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
Agent 5
Name: gfx906
Uuid: GPU-a20c80e172fd5d44
Marketing Name: AMD Radeon VII
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 4
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 8192(0x2000) KB
Chip ID: 26287(0x66af)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1801
BDFID: 1792
Internal Node ID: 4
Compute Unit: 60
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 469
SDMA engine uCode:: 145
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS:
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
Agent 6
Name: gfx906
Uuid: GPU-68da30417337ecda
Marketing Name: AMD Radeon VII
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 5
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 8192(0x2000) KB
Chip ID: 26287(0x66af)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1801
BDFID: 2560
Internal Node ID: 5
Compute Unit: 60
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 469
SDMA engine uCode:: 145
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS:
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
Agent 7
Name: gfx906
Uuid: GPU-2492786172fc1a88
Marketing Name: AMD Radeon VII
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 6
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 8192(0x2000) KB
Chip ID: 26287(0x66af)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1801
BDFID: 10752
Internal Node ID: 6
Compute Unit: 60
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 469
SDMA engine uCode:: 145
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS:
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
Agent 8
Name: gfx906
Uuid: GPU-bc5e39417337ecd7
Marketing Name: AMD Radeon VII
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 7
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 8192(0x2000) KB
Chip ID: 26287(0x66af)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1801
BDFID: 11520
Internal Node ID: 7
Compute Unit: 60
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 469
SDMA engine uCode:: 145
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS:
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
Agent 9
Name: gfx906
Uuid: GPU-65d6404172e6212c
Marketing Name: AMD Radeon VII
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 8
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 8192(0x2000) KB
Chip ID: 26287(0x66af)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1801
BDFID: 17664
Internal Node ID: 8
Compute Unit: 60
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 469
SDMA engine uCode:: 145
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS:
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
Agent 10
Name: gfx906
Uuid: GPU-1cf2390172dc76bd
Marketing Name: AMD Radeon VII
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 9
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 8192(0x2000) KB
Chip ID: 26287(0x66af)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1801
BDFID: 18432
Internal Node ID: 9
Compute Unit: 60
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 469
SDMA engine uCode:: 145
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS:
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
Agent 11
Name: gfx906
Uuid: GPU-aae6492172e6212c
Marketing Name: AMD Radeon VII
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 10
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 8192(0x2000) KB
Chip ID: 26287(0x66af)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1801
BDFID: 25856
Internal Node ID: 10
Compute Unit: 60
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 469
SDMA engine uCode:: 145
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS:
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
Agent 12
Name: gfx906
Uuid: GPU-4c24504172e6268f
Marketing Name: AMD Radeon VII
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 11
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 8192(0x2000) KB
Chip ID: 26287(0x66af)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1801
BDFID: 26624
Internal Node ID: 11
Compute Unit: 60
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 469
SDMA engine uCode:: 145
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS:
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
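For readers skimming a dump like the one above, here is a minimal Python sketch that reduces `rocminfo` text to one record per agent. It is not part of any ROCm tooling; the function name `summarize_agents` is hypothetical. It assumes only the layout visible in this output: an `Agent N` header line followed by `Key: value` lines.

```python
import re

# Fields to keep for each agent; everything else is ignored.
WANTED_KEYS = ("Name", "Marketing Name", "Device Type")

def summarize_agents(rocminfo_text):
    """Return one dict per 'Agent N' section found in rocminfo output.

    Assumption: each agent starts with a line that is exactly 'Agent <number>'
    (after stripping whitespace) and its fields are 'Key: value' lines.
    """
    agents = []
    current = None
    for raw in rocminfo_text.splitlines():
        line = raw.strip()
        if re.fullmatch(r"Agent \d+", line):
            current = {}
            agents.append(current)
            continue
        if current is None or ":" not in line:
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        # Keep only the first occurrence of each key, so the later
        # 'Name:' line inside the ISA section does not overwrite the agent name.
        if key in WANTED_KEYS and key not in current:
            current[key] = value.strip()
    return agents
```

Filtering the result on `"Device Type" == "GPU"` for the output above would list eight agents, each named `gfx906` with the marketing name `AMD Radeon VII`.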
Additional Information
No response
@ThePerfectComputer Thanks for your interest in our Flash Attention. It is implemented for MI100 and later data center (DC) GPUs. The MI50 lacks AMD matrix cores (MFMA instructions), so it cannot provide good enough performance.
You may be interested in our Flash Attention work on Navi3x: https://github.com/ROCm/composable_kernel/discussions/1032
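To make the hardware point concrete, here is a minimal Python sketch of a pre-flight check that decides from an architecture string whether a GPU has MFMA instructions at all. It is an illustration, not CK's or flash-attention's actual support check. The names `MFMA_ARCHS`, `base_arch`, and `has_mfma` are hypothetical, and the architecture list is an assumption based on MI100 being gfx908, the MI200 series gfx90a, and the MI300 series gfx940/941/942.

```python
import re

# Assumption: gfx targets that provide MFMA ("matrix core") instructions.
# gfx908 = MI100, gfx90a = MI200 series, gfx940/941/942 = MI300 series.
# gfx906 (MI50 / Radeon VII) is deliberately absent.
MFMA_ARCHS = {"gfx908", "gfx90a", "gfx940", "gfx941", "gfx942"}

def base_arch(isa_name):
    """Extract the bare gfx target from an ISA or arch string.

    Accepts forms such as 'gfx906', 'gfx906:sramecc+:xnack-', or the full
    rocminfo ISA name 'amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-'.
    """
    match = re.search(r"gfx[0-9a-f]+", isa_name)
    if match is None:
        raise ValueError(f"no gfx target found in {isa_name!r}")
    return match.group(0)

def has_mfma(isa_name):
    """Return True if the target is in the assumed MFMA-capable set."""
    return base_arch(isa_name) in MFMA_ARCHS

if __name__ == "__main__":
    for isa in ("amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-", "gfx908:sramecc+:xnack-"):
        print(base_arch(isa), "MFMA" if has_mfma(isa) else "no MFMA")
```

The ISA string can be copied from the `ISA Info` section of `rocminfo`. A check like this would fail fast on gfx906 instead of surfacing later as the `does not support this problem` error from the `Xdl` kernel.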
@ThePerfectComputer Have you been able to find a solution for this?
Agent 8
Name: gfx906 Uuid: GPU-bc5e39417337ecd7 Marketing Name: AMD Radeon VII Vendor Name: AMD Feature: KERNEL_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 128(0x80) Queue Min Size: 64(0x40) Queue Max Size: 131072(0x20000) Queue Type: MULTI Node: 7 Device Type: GPU Cache Info: L1: 16(0x10) KB L2: 8192(0x2000) KB Chip ID: 26287(0x66af) ASIC Revision: 1(0x1) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 1801 BDFID: 11520 Internal Node ID: 7 Compute Unit: 60 SIMDs per CU: 4 Shader Engines: 4 Shader Arrs. per Eng.: 1 WatchPts on Addr. Ranges:4 Features: KERNEL_DISPATCH Fast F16 Operation: TRUE Wavefront Size: 64(0x40) Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Max Waves Per CU: 40(0x28) Max Work-item Per CU: 2560(0xa00) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) Max fbarriers/Workgrp: 32 Packet Processor uCode:: 469 SDMA engine uCode:: 145 IOMMU Support:: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 16760832(0xffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 2 Segment: GLOBAL; FLAGS: Size: 16760832(0xffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 3 Segment: GROUP Size: 64(0x40) KB Allocatable: FALSE Alloc Granule: 0KB Alloc Alignment: 0KB Accessible by all: FALSE ISA Info: ISA 1 Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack- Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier Max Size: 32
Agent 9
Name: gfx906 Uuid: GPU-65d6404172e6212c Marketing Name: AMD Radeon VII Vendor Name: AMD Feature: KERNEL_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 128(0x80) Queue Min Size: 64(0x40) Queue Max Size: 131072(0x20000) Queue Type: MULTI Node: 8 Device Type: GPU Cache Info: L1: 16(0x10) KB L2: 8192(0x2000) KB Chip ID: 26287(0x66af) ASIC Revision: 1(0x1) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 1801 BDFID: 17664 Internal Node ID: 8 Compute Unit: 60 SIMDs per CU: 4 Shader Engines: 4 Shader Arrs. per Eng.: 1 WatchPts on Addr. Ranges:4 Features: KERNEL_DISPATCH Fast F16 Operation: TRUE Wavefront Size: 64(0x40) Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Max Waves Per CU: 40(0x28) Max Work-item Per CU: 2560(0xa00) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) Max fbarriers/Workgrp: 32 Packet Processor uCode:: 469 SDMA engine uCode:: 145 IOMMU Support:: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 16760832(0xffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 2 Segment: GLOBAL; FLAGS: Size: 16760832(0xffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 3 Segment: GROUP Size: 64(0x40) KB Allocatable: FALSE Alloc Granule: 0KB Alloc Alignment: 0KB Accessible by all: FALSE ISA Info: ISA 1 Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack- Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier Max Size: 32
Agent 10
Name: gfx906 Uuid: GPU-1cf2390172dc76bd Marketing Name: AMD Radeon VII Vendor Name: AMD Feature: KERNEL_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 128(0x80) Queue Min Size: 64(0x40) Queue Max Size: 131072(0x20000) Queue Type: MULTI Node: 9 Device Type: GPU Cache Info: L1: 16(0x10) KB L2: 8192(0x2000) KB Chip ID: 26287(0x66af) ASIC Revision: 1(0x1) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 1801 BDFID: 18432 Internal Node ID: 9 Compute Unit: 60 SIMDs per CU: 4 Shader Engines: 4 Shader Arrs. per Eng.: 1 WatchPts on Addr. Ranges:4 Features: KERNEL_DISPATCH Fast F16 Operation: TRUE Wavefront Size: 64(0x40) Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Max Waves Per CU: 40(0x28) Max Work-item Per CU: 2560(0xa00) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) Max fbarriers/Workgrp: 32 Packet Processor uCode:: 469 SDMA engine uCode:: 145 IOMMU Support:: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 16760832(0xffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 2 Segment: GLOBAL; FLAGS: Size: 16760832(0xffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 3 Segment: GROUP Size: 64(0x40) KB Allocatable: FALSE Alloc Granule: 0KB Alloc Alignment: 0KB Accessible by all: FALSE ISA Info: ISA 1 Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack- Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier Max Size: 32
Agent 11
Name: gfx906 Uuid: GPU-aae6492172e6212c Marketing Name: AMD Radeon VII Vendor Name: AMD Feature: KERNEL_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 128(0x80) Queue Min Size: 64(0x40) Queue Max Size: 131072(0x20000) Queue Type: MULTI Node: 10 Device Type: GPU Cache Info: L1: 16(0x10) KB L2: 8192(0x2000) KB Chip ID: 26287(0x66af) ASIC Revision: 1(0x1) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 1801 BDFID: 25856 Internal Node ID: 10 Compute Unit: 60 SIMDs per CU: 4 Shader Engines: 4 Shader Arrs. per Eng.: 1 WatchPts on Addr. Ranges:4 Features: KERNEL_DISPATCH Fast F16 Operation: TRUE Wavefront Size: 64(0x40) Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Max Waves Per CU: 40(0x28) Max Work-item Per CU: 2560(0xa00) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) Max fbarriers/Workgrp: 32 Packet Processor uCode:: 469 SDMA engine uCode:: 145 IOMMU Support:: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 16760832(0xffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 2 Segment: GLOBAL; FLAGS: Size: 16760832(0xffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 3 Segment: GROUP Size: 64(0x40) KB Allocatable: FALSE Alloc Granule: 0KB Alloc Alignment: 0KB Accessible by all: FALSE ISA Info: ISA 1 Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack- Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier Max Size: 32
Agent 12
Name: gfx906 Uuid: GPU-4c24504172e6268f Marketing Name: AMD Radeon VII Vendor Name: AMD Feature: KERNEL_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 128(0x80) Queue Min Size: 64(0x40) Queue Max Size: 131072(0x20000) Queue Type: MULTI Node: 11 Device Type: GPU Cache Info: L1: 16(0x10) KB L2: 8192(0x2000) KB Chip ID: 26287(0x66af) ASIC Revision: 1(0x1) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 1801 BDFID: 26624 Internal Node ID: 11 Compute Unit: 60 SIMDs per CU: 4 Shader Engines: 4 Shader Arrs. per Eng.: 1 WatchPts on Addr. Ranges:4 Features: KERNEL_DISPATCH Fast F16 Operation: TRUE Wavefront Size: 64(0x40) Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Max Waves Per CU: 40(0x28) Max Work-item Per CU: 2560(0xa00) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) Max fbarriers/Workgrp: 32 Packet Processor uCode:: 469 SDMA engine uCode:: 145 IOMMU Support:: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 16760832(0xffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 2 Segment: GLOBAL; FLAGS: Size: 16760832(0xffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 3 Segment: GROUP Size: 64(0x40) KB Allocatable: FALSE Alloc Granule: 0KB Alloc Alignment: 0KB Accessible by all: FALSE ISA Info: ISA 1 Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack- Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier Max Size: 32 *** Done ***
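With this many agents in the dump, it can be easier to check which GPU targets are present by parsing the text than by reading it. The following is only a minimal sketch, assuming output shaped like the dump above (an `Agent N` header followed by `Name:`, `Uuid:`, and `Device Type:` fields). The helper names `list_agents` and `gpu_targets` and the shortened `SAMPLE` string are hypothetical and are not part of ROCm or rocminfo.

```python
import re

# Shortened, hypothetical excerpt in the same shape as the rocminfo dump above.
SAMPLE = (
    "Agent 1 Name: AMD EPYC 7351 16-Core Processor Uuid: CPU-XX "
    "Marketing Name: AMD EPYC 7351 16-Core Processor Vendor Name: CPU "
    "Device Type: CPU\n"
    "Agent 5\n"
    "Name: gfx906 Uuid: GPU-a20c80e172fd5d44 Marketing Name: AMD Radeon VII "
    "Vendor Name: AMD Device Type: GPU\n"
)


def list_agents(rocminfo_text):
    """Return a list of (agent_number, name, device_type) tuples."""
    # Splitting on the "Agent N" headers with a capture group yields
    # [preamble, "1", body_1, "5", body_5, ...].
    parts = re.split(r"\bAgent\s+(\d+)\b", rocminfo_text)
    agents = []
    for number, body in zip(parts[1::2], parts[2::2]):
        # The agent name runs from "Name:" up to the following "Uuid:" field.
        name = re.search(r"\bName:\s*(.+?)\s+Uuid:", body, re.S)
        device = re.search(r"Device Type:\s*(\w+)", body)
        agents.append((
            int(number),
            name.group(1).strip() if name else None,
            device.group(1) if device else None,
        ))
    return agents


def gpu_targets(rocminfo_text):
    """Return the sorted, de-duplicated names reported by GPU agents."""
    return sorted({
        name
        for _, name, device in list_agents(rocminfo_text)
        if device == "GPU" and name
    })


if __name__ == "__main__":
    for agent in list_agents(SAMPLE):
        print(agent)
    print(gpu_targets(SAMPLE))
```

To run it against a real machine, the captured text of `/opt/rocm/bin/rocminfo` could be passed to `gpu_targets` in place of `SAMPLE`; field order in other rocminfo versions may differ, so the patterns may need adjusting.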
Additional Information
No response
Hi, did you solve this problem? Either way, the Mi50 is super cheap. If you did solve it, I can send you some cards for free, haha.
Or, more directly, can you help with building vLLM on the Mi50?