clpeak
New hardware results: RDNA3 7900 XT/XTX, GeForce 4090/4080/4070 and Intel Arc A770?
Hi,
The title says it all.
I'd like to see results for the new Nvidia 40x0 series, AMD RDNA3 and Intel DG2.
I hope people with the relevant hardware can submit them.
Thanks.
My results for the Arc A770 with the 6.2.1 kernel:
Platform: Intel(R) OpenCL HD Graphics
Device: Intel(R) Graphics [0x56a0]
Driver version : 22.49.25018.24 (Linux x64)
Compute units : 512
Clock frequency : 2400 MHz
Global memory bandwidth (GBPS)
float : 397.92
float2 : 403.43
float4 : 407.01
float8 : 417.52
float16 : 421.01
Single-precision compute (GFLOPS)
float : 13018.01
float2 : 11137.58
float4 : 10403.04
float8 : 10026.99
float16 : 9701.60
Half-precision compute (GFLOPS)
half : 19552.90
half2 : 19493.52
half4 : 19526.21
half8 : 19459.81
half16 : 19340.77
No double precision support! Skipped
Integer compute (GIOPS)
int : 4765.67
int2 : 4773.43
int4 : 4789.65
int8 : 4644.51
int16 : 5455.67
Integer compute Fast 24bit (GIOPS)
int : 4755.75
int2 : 4768.87
int4 : 4786.68
int8 : 4642.19
int16 : 5455.34
Transfer bandwidth (GBPS)
enqueueWriteBuffer : 2.64
enqueueReadBuffer : 2.43
enqueueWriteBuffer non-blocking : 2.85
enqueueReadBuffer non-blocking : 2.63
enqueueMapBuffer(for read) : 2.83
memcpy from mapped ptr : 14.38
enqueueUnmap(after write) : 2.91
memcpy to mapped ptr : 14.01
Kernel launch latency : 36.30 us
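For anyone who wants to compare runs like the one above side by side, here is a minimal sketch of a parser for this plain-text report. `parse_clpeak` and the two regular expressions are hypothetical helpers written for this thread, not part of clpeak, and they assume the layout shown above (a section header ending in a unit in parentheses, followed by `name : value` lines).

```python
import re

# Section headers look like "Global memory bandwidth (GBPS)".
HEADER = re.compile(r"^(.*\S)\s*\((GBPS|GFLOPS|GIOPS)\)$")
# Measurements look like "float : 397.92" or "Kernel launch latency : 36.30 us".
VALUE = re.compile(r"^(.+?)\s*:\s*([0-9]+(?:\.[0-9]+)?)(?:\s+(\S+))?$")


def parse_clpeak(text):
    """Return {(section, name): value}; section is None for stand-alone lines."""
    results = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            section = header.group(1)
            continue
        value = VALUE.match(line)
        if value:
            name, number, unit = value.groups()
            # Lines that carry their own unit (MHz, us) are not part of a section.
            key_section = None if unit else section
            results[(key_section, name)] = float(number)
    return results


SAMPLE = """\
Compute units : 512
Clock frequency : 2400 MHz
Global memory bandwidth (GBPS)
float : 397.92
float2 : 403.43
Transfer bandwidth (GBPS)
enqueueMapBuffer(for read) : 2.83
Kernel launch latency : 36.30 us
"""

if __name__ == "__main__":
    for key, val in parse_clpeak(SAMPLE).items():
        print(key, val)
```

Lines that do not fit either pattern (platform, device, driver version, the "No double precision support" notice) are skipped on purpose.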
@al42and Nice, thanks for sharing. It would be good to have Windows results too, if you have Windows installed, to see whether they diverge much.
Don't have Windows :(
Kernel latency seems worse on Windows.
Platform: Intel(R) OpenCL HD Graphics
Device: Intel(R) Arc(TM) A770 Graphics
Driver version : 31.0.101.4255 (Win64)
Compute units : 512
Clock frequency : 2400 MHz
Global memory bandwidth (GBPS)
float : 396.30
float2 : 403.57
float4 : 409.15
float8 : 419.49
float16 : 423.01
Single-precision compute (GFLOPS)
float : 13346.34
float2 : 11416.61
float4 : 10663.24
float8 : 10299.98
float16 : 9975.71
Half-precision compute (GFLOPS)
half : 20033.96
half2 : 19979.07
half4 : 19969.53
half8 : 19922.98
half16 : 19841.67
No double precision support! Skipped
Integer compute (GIOPS)
int : 4830.21
int2 : 4857.29
int4 : 4846.14
int8 : 4724.30
int16 : 5532.68
Integer compute Fast 24bit (GIOPS)
int : 4824.44
int2 : 4850.69
int4 : 4829.88
int8 : 4694.66
int16 : 5510.71
Transfer bandwidth (GBPS)
enqueueWriteBuffer : 11.21
enqueueReadBuffer : 5.33
enqueueWriteBuffer non-blocking : 15.99
enqueueReadBuffer non-blocking : 6.21
enqueueMapBuffer(for read) : 19.14
memcpy from mapped ptr : 19.38
enqueueUnmap(after write) : 17.15
memcpy to mapped ptr : 19.76
Kernel launch latency : 78.90 us
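To put a number on the "worse on Windows" remark, a quick sketch that takes a few figures from the 6.2.1-kernel Linux run and the Windows run above and prints the Windows/Linux ratio. The dictionary keys and the `ratios` helper are made up for illustration, and these are single runs on what may be different machines, so treat the ratios as rough.

```python
# Figures copied from the two posted runs (Linux 6.2.1 kernel vs. Windows).
linux = {
    "Kernel launch latency (us)": 36.30,
    "enqueueWriteBuffer (GBPS)": 2.64,
    "enqueueReadBuffer (GBPS)": 2.43,
}
windows = {
    "Kernel launch latency (us)": 78.90,
    "enqueueWriteBuffer (GBPS)": 11.21,
    "enqueueReadBuffer (GBPS)": 5.33,
}


def ratios(base, other):
    """Return other/base for every metric present in base."""
    return {name: other[name] / base[name] for name in base}


if __name__ == "__main__":
    for name, r in ratios(linux, windows).items():
        print(f"{name}: Windows/Linux = {r:.2f}x")
```

Note that a higher ratio is bad for latency but good for bandwidth: in these two runs Windows has the slower kernel launch and the faster host-device transfers.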
Results with kernel 5.17.0-1020-oem and intel-i915-dkms 1.23.3.19.230122.18.5.17.0.1020+i38-1, but the transfer bandwidth is capped by PCIe 3.0:
Platform: Intel(R) OpenCL HD Graphics
Device: Intel(R) Arc(TM) A770 Graphics
Driver version : 23.05.25593.18 (Linux x64)
Compute units : 512
Clock frequency : 2400 MHz
Global memory bandwidth (GBPS)
float : 399.42
float2 : 403.78
float4 : 408.53
float8 : 418.51
float16 : 422.97
Single-precision compute (GFLOPS)
float : 13000.09
float2 : 11134.71
float4 : 10402.13
float8 : 10024.48
float16 : 9706.12
Half-precision compute (GFLOPS)
half : 19552.26
half2 : 19500.15
half4 : 19505.83
half8 : 19463.29
half16 : 19341.72
No double precision support! Skipped
Integer compute (GIOPS)
int : 4311.91
int2 : 4322.29
int4 : 4339.57
int8 : 4212.78
int16 : 4920.77
Integer compute Fast 24bit (GIOPS)
int : 4307.33
int2 : 4327.73
int4 : 4341.63
int8 : 4203.23
int16 : 4906.83
Transfer bandwidth (GBPS)
enqueueWriteBuffer : 9.47
enqueueReadBuffer : 4.50
enqueueWriteBuffer non-blocking : 11.07
enqueueReadBuffer non-blocking : 4.86
enqueueMapBuffer(for read) : 10.10
memcpy from mapped ptr : 4.80
enqueueUnmap(after write) : 11.38
memcpy to mapped ptr : 15.45
Kernel launch latency : 9.05 us
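As a rough check on the PCIe 3.0 cap mentioned in this post, the sketch below works out the theoretical link ceiling and compares a few of the transfer figures above against it. PCIe 3.0 runs at 8 GT/s per lane with 128b/130b encoding; the x16 link width is an assumption (the post does not state it), as is reading clpeak's GBPS as 10^9 bytes per second. The function name is made up for illustration.

```python
GT_PER_S = 8e9        # PCIe 3.0 raw signalling rate per lane
ENCODING = 128 / 130  # 128b/130b line encoding
LANES = 16            # assumption: x16 slot, not stated in the post


def pcie3_peak_gbps(lanes=LANES):
    """Theoretical one-direction PCIe 3.0 bandwidth in GB/s (10^9 bytes/s)."""
    return GT_PER_S * ENCODING * lanes / 8 / 1e9


# Transfer figures copied from the run above.
measured = {
    "enqueueWriteBuffer": 9.47,
    "enqueueWriteBuffer non-blocking": 11.07,
    "enqueueUnmap(after write)": 11.38,
}

if __name__ == "__main__":
    peak = pcie3_peak_gbps()
    print(f"PCIe 3.0 x{LANES} ceiling: {peak:.2f} GB/s")
    for name, gbps in measured.items():
        print(f"{name}: {gbps:.2f} GB/s ({gbps / peak:.0%} of ceiling)")
```

The ceiling ignores packet and protocol overhead, so real transfers are expected to land somewhat below it even on an ideal link.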