rvv-bench

Work together to benchmark the K230?

Open mr-c opened this issue 1 year ago • 19 comments

I think we are in the same timezone? I received mine yesterday. Feel free to DM me on the Fediverse (link in my profile) or email me at michael.crusoe@fu-berlin.de.

mr-c, Nov 10 '23 14:11

Yes, we are. Thanks for the offer; I received mine yesterday as well.

I started a benchmark run overnight, but got a bit greedy with the sample count, so only two benchmarks finished. The rest are running right now with fewer samples and should be done in about an hour.

Do you have any suggestions for what else to add to the benchmark?

camel-cdr, Nov 10 '23 15:11

I think there are some other vector tests, and the recently published RISC-V CPU fuzzer would be interesting.

Are you compiling with one of their toolchains, or with the latest GCC/Clang?

mr-c, Nov 10 '23 15:11

I'm compiling with clang-16. Do you have the latest gcc or clang ready?

Since compilers now do proper auto-vectorization, I was planning to benchmark with different compilers.

If you have a newer compiler, you could modify #define IMPLS(f) in the bench/*.c files to exclude all the f(rvv_*) benchmarks (those are written in assembly, so the compiler doesn't affect them) and do a quicker run with only the scalar and autovec benchmarks.
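IMPLS(f) follows the X-macro pattern: one list of implementation names is expanded by different macros. A minimal sketch of what the trimmed list amounts to, with the rvv_* entries removed. The stand-in functions, struct impl, MAKE_ENTRY and has_impl_prefix are hypothetical, not rvv-bench's actual code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the per-benchmark C implementations. */
static void scalar(void) {}
static void scalar_autovec(void) {}

/* Compiler-only run: the assembly f(rvv_...) entries have been deleted from
 * the list, leaving only the implementations the compiler generates. */
#define IMPLS(f) \
    f(scalar) \
    f(scalar_autovec)

struct impl {
    const char *name;
    void (*fn)(void);
};

/* Expand the X-macro once into a table of name/function pairs. */
#define MAKE_ENTRY(name) { #name, name },
static const struct impl impls[] = { IMPLS(MAKE_ENTRY) };
#define NIMPLS (sizeof impls / sizeof impls[0])

/* Returns 1 if any registered implementation name starts with prefix. */
static int has_impl_prefix(const char *prefix) {
    assert(prefix != NULL);
    size_t len = strlen(prefix);
    for (size_t i = 0; i < NIMPLS; i++)
        if (strncmp(impls[i].name, prefix, len) == 0)
            return 1;
    return 0;
}
```

Because every expansion site uses the same list, deleting an entry in one place removes it from the benchmark table as well.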

With clang I'm using the following config.mk:

WARN=-Wall -Wextra -Wno-unused-function -Wno-unused-parameter
CC=clang
CFLAGS=--target=riscv64 -march=rv64gcv_zfh_zba_zbb_zbs -O3 ${WARN} -nostdlib -fno-builtin -ffreestanding

If you have a full toolchain, then you don't need to do a freestanding build and can get rid of the part after ${WARN}.

I'm using the following bench/config.h; please use the same one so I can manually combine the JSON output.

/* processor specific configs */
#define HAS_E64 1
#define HAS_F16 1

/* the maximum number of bytes to allocate, minimum of 4096 */
#define MAX_MEM (1024*1024*16)
/* the byte count for the next run */
#define NEXT(c) (c + c/17 + 3)

/* minimum number of repeats, to sample median from */
#define MIN_REPEATS 64
/* maximum number of repeats, executed until more than STOP_CYCLES have elapsed */
#define MAX_REPEATS 128

/* stop repeats early after this many cycles have elapsed */
#define STOP_CYCLES (1024*1024*500)


/* custom scaling factors for benchmarks, these are used to make sure each
 * benchmark approximately takes the same amount of time. */

#define SCALE_mandelbrot(N) ((N)/10)
#define SCALE_mergelines(N) ((N)/10)

/* benchmark specific configurations */
#define mandelbrot_ITER 100
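One reading of how MIN_REPEATS, MAX_REPEATS and STOP_CYCLES interact, based only on the comments above: always run at least MIN_REPEATS times, stop at MAX_REPEATS, or stop earlier (once the minimum is reached) when the accumulated cycles exceed STOP_CYCLES, then report the median. This is a sketch under that assumption, not rvv-bench's actual timing loop; median_cycles, struct fake and fake_measure are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Values from the config.h above. */
#define MIN_REPEATS 64
#define MAX_REPEATS 128
#define STOP_CYCLES (1024*1024*500)

static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* measure() returns the cycle count of one benchmark run. Returns the
 * median sample (the upper median when the sample count is even). */
static uint64_t median_cycles(uint64_t (*measure)(void *), void *ctx) {
    static uint64_t samples[MAX_REPEATS];
    uint64_t total = 0;
    size_t n = 0;
    while (n < MAX_REPEATS) {
        uint64_t c = measure(ctx);
        samples[n++] = c;
        total += c;
        /* Early stop only once the minimum sample count is reached. */
        if (n >= MIN_REPEATS && total > STOP_CYCLES)
            break;
    }
    assert(n >= MIN_REPEATS);
    qsort(samples, n, sizeof samples[0], cmp_u64);
    return samples[n / 2];
}

/* Demo stand-in for a timed run: returns a fixed cycle count and
 * records how often it was called. */
struct fake { uint64_t cycles; size_t calls; };
static uint64_t fake_measure(void *ctx) {
    struct fake *f = ctx;
    f->calls++;
    return f->cycles;
}
```

With this policy, cheap runs use all MAX_REPEATS samples, while runs long enough to exceed STOP_CYCLES stop after MIN_REPEATS.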

Once this is set up, you should be able to build the benchmarks by running make in bench/, then copy the executables over to the device and run them.

BTW: I'm using this patch of the k230_sdk, which gives you a single Linux install with 0.5 GiB of memory on the RVV-capable core: https://github.com/negge/k230_sdk/tree/build_fixes

Edit:

I changed the following lines of the config.h for the longer memcpy and memset run:

/* the maximum number of bytes to allocate, minimum of 4096 */
#define MAX_MEM (1024*1024*256)
/* the byte count for the next run */
#define NEXT(c) (c + c/17 + 3)
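NEXT(c) can be checked against the data posted later in this thread. Starting from 1, it yields 1, 4, 7, 10, 13, 16, 19, 23, 27, 31, 35, 40, ..., which is exactly the first row of each data array. A small sketch to reproduce the sequence; gen_sizes is a hypothetical helper, and using a single byte limit as the upper bound is a simplification:

```c
#include <assert.h>
#include <stddef.h>

/* From config.h: the byte count for the next run. */
#define NEXT(c) (c + c/17 + 3)

/* Generates sizes starting at 1 and stopping before max_bytes.
 * Stores at most cap of them in out[] and returns the total count. */
static size_t gen_sizes(size_t max_bytes, size_t *out, size_t cap) {
    size_t n = 0;
    for (size_t c = 1; c < max_bytes; c = NEXT(c)) {
        if (n < cap)
            out[n] = c;
        n++;
    }
    return n;
}
```

For large c, each step grows by roughly 1/17 (about 6%). Raising the limit from 16 MiB to 256 MiB therefore adds only about ln 16 / ln(18/17) ≈ 48 more sizes, but those are the most expensive ones to run.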

Edit2: It looks like ascii_to_* takes longer than I expected. It will probably finish in 2 hours.

camel-cdr, Nov 10 '23 16:11

I've got clang-18 (from the experimental llvm-toolchain-snapshot) 18.0.0 (++20231102103655+18839aec4ed1-1~exp1) and gcc-13 13.2.0-6.

For gcc-snapshot, alas, the cross-builders are not included, so I'll have to either use qemu to run the riscv binary, or install it on the K230 and self-compile.

mr-c, Nov 10 '23 16:11

Results for clang-18 would be good to have: from a quick check on Godbolt, the auto-vectorization has definitely improved, even for simple things like memcpy.
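For reference, a scalar_autovec-style kernel is plain C that the compiler may vectorize on its own. The sketch below is hypothetical (memcpy_bytes is not rvv-bench's code); pasting something like it into Godbolt with clang -O3 --target=riscv64 -march=rv64gcv is a quick way to compare what different compiler versions generate:

```c
#include <assert.h>
#include <stddef.h>

/* Plain byte-copy loop with no intrinsics or assembly. restrict tells the
 * compiler the buffers do not overlap, which makes vectorizing it legal
 * without runtime overlap checks. */
static void *memcpy_bytes(void *restrict dst, const void *restrict src,
                          size_t n) {
    assert(n == 0 || (dst != NULL && src != NULL));
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}
```

Compilers may also recognize this loop as a memcpy idiom and emit a library call instead, so it is worth checking the generated assembly.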

camel-cdr, Nov 10 '23 16:11

About your -march, from my notes investigating the two toolchains in the SDK:

# according to the Xuantie-900 toolchain, mcpu=c908v is equal to
# -march=rv64imafdcv_zihintpause_zfh_zba_zbb_zbc_zbs_zvamo_zvlsseg_xtheadc_xtheadvdot
# See `k230_sdk/toolchain/Xuantie-900-gcc-linux-5.10.4-glibc-x86_64-V2.6.0/bin/riscv64-unknown-linux-gnu-gcc -mcpu=c908v -Q --help=target | grep -- -march`

# according to the riscv64-linux-musleabi toolchain, mcpu=c908v is equal to
# -march=rv64imafdcv_zicsr_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b

mr-c, Nov 10 '23 16:11

If you use the qemu from https://github.com/T-head-Semi/qemu then you can pass -cpu c908v :-)

mr-c, Nov 10 '23 16:11

The flags I used should be fine.

zvlsseg doesn't seem to be distinguished in clang.

zihintpause, zbc, zvamo shouldn't be relevant.

xtheadc, xtheadvdot are vendor extensions and probably not relevant (they won't be targeted by software distributions).

The zvl* (and zve*) flags should be irrelevant, as they are implied by V; zfh is the exception (I think, at least).
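To double-check which of these extensions a given -march string actually enables, inspect the predefined macros from the RISC-V C API specification (__riscv_v, __riscv_zfh, __riscv_zba, __riscv_zbb, __riscv_zbs, __riscv_v_min_vlen) at compile time. This is a sketch: target_features is a hypothetical helper, and macro support may vary between compiler versions.

```c
#include <assert.h>
#include <stdio.h>

/* Map each extension macro to 0/1 (they are defined only when enabled). */
#ifdef __riscv_v
#define HAS_V 1
#else
#define HAS_V 0
#endif
#ifdef __riscv_zfh
#define HAS_ZFH 1
#else
#define HAS_ZFH 0
#endif
#ifdef __riscv_zba
#define HAS_ZBA 1
#else
#define HAS_ZBA 0
#endif
#ifdef __riscv_zbb
#define HAS_ZBB 1
#else
#define HAS_ZBB 0
#endif
#ifdef __riscv_zbs
#define HAS_ZBS 1
#else
#define HAS_ZBS 0
#endif

/* Writes a summary of the targeted extensions into buf and returns how
 * many of the five checked extensions are enabled. */
static int target_features(char *buf, size_t len) {
    int vlen = 0;
#ifdef __riscv_v_min_vlen
    vlen = __riscv_v_min_vlen;
#endif
    /* V implies Zvl128b, so a V target must report at least 128. */
    assert(!HAS_V || vlen >= 128);
    snprintf(buf, len, "v=%d zfh=%d zba=%d zbb=%d zbs=%d min_vlen=%d",
             HAS_V, HAS_ZFH, HAS_ZBA, HAS_ZBB, HAS_ZBS, vlen);
    return HAS_V + HAS_ZFH + HAS_ZBA + HAS_ZBB + HAS_ZBS;
}
```

Built with -march=rv64gcv_zfh_zba_zbb_zbs, this should report all five as 1 and min_vlen=128. On a non-RISC-V host, everything reports 0. Alternatively, running the compiler with -dM -E on an empty input dumps the same macros.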

camel-cdr, Nov 10 '23 16:11

Hosted mode isn't happy:

michael:~/src/rvv-bench/bench$ make
clang-18 --target=riscv64 -march=rv64gcv_zfh_zba_zbb_zbs -O3 -Wall -Wextra -Wno-unused-function -Wno-unused-parameter   -o memcpy memcpy.c -DINC=memcpy.S template.S
In file included from memcpy.c:1:
In file included from ./bench.h:2:
./../nolibc.h:13:10: fatal error: 'string.h' file not found
   13 | #include <string.h>
      |          ^~~~~~~~~~
1 error generated.

Nor is non-hosted (freestanding) mode:

michael:~/src/rvv-bench/bench$ make
clang-18 --target=riscv64 -march=rv64gcv_zfh_zba_zbb_zbs -O3 -Wall -Wextra -Wno-unused-function -Wno-unused-parameter  -nostdlib -fno-builtin -ffreestanding -o memcpy memcpy.c -DINC=memcpy.S template.S
ld.lld: error: /tmp/template-4e1e48.o:(.text+0x0): relocation R_RISCV_ALIGN requires unimplemented linker relaxation; recompile with -mno-relax
clang-18: error: ld.lld command failed with exit code 1 (use -v to see invocation)
make: *** [Makefile:10: memcpy] Error 1

mr-c, Nov 10 '23 16:11

Hosted mode with --target but without a full toolchain isn't really hosted; I didn't anticipate that.

The second one works for me with clang-16. Does -mno-relax help? This might be caused by the .balign 8 in template.S; does it work with that removed?

camel-cdr, Nov 10 '23 16:11

-mno-relax worked; copying over the files now (do you have networking set up yet?)

mr-c, Nov 10 '23 16:11

Great. I connected it to Ethernet and tried to build git (to install it), but failed at the download because the preinstalled wget doesn't support HTTPS, only HTTP and FTP.

So I gave up on that, and now I'm just using screen on the serial port, with rz/sz to send files over.

camel-cdr, Nov 10 '23 16:11

I can upload my Debian "sid" riscv64 root filesystem if you want :-) Or you can run sudo debootstrap --arch riscv64 sid /path/to/temp_root http://deb.debian.org/debian/ and then combine it yourself.

mr-c, Nov 10 '23 16:11

It's fine, this setup works quite well for my purposes.

camel-cdr, Nov 10 '23 16:11

Initial results with clang-18 (heading home now; will continue there)

{
title: "memcpy",
labels: ["0","libc","musl","scalar","scalar_autovec",],
data: [
[1,4,7,10,13,16,19,23,27,31,35,40,45,50,55,61,67,73,80,87,95,103,112,121,131,141,152,163,175,188,202,216,231,247,264,282,301,321,342,365,389,414,441,469,499,531,565,601,639,
[0.0333333,0.0975609,0.1320754,0.1538461,0.1688311,0.1797752,0.1881188,0.1965811,0.2030075,0.2080536,0.7142857,0.5797101,0.5056179,0.4587155,0.4263565,0.3986928,1.1183393,0,
[0.0242839,0.0695807,0.1148264,0.1761119,0.1906015,0.2231759,0.2861003,0.3116206,0.3859970,0.4156781,0.3415915,0.2829675,0.3232639,0.3394551,0.3107794,0.3629567,0.3894768,0,
[0.0555555,0.1250000,0.1627906,0.1923076,0.2131147,0.2285714,0.2405063,0.2527472,0.2621359,0.2695652,0.2755905,0.2816901,0.2866242,0.2906976,0.2941176,0.2975609,0.3004484,0,
[0.0333333,0.0909090,0.1186440,0.1351351,0.1460674,0.1538461,0.1596638,0.1654676,0.1698113,0.1731843,0.7000000,0.5333333,0.4500000,0.4000000,0.3666666,0.3388888,1.1166666,0,
]
},
{
title: "memcpy aligned",
labels: ["0","libc","musl","scalar","scalar_autovec",],
data: [
[1,4,7,10,13,16,19,23,27,31,35,40,45,50,55,61,67,73,80,87,95,103,112,121,131,141,152,163,175,188,202,216,231,247,264,282,301,321,342,365,389,414,441,469,499,531,565,601,639,
[0.0333333,0.0930232,0.1320754,0.1538461,0.1688311,0.1797752,0.1881188,0.1965811,0.2030075,0.2080536,0.7291666,0.5882352,0.5113636,0.4629629,0.4296875,0.4013157,1.1551724,0,
[0.0277777,0.0769230,0.1250000,0.1538461,0.2063492,0.2539682,0.3275862,0.3709677,0.4218750,0.4492753,0.3398058,0.3333333,0.3719008,0.3759398,0.3666666,0.4178082,0.4213836,0,
[0.0555555,0.1250000,0.1627906,0.1923076,0.2131147,0.2285714,0.2405063,0.2527472,0.2621359,0.2695652,0.2755905,0.2816901,0.2866242,0.2906976,0.2941176,0.2975609,0.3004484,0,
[0.0333333,0.0909090,0.1186440,0.1351351,0.1460674,0.1538461,0.1596638,0.1654676,0.1698113,0.1731843,0.7142857,0.5405405,0.4545454,0.4032258,0.3691275,0.3407821,1.1551724,0,
]
},
{
title: "memset",
labels: ["0","libc","musl","scalar","scalar_autovec",],
data: [
[1,4,7,10,13,16,19,23,27,31,35,40,45,50,55,61,67,73,80,87,95,103,112,121,131,141,152,163,175,188,202,216,231,247,264,282,301,321,342,365,389,414,441,469,499,531,565,601,639,
[0.0344827,0.1081081,0.1581233,0.1960784,0.2166666,0.2424242,0.2533333,0.2705882,0.2842105,0.2952380,0.7291666,0.6506777,0.6164383,0.5747126,0.5612244,0.5398230,1.2884615,1,
[0.0588235,0.1904761,0.3043478,0.2564102,0.2722147,0.3636363,0.4318181,0.5227272,0.6136363,0.5735294,0.6456953,0.7398624,0.8299834,0.9226401,1.0153846,0.8072616,0.8799461,0,
[0.0588235,0.1739130,0.2333333,0.2777777,0.3095238,0.3333333,0.3518518,0.3709677,0.3857142,0.3974358,0.4069767,0.4166666,0.4245283,0.4310344,0.4365079,0.4420289,0.4466666,0,
[0.0344827,0.0975609,0.1250000,0.1724137,0.1940298,0.2105263,0.2235294,0.2371134,0.2477064,0.2561983,0.7291666,0.6247496,0.5555555,0.5208333,0.4954954,0.4728682,1.2884615,1,
]
},
{
title: "memset aligned",
labels: ["0","libc","musl","scalar","scalar_autovec",],
data: [
[1,4,7,10,13,16,19,23,27,31,35,40,45,50,55,61,67,73,80,87,95,103,112,121,131,141,152,163,175,188,202,216,231,247,264,282,301,321,342,365,389,414,441,469,499,531,565,601,639,
[0.0344827,0.1081081,0.1580775,0.1587301,0.2166666,0.2424242,0.2533333,0.2705882,0.2842105,0.2952380,0.7446808,0.6608769,0.6250000,0.5813953,0.5670103,0.5446428,1.3400000,1,
[0.0588235,0.1904761,0.3043478,0.2564102,0.2954545,0.3636363,0.4318181,0.5227272,0.6136363,0.5740740,0.6481481,0.7407407,0.8333333,0.9259259,1.0185185,0.8026315,0.8815789,0,
[0.0588235,0.1701199,0.2333333,0.2777777,0.3095238,0.3333333,0.3518518,0.3709677,0.3857142,0.3974358,0.4069767,0.4166666,0.4245283,0.4310344,0.4365079,0.4420289,0.4466666,0,
[0.0344827,0.0975609,0.1428571,0.1724137,0.1940298,0.2105263,0.2235294,0.2371134,0.2477064,0.2561983,0.7446808,0.6333739,0.5625000,0.5263157,0.5000000,0.4765625,1.3400000,1,
]
}
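The values appear to be bytes per cycle. In the libc memcpy row, 4/0.0975609 ≈ 41, 7/0.1320754 ≈ 53 and 10/0.1538461 ≈ 65 all come out as whole cycle counts. Assuming that interpretation, here is a sketch to recover cycle counts from a row (rate_row_to_cycles is a hypothetical helper):

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: rates[i] is bytes processed per cycle at sizes[i] bytes, so
 * the cycle count is sizes[i] / rates[i], rounded to the nearest integer. */
static void rate_row_to_cycles(const long *sizes, const double *rates,
                               long *cycles, size_t n) {
    for (size_t i = 0; i < n; i++) {
        assert(sizes[i] >= 0 && rates[i] > 0.0);
        cycles[i] = (long)((double)sizes[i] / rates[i] + 0.5);
    }
}
```

For example, the same conversion gives about 18 cycles for a 1-byte copy in the scalar row (1/0.0555555) versus 30 for libc (1/0.0333333).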

mr-c, Nov 10 '23 17:11

My results are up: https://camel-cdr.github.io/rvv-bench-results/canmv_k230/index.html

camel-cdr, Nov 10 '23 19:11

clang-18-results.zip

mr-c, Nov 10 '23 19:11

Thanks, I've added all of the autovec results; the others were identical anyway.

All auto-vectorized benchmarks except utf8_count_SWAR_popc_autovec improved over clang-16; the biggest improvement was in byteswap.

Edit: I've credited you in the overview, and added performance observations.

camel-cdr, Nov 10 '23 20:11

> I've got clang-18 (from the experimental llvm-toolchain-snapshot) 18.0.0 (++20231102103655+18839aec4ed1-1~exp1)

More notes on the setup using clang to cross-compile for this board from x86-64:

Here are my CFLAGS/CXXFLAGS: --target=riscv64-linux-gnu -isystem=/usr/riscv64-linux-gnu/include -march=rv64imafdcv_zihintpause_zfh_zba_zbb_zbc_zbs_zicsr_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b -O3

and my C linker flags: --target=riscv64-linux-gnu -static -static-libgcc; for C++, add -static-libstdc++.

The exact clang version I used is at https://snapshot.debian.org/package/llvm-toolchain-snapshot/1%3A18~%2B%2B20231102103655%2B18839aec4ed1-1~exp1/, but newer releases probably work too. To use them, enable the Debian experimental sources on your system or in a Docker container and run sudo apt install clang-18.

You may also need packages like libc6-dev-riscv64-cross and linux-libc-dev-riscv64-cross installed as well.

mr-c, Dec 22 '23 09:12