XNNPACK
[RVV] add rvv kernel for qs8/qu8-vcvt
- Supersedes #7126, which was never committed. I implemented this along the same lines as the NEON and other existing kernels so that we can go up to u4v.
@fbarchard and @dsharlet please review and commit once approved. Thank you.
- Test cases pass
- Benchmark snippet (BananaPI-F3 K1, VLEN=256)
./vunary-bench --benchmark_filter=xnn_qs8_vcvt
vunary/xnn_qs8_vcvt_ukernel__rvv_u1v/N:16320/real_time bytes=5.33456G/s cpufreq=1.6G elements=2.66728G/s
vunary/xnn_qs8_vcvt_ukernel__rvv_u1v/N:130560/real_time bytes=5.11528G/s cpufreq=1.6G elements=2.55764G/s
vunary/xnn_qs8_vcvt_ukernel__rvv_u2v/N:16320/real_time bytes=6.32932G/s cpufreq=1.6G elements=3.16466G/s
vunary/xnn_qs8_vcvt_ukernel__rvv_u2v/N:130560/real_time bytes=5.3969G/s cpufreq=1.6G elements=2.69845G/s
vunary/xnn_qs8_vcvt_ukernel__rvv_u4v/N:16320/real_time bytes=7.0579G/s cpufreq=1.6G elements=3.52895G/s
vunary/xnn_qs8_vcvt_ukernel__rvv_u4v/N:130560/real_time bytes=4.00234G/s cpufreq=1.6G elements=2.00117G/s
vunary/xnn_qs8_vcvt_ukernel__scalar_u1/N:16320/real_time bytes=159.383M/s cpufreq=1.6G elements=79.6916M/s
vunary/xnn_qs8_vcvt_ukernel__scalar_u1/N:130560/real_time bytes=159.387M/s cpufreq=1.6G elements=79.6933M/s
vunary/xnn_qs8_vcvt_ukernel__scalar_u2/N:16320/real_time bytes=212.48M/s cpufreq=1.6G elements=106.24M/s
vunary/xnn_qs8_vcvt_ukernel__scalar_u2/N:130560/real_time bytes=211.578M/s cpufreq=1.6G elements=105.789M/s
vunary/xnn_qs8_vcvt_ukernel__scalar_u4/N:16320/real_time bytes=219.807M/s cpufreq=1.6G elements=109.903M/s
vunary/xnn_qs8_vcvt_ukernel__scalar_u4/N:130560/real_time bytes=219.077M/s cpufreq=1.6G elements=109.538M/s
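For a quick read of the numbers above, here is a small sketch that computes how much faster each RVV variant is than the fastest scalar variant at the same N. The elements/s values are copied from the benchmark output. The dictionary layout and the helper name `speedup_over_best_scalar` are only for illustration and are not part of XNNPACK or its benchmark tooling.

```python
# Throughput in elements/s, copied from the benchmark output above.
# Keys are (kernel variant, N). The layout is an illustrative assumption.
ELEMENTS_PER_SEC = {
    ("rvv_u1v", 16320): 2.66728e9,
    ("rvv_u1v", 130560): 2.55764e9,
    ("rvv_u2v", 16320): 3.16466e9,
    ("rvv_u2v", 130560): 2.69845e9,
    ("rvv_u4v", 16320): 3.52895e9,
    ("rvv_u4v", 130560): 2.00117e9,
    ("scalar_u1", 16320): 79.6916e6,
    ("scalar_u1", 130560): 79.6933e6,
    ("scalar_u2", 16320): 106.24e6,
    ("scalar_u2", 130560): 105.789e6,
    ("scalar_u4", 16320): 109.903e6,
    ("scalar_u4", 130560): 109.538e6,
}


def speedup_over_best_scalar(results):
    """Return {(rvv_variant, n): ratio} against the fastest scalar variant at the same n."""
    # Find the highest scalar throughput for each problem size.
    best_scalar = {}
    for (variant, n), rate in results.items():
        if variant.startswith("scalar"):
            best_scalar[n] = max(best_scalar.get(n, 0.0), rate)
    # Divide each RVV throughput by that scalar baseline.
    return {
        (variant, n): rate / best_scalar[n]
        for (variant, n), rate in results.items()
        if variant.startswith("rvv")
    }


SPEEDUPS = speedup_over_best_scalar(ELEMENTS_PER_SEC)

if __name__ == "__main__":
    for (variant, n), ratio in sorted(SPEEDUPS.items()):
        print(f"{variant} N={n}: {ratio:.1f}x over best scalar")
```

One thing the raw numbers already show: u4v is the fastest RVV variant at N=16320 but the slowest at N=130560 (2.00G elements/s versus 2.56G for u1v and 2.70G for u2v).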
This branch has some failing relevant tests: https://github.com/google/XNNPACK/actions/runs/15284031029/job/42989783184?pr=8497
Some of them are new after the merge; if so, my apologies for the conflict.
However, one of them (unary-test) was already failing before the conflict.