[RVV] add rvv kernel for qs8/qu8-vcvt

Open ken-unger opened this issue 6 months ago • 2 comments

  • Supersedes #7126, which was not committed. I implemented this one similarly to the NEON and other kernels so that we can get to u4v.

ken-unger, May 26 '25 15:05

@fbarchard and @dsharlet please review and commit once approved. Thank you.

  • Test cases pass
  • Benchmark snip (BananaPI-F3 K1, VLEN=256)

./vunary-bench --benchmark_filter=xnn_qs8_vcvt

vunary/xnn_qs8_vcvt_ukernel__rvv_u1v/N:16320/real_time bytes=5.33456G/s cpufreq=1.6G elements=2.66728G/s
vunary/xnn_qs8_vcvt_ukernel__rvv_u1v/N:130560/real_time bytes=5.11528G/s cpufreq=1.6G elements=2.55764G/s
vunary/xnn_qs8_vcvt_ukernel__rvv_u2v/N:16320/real_time bytes=6.32932G/s cpufreq=1.6G elements=3.16466G/s
vunary/xnn_qs8_vcvt_ukernel__rvv_u2v/N:130560/real_time bytes=5.3969G/s cpufreq=1.6G elements=2.69845G/s
vunary/xnn_qs8_vcvt_ukernel__rvv_u4v/N:16320/real_time bytes=7.0579G/s cpufreq=1.6G elements=3.52895G/s
vunary/xnn_qs8_vcvt_ukernel__rvv_u4v/N:130560/real_time bytes=4.00234G/s cpufreq=1.6G elements=2.00117G/s

vunary/xnn_qs8_vcvt_ukernel__scalar_u1/N:16320/real_time bytes=159.383M/s cpufreq=1.6G elements=79.6916M/s
vunary/xnn_qs8_vcvt_ukernel__scalar_u1/N:130560/real_time bytes=159.387M/s cpufreq=1.6G elements=79.6933M/s
vunary/xnn_qs8_vcvt_ukernel__scalar_u2/N:16320/real_time bytes=212.48M/s cpufreq=1.6G elements=106.24M/s
vunary/xnn_qs8_vcvt_ukernel__scalar_u2/N:130560/real_time bytes=211.578M/s cpufreq=1.6G elements=105.789M/s
vunary/xnn_qs8_vcvt_ukernel__scalar_u4/N:16320/real_time bytes=219.807M/s cpufreq=1.6G elements=109.903M/s
vunary/xnn_qs8_vcvt_ukernel__scalar_u4/N:130560/real_time bytes=219.077M/s cpufreq=1.6G elements=109.538M/s
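
To make the comparison between the RVV and scalar kernels easier to read, here is a small sketch that computes the speedup from the elements/s figures reported above. The numbers are copied from the benchmark output. The helper names (`best`, `speedup`) are hypothetical and are not part of XNNPACK or its benchmark tooling.

```python
# Elements/s copied from the benchmark output above
# (BananaPI-F3 K1, VLEN=256). Keys are (kernel variant, N).
ELEMENTS_PER_SEC = {
    ("rvv_u1v", 16320): 2.66728e9,
    ("rvv_u1v", 130560): 2.55764e9,
    ("rvv_u2v", 16320): 3.16466e9,
    ("rvv_u2v", 130560): 2.69845e9,
    ("rvv_u4v", 16320): 3.52895e9,
    ("rvv_u4v", 130560): 2.00117e9,
    ("scalar_u1", 16320): 79.6916e6,
    ("scalar_u1", 130560): 79.6933e6,
    ("scalar_u2", 16320): 106.24e6,
    ("scalar_u2", 130560): 105.789e6,
    ("scalar_u4", 16320): 109.903e6,
    ("scalar_u4", 130560): 109.538e6,
}


def best(prefix, n):
    """Return (variant, elements/s) of the fastest variant with this prefix at size n."""
    candidates = {
        variant: rate
        for (variant, size), rate in ELEMENTS_PER_SEC.items()
        if size == n and variant.startswith(prefix)
    }
    name = max(candidates, key=candidates.get)
    return name, candidates[name]


def speedup(n):
    """Ratio of the fastest RVV variant to the fastest scalar variant at size n."""
    _, rvv_rate = best("rvv", n)
    _, scalar_rate = best("scalar", n)
    return rvv_rate / scalar_rate


if __name__ == "__main__":
    for n in (16320, 130560):
        rvv_name, _ = best("rvv", n)
        scalar_name, _ = best("scalar", n)
        print(f"N={n}: {rvv_name} vs {scalar_name}: {speedup(n):.1f}x")
```

On these figures the fastest RVV variant is roughly 32x faster than the fastest scalar variant at N=16320 and roughly 25x faster at N=130560. The u4v variant is the fastest at the smaller size but falls behind u1v and u2v at the larger one.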

ken-unger, May 26 '25 15:05

This branch has some failing relevant tests: https://github.com/google/XNNPACK/actions/runs/15284031029/job/42989783184?pr=8497

Some of them are new after the merge; my apologies for the conflict if so.

However, one of them was failing before the conflict too (unary-test).

dsharlet, May 27 '25 19:05