Camel Coder

Results 64 comments of Camel Coder

My results are up: https://camel-cdr.github.io/rvv-bench-results/canmv_k230/index.html

Thanks, I've added all of the autovec results, the others were identical anyway. All autovectorizations except for [utf8_count_SWAR_popc_autovec](https://camel-cdr.github.io/rvv-bench-results/canmv_k230/utf8_count.html) were an improvement over clang 16, the biggest one was in [byteswap](https://camel-cdr.github.io/rvv-bench-results/canmv_k230/byteswap.html)....
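For readers unfamiliar with the "SWAR popc" idea named in that benchmark, here is a minimal sketch of counting UTF-8 code points eight bytes at a time. It is not taken from the benchmark suite: the function name `utf8_count_swar` is hypothetical, and it assumes a GCC/Clang compiler for `__builtin_popcountll`. The technique is to count the bytes that are not continuation bytes (`0b10xxxxxx`).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: counts UTF-8 code points in s[0..len) by
 * subtracting the number of continuation bytes (0b10xxxxxx) from len.
 * Assumes the input is valid UTF-8; it performs no validation. */
size_t utf8_count_swar(const char *s, size_t len)
{
    const uint64_t ones = 0x0101010101010101ull;
    size_t continuation = 0, i = 0;

    for (; i + 8 <= len; i += 8) {
        uint64_t x;
        memcpy(&x, s + i, sizeof x);  /* unaligned-safe 8-byte load */
        /* Lowest bit of each byte lane becomes 1 when that byte has
         * bit 7 set and bit 6 clear, i.e. it is a continuation byte. */
        uint64_t m = (x >> 7) & (~x >> 6) & ones;
        continuation += (size_t)__builtin_popcountll(m);
    }
    /* Scalar tail for the remaining 0..7 bytes. */
    for (; i < len; i++)
        continuation += ((unsigned char)s[i] & 0xC0) == 0x80;

    return len - continuation;
}
```

The per-lane test only looks at bits within each byte, so the result does not depend on the host's endianness.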

Ah, you beat me to it :-) I was planning to start on that as well. For encode I was planning to do something similar to the haswell implementation, that's...

> You have to take into account that we need to store data in the big-endian order

I thought you could do the endianness swap with the initial vrgather, but...
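To make the byte-order constraint concrete, here is a scalar sketch, assuming the discussion concerns base64-style decoding where four 6-bit values are packed into three output bytes. The function name `pack_6bit_quad` and the lane layout (first value in the lowest byte of a 32-bit word) are assumptions for illustration, not the vectorized code under discussion.

```c
#include <stdint.h>

/* Hypothetical helper: packs four 6-bit values held in the four bytes
 * of in32 (first value a in the lowest byte, as after a little-endian
 * load) into three bytes, most significant bits first. */
void pack_6bit_quad(uint32_t in32, uint8_t out[3])
{
    uint32_t a = in32 & 0x3F;
    uint32_t b = (in32 >> 8) & 0x3F;
    uint32_t c = (in32 >> 16) & 0x3F;
    uint32_t d = (in32 >> 24) & 0x3F;

    /* 24-bit group: aaaaaabb bbbbcccc ccdddddd */
    uint32_t packed = (a << 18) | (b << 12) | (c << 6) | d;

    /* Store big-endian: the byte containing a comes first. */
    out[0] = (uint8_t)(packed >> 16);
    out[1] = (uint8_t)(packed >> 8);
    out[2] = (uint8_t)packed;
}
```

A little-endian 32-bit store of `packed` would emit these three bytes in reverse order, which is why a vector version needs a byte swap (or an equivalent permutation) somewhere in the pipeline.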

@WojciechMula I tried rearranging the shifts to create the big endian result, that could be compressed:

```c
// in32: [00dddDDD|00cccCCC|00bbbBBB|00aaaAAA]
// d:    [00dddDDD|00000000|00000000|00000000] i&
// c1:   [0000dddD|DD00cccC|000000bb|bBBB00aa] i>>2,4 (i as...
```

@ppassmannpriv I know how to write a PR, but I don't know how to properly add multiple languages to the website, since I haven't done any web development. Once that...

@WojciechMula Here you go:

C920:

```bash
$ ./decode
input size: 33554432
number of iterations: 100
We report the time in cycles per output byte.
For reference, we present the time...
```

I've written another decode implementation using segmented load/stores and the three 4-bit LUT method: https://godbolt.org/z/7qc1xhMao

It's the fastest on the C908:

```
input size: 4194304
number of iterations: 100
We...
```

@wanghuibin0 I don't think you'll be able to find something like that until rvv 1.0 CPUs are widely available. This isn't quite what you are looking for, but I've created...

Looks like the kernel doesn't expose `rdcycle`. I think that was changed in recent kernels, and I have to look into how to best access it via the perf API....
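One possible way to get a cycle count through the perf API is the Linux `perf_event_open` system call with the generic `PERF_COUNT_HW_CPU_CYCLES` event. The sketch below is only an illustration under that assumption, not necessarily the approach the benchmark ended up using; the helper names `cycle_counter_open` and `cycle_counter_read` are hypothetical, and whether the call succeeds depends on the kernel configuration and the `perf_event_paranoid` setting.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: opens a user-space-only CPU cycle counter for
 * the calling thread. Returns a file descriptor, or -1 on failure. */
int cycle_counter_open(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof attr;
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.exclude_kernel = 1;  /* count user-space cycles only */
    attr.exclude_hv = 1;

    /* pid = 0, cpu = -1: this thread on any CPU; no group, no flags. */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Hypothetical helper: reads the current count into *cycles.
 * Returns 0 on success, -1 on failure. */
int cycle_counter_read(int fd, uint64_t *cycles)
{
    ssize_t n = read(fd, cycles, sizeof *cycles);
    return n == (ssize_t)sizeof *cycles ? 0 : -1;
}
```

Reading the counter before and after the measured region and subtracting gives an elapsed cycle count, at the cost of one `read` system call per sample.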