Results 4 issues of Han-Kuan Chen

improve speed (memcpy32_speed.c and memcpy64_speed.c) and code size (memcpy_size.c)

I find that some code blocks have 1 leading space, but others have 2, 3 or 4 leading spaces. It is very annoying when copying and pasting the instructions....

The current spec requires 4 vector instructions to implement a left shift with saturation. ``` # v0 is data # v1 is shift # a0 is vl vsetvli x0, a0, e32,...

Resolve after v1.0
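
For readers unfamiliar with the operation in the issue above, the following is a minimal scalar sketch of what a saturating left shift computes for a single 32-bit element. It is not the vector sequence from the issue, which is truncated here. The function name `sat_shl32`, the choice of signed saturation, and the handling of shift amounts of 32 or more are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical scalar model (not from the issue): shift a signed 32-bit
 * value left and clamp to the int32_t range instead of wrapping.
 * Assumption: shift amounts >= 32 saturate rather than being masked. */
static int32_t sat_shl32(int32_t x, unsigned shift)
{
    if (x == 0)
        return 0;                       /* zero never overflows */
    if (shift >= 32)
        return x > 0 ? INT32_MAX : INT32_MIN;

    /* Multiply in 64 bits to avoid left-shifting a negative value.
     * |x| <= 2^31 and the factor is <= 2^31, so the product fits. */
    int64_t wide = (int64_t)x * ((int64_t)1 << shift);

    if (wide > INT32_MAX)
        return INT32_MAX;               /* clamp positive overflow */
    if (wide < INT32_MIN)
        return INT32_MIN;               /* clamp negative overflow */
    return (int32_t)wide;
}
```

Under these assumptions, `sat_shl32(0x40000000, 1)` clamps to `INT32_MAX`, whereas a plain shift would wrap to a negative value.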

For `vsll`, `vsrl` and `vsra`, the output is 0 when the shift amount equals SEW. Using lg2(SEW) to determine the shift amount is enough. However, because fixed-point has a rounding mode, the...

Resolve after v1.0