riscv-v-spec

Left shift with saturation

Open HanKuanChen opened this issue 4 years ago • 7 comments

The current spec requires 4 vector instructions (after the vsetvli) to implement a saturating left shift:

# v0 is data
# v1 is shift
# a0 is vl

vsetvli x0, a0, e32, m1
vmv.v.i v2, 1
vsll.vv v3, v2, v1
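vwmulsu.vv v4, v0, v3      # v4-v5 = data * 2^shift at 2*SEW
vnclip.wi v6, v4, 0        # clip back to SEW (sets vxsat); dest must not be v5, the high half of the source group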
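Per element, the sequence above amounts to an exact double-width product followed by a clip. The C model below is a sketch that assumes SEW=32 and shift amounts in 0..31, and treats vxsat as a plain flag; the function name `sat_shl_i32_widen` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical per-element model of vsll + vwmulsu + vnclip (shift 0) at SEW=32.
 * Assumes 0 <= s <= 31; the hardware vsll would only use the low 5 bits of s. */
static int32_t sat_shl_i32_widen(int32_t x, unsigned s, int *vxsat)
{
    uint32_t scale = UINT32_C(1) << s;           /* vmv.v.i 1, then vsll.vv        */
    int64_t wide = (int64_t)x * (int64_t)scale;  /* vwmulsu: exact 64-bit product  */
    if (wide > INT32_MAX) { *vxsat = 1; return INT32_MAX; } /* vnclip clips high */
    if (wide < INT32_MIN) { *vxsat = 1; return INT32_MIN; } /* vnclip clips low  */
    return (int32_t)wide;                        /* in range: no saturation        */
}
```

The exact 2*SEW intermediate is what makes the range check trivial; at SEW=64 it would need 128-bit elements.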

However, because it relies on a widening multiply, this method does not work for SEW=64 on most hardware, since it would need 128-bit elements. Suppose we had dedicated vector instructions such as the following:

vssll.vv vd, vs2, vs1, vm   # vd[i] = clip(vs2[i] << vs1[i]), signed saturation
vssll.vx vd, vs2, rs1, vm   # vd[i] = clip(vs2[i] << x[rs1])
vssll.vi vd, vs2, uimm, vm  # vd[i] = clip(vs2[i] << uimm)

vssllu.vv vd, vs2, vs1, vm   # vd[i] = clip(vs2[i] << vs1[i]), unsigned saturation
vssllu.vx vd, vs2, rs1, vm   # vd[i] = clip(vs2[i] << x[rs1])
vssllu.vi vd, vs2, uimm, vm  # vd[i] = clip(vs2[i] << uimm)

Then only one instruction would be needed:

vsetvli x0, a0, e32, m1
vssll.vv v5, v0, v1
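To pin down what the proposed instructions would compute per element, here is a hedged C sketch for SEW=64. The names `vssll_elem` and `vssllu_elem` are hypothetical, and the rule that only the low log2(SEW) bits of the shift amount are used is an assumption borrowed from vsll, not part of the proposal.

```c
#include <stdint.h>

/* Hypothetical model of the proposed signed vssll at SEW=64.
 * Assumption: like vsll, only the low 6 bits of the shift amount are used. */
static int64_t vssll_elem(int64_t x, uint64_t shamt, int *vxsat)
{
    unsigned s = (unsigned)(shamt & 63);
    int64_t hi = INT64_MAX >> s;  /* largest x whose shift still fits: 2^(63-s) - 1 */
    int64_t lo = -hi - 1;         /* smallest such x: -2^(63-s)                      */
    if (x > hi) { *vxsat = 1; return INT64_MAX; }
    if (x < lo) { *vxsat = 1; return INT64_MIN; }
    /* In range: shift as unsigned to avoid UB on negative x; the conversion back
     * relies on two's-complement wrapping, as GCC and Clang define it. */
    return (int64_t)((uint64_t)x << s);
}

/* Hypothetical model of the proposed unsigned vssllu at SEW=64 (same shift assumption). */
static uint64_t vssllu_elem(uint64_t x, uint64_t shamt, int *vxsat)
{
    unsigned s = (unsigned)(shamt & 63);
    if (x > (UINT64_MAX >> s)) { *vxsat = 1; return UINT64_MAX; }
    return x << s;
}
```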

HanKuanChen, Aug 28 '20 15:08

@aswaterman What's the reason the spec has only "vssrl" but no saturating left shift?

We could use a widening instruction plus "vnclip" to do the work, and "vxsat" would be set correctly by "vnclip". But this approach does not work for SEW=64 input if the platform doesn't support SEW=128, and without it, it's hard to detect saturation and set vxsat correctly.

JerryShih, Nov 05 '20 02:11

My recommendation is to use a widening multiply and clip for the cases where SEW < ELEN. The overhead of multiply vs. shift is usually not a concern, since vector units will nearly always provide fully pipelined multipliers.

For the SEW=ELEN case, I think it's totally reasonable to use a multi-instruction sequence (compare against 2^N, perform left shift, and, using the comparison result as a mask, overwrite some elements with -1).
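For the unsigned case, the compare / shift / masked-overwrite recipe can be modeled per element as below. This is a sketch assuming SEW=ELEN=64 and shift amounts in 0..63; the name `satu_shl_u64_recipe` is hypothetical. The comparison against 2^(64-s) is written as `x > (UINT64_MAX >> s)` so that s = 0 does not need the unrepresentable constant 2^64.

```c
#include <stdint.h>

/* Hypothetical scalar model of: compare, shift, overwrite with -1 under the mask.
 * Assumes unsigned data, SEW = ELEN = 64, and 0 <= s <= 63. */
static uint64_t satu_shl_u64_recipe(uint64_t x, unsigned s)
{
    int overflow = x > (UINT64_MAX >> s); /* compare: x >= 2^(64-s) forms the mask */
    uint64_t r = x << s;                  /* plain left shift (vsll), may wrap     */
    if (overflow)
        r = ~UINT64_C(0);                 /* masked overwrite with -1 = UINT64_MAX */
    return r;
}
```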

aswaterman, Nov 06 '20 04:11

I think it would be a performance issue when SEW equals ELEN. For example, with SEW = ELEN = 64, the code would be:

# v1 is data
# v2 is shift
# a0 is vl

vsetvli x0, a0, e64, m1
vsll.vv v3, v1, v2          # input do shift
li a1, 9223372036854775807  # INT64_Max
vmv.v.x v4, a1
vsra.vv v4, v4, v2          # INT64_Max / (2 ^ shift)
vmsgt.vv v0, v1, v4         # overflow if data > (INT64_Max / (2 ^ shift))
vmerge.vxm v3, v3, a1, v0   # INT64_MAX (scalar operand, so the .vxm form)
li a1, -9223372036854775808 # INT64_Min
vmv.v.x v4, a1
vsra.vv v4, v4, v2          # INT64_Min / (2 ^ shift)
vmslt.vv v0, v1, v4         # overflow if data < (INT64_Min / (2 ^ shift))
vmerge.vxm v3, v3, a1, v0   # INT64_MIN (scalar operand, so the .vxm form)

# v3 is result
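A per-element C model of the 9-instruction sequence helps check the thresholds. This is a sketch assuming shift amounts in 0..63; `asr64` and `sat_shl_i64_9insn` are hypothetical helpers, and `asr64` is written as a floor shift so it does not depend on C's implementation-defined `>>` of negative values.

```c
#include <stdint.h>

/* Floor (arithmetic) right shift without relying on >> of negatives; 0 <= s <= 63. */
static int64_t asr64(int64_t x, unsigned s)
{
    return x >= 0 ? x >> s : -1 - ((-1 - x) >> s);
}

/* Hypothetical per-element model of the 9-instruction SEW=64 sequence. */
static int64_t sat_shl_i64_9insn(int64_t x, unsigned s)
{
    /* vsll.vv: wraps; the conversion back assumes two's-complement (GCC/Clang). */
    int64_t r = (int64_t)((uint64_t)x << s);
    if (x > asr64(INT64_MAX, s))  /* vmv.v.x + vsra.vv + vmsgt.vv */
        r = INT64_MAX;            /* vmerge.vxm                   */
    if (x < asr64(INT64_MIN, s))  /* vmv.v.x + vsra.vv + vmslt.vv */
        r = INT64_MIN;            /* vmerge.vxm                   */
    return r;                     /* note: vxsat is never updated */
}
```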

HanKuanChen, Nov 10 '20 05:11

Yeah, that's a fairly substantial implementation. Can you give more details on how this shows up in applications?

A slight variation is to construct INT64_MIN from INT64_MAX using vnot. This avoids a scalar instruction (or two?) and scalar-to-vector register movement, but consumes another vector register.
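The identity behind the vnot variation can be checked in C. This sketch assumes two's-complement int64_t; the helper names are hypothetical. It also shows a related observation (ours, not from the thread): because arithmetic right shift commutes with bitwise NOT, the INT64_MIN threshold equals the NOT of the already-shifted INT64_MAX threshold.

```c
#include <stdint.h>

/* INT64_MIN is the bitwise NOT of INT64_MAX: one broadcast constant plus vnot gives both. */
static int64_t min_from_max(void)
{
    return ~INT64_MAX;
}

/* Hypothetical: the lower threshold floor(INT64_MIN / 2^s) = -2^(63-s),
 * derived from the upper one (INT64_MAX >> s) with a single NOT; 0 <= s <= 63. */
static int64_t lo_threshold_from_hi(unsigned s)
{
    return ~(INT64_MAX >> s);
}
```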

You can also do a similar pattern with vmul and vmulh, although that seems at least as costly. (This is just the multi-word arithmetic version of the vwmul+vnclip approach from the case SEW < ELEN.)
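A sketch of the multi-word version: the low and high halves of the product x * 2^s decide overflow, since the product fits in 64 bits exactly when the high half is a sign extension of the low half. Here `__int128` (a GCC/Clang extension) stands in for the pair of results; using the signed-by-unsigned vmulhsu for the high half, so that 2^63 is representable, is our assumption. The name `sat_shl_i64_mulh` is hypothetical, and shift amounts are assumed to be 0..63.

```c
#include <stdint.h>

/* Hypothetical model of the vmul + vmulhsu approach at SEW = ELEN = 64. */
static int64_t sat_shl_i64_mulh(int64_t x, unsigned s)
{
    uint64_t scale = UINT64_C(1) << s;          /* 2^s; 2^63 only fits as unsigned   */
    __int128 p = (__int128)x * (__int128)scale; /* exact product, |p| <= 2^126       */
    uint64_t lo = (uint64_t)p;                  /* vmul: low 64 bits                 */
    int64_t hi = (int64_t)(p >> 64);            /* vmulhsu: high 64 bits             */
    int64_t sign_of_lo = (lo >> 63) ? -1 : 0;   /* hi must equal this if p fits      */
    if (hi != sign_of_lo)                       /* overflow: saturate toward sign of x */
        return x < 0 ? INT64_MIN : INT64_MAX;
    return (int64_t)lo;                         /* two's-complement conversion (GCC/Clang) */
}
```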

I was hoping for a trick involving vsmul but I didn't see it.

I think @aswaterman's earlier comment referenced the unsigned case, which involves roughly half the code.

nick-knight, Nov 10 '20 08:11

If I didn't screw it up, I was able to improve on the algorithm a bit (9 -> 6 vector instructions):

vsetvli x0, a0, e64, m1, ta, mu  # mu: masked-off elements keep their old values
li t0, (1<<63)             # INT64_MIN ("-inf")
vsll.vv v3, v1, v2
vsra.vv v4, v3, v2         # shift back
vmsne.vv v0, v1, v4        # true if +/- overflow
vmerge.vxm v3, v3, t0, v0  # set to INT64_MIN if +/- overflow
vmsge.vi v0, v1, 0, v0.t   # true if +overflow
vnot.v v3, v3, v0.t        # set to INT64_MAX (~INT64_MIN) if +overflow
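The 6-instruction idea (shift, shift back, and compare) can be modeled per element as follows. The sketch assumes shift amounts in 0..63 and mask-undisturbed behavior for the masked vmsge.vi and vnot.v, so non-overflowing elements are left alone; `asr64` and `sat_shl_i64_6insn` are hypothetical names.

```c
#include <stdint.h>

/* Floor (arithmetic) right shift without relying on >> of negatives; 0 <= s <= 63. */
static int64_t asr64(int64_t x, unsigned s)
{
    return x >= 0 ? x >> s : -1 - ((-1 - x) >> s);
}

/* Hypothetical per-element model of the 6-instruction sequence. */
static int64_t sat_shl_i64_6insn(int64_t x, unsigned s)
{
    /* vsll.vv: wraps; the conversion back assumes two's-complement (GCC/Clang). */
    int64_t r = (int64_t)((uint64_t)x << s);
    int overflow = asr64(r, s) != x;  /* vsra.vv + vmsne.vv: shifting back lost bits */
    if (overflow) {
        r = INT64_MIN;                /* vmerge.vxm with t0 = 1<<63                  */
        if (x >= 0)                   /* vmsge.vi under the overflow mask            */
            r = ~r;                   /* vnot.v: INT64_MIN becomes INT64_MAX         */
    }
    return r;
}
```

Shifting back recovers x exactly when x * 2^s fits in 64 bits, which is why a single vmsne covers both overflow directions.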

aswaterman, Nov 11 '20 01:11

It also needs additional instructions to detect saturation and set the vxsat flag. https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#38-vector-fixed-point-saturation-flag-vxsat
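The extra bookkeeping amounts to a sticky OR over the overflow mask of the active elements. A C sketch follows; `update_vxsat` is a hypothetical name and the mask is modeled as one byte per element. In vector code this would presumably be a population count of a saved copy of the overflow mask (vcpop.m in v1.0 naming) followed by a CSR write to vxsat when it is nonzero; in the 6-instruction sequence the mask would have to be saved before v0 is overwritten by vmsge.vi.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: vxsat becomes 1 if any of the first vl elements saturated.
 * The flag is sticky, so this never clears it. */
static void update_vxsat(const uint8_t *overflow_mask, size_t vl, unsigned *vxsat)
{
    for (size_t i = 0; i < vl; i++) {
        if (overflow_mask[i]) {
            *vxsat = 1;
            return;
        }
    }
}
```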

The scaling right-shift instructions do the analogous job in a single instruction, without the SEW == ELEN problem. https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#134-vector-single-width-scaling-shift-instructions
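Why the right-shift direction has no SEW == ELEN problem: a right shift never grows the magnitude, so no wider intermediate or saturation is required. The sketch below models vssra at SEW=64 with vxrm set to round-to-nearest-up, where the rounding increment is bit s-1 of the input; it assumes shift amounts in 0..63, and `asr64` and `vssra_rnu_elem` are hypothetical names.

```c
#include <stdint.h>

/* Floor (arithmetic) right shift without relying on >> of negatives; 0 <= s <= 63. */
static int64_t asr64(int64_t x, unsigned s)
{
    return x >= 0 ? x >> s : -1 - ((-1 - x) >> s);
}

/* Hypothetical model of vssra with vxrm = rnu: (x >> s) + bit (s-1) of x.
 * For s >= 1 the shifted value is at most 2^62 - 1, so adding 1 cannot overflow. */
static int64_t vssra_rnu_elem(int64_t x, unsigned s)
{
    if (s == 0)
        return x;
    int64_t round_bit = (int64_t)(((uint64_t)x >> (s - 1)) & 1);
    return asr64(x, s) + round_bit;
}
```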

JerryShih, Nov 11 '20 02:11

There should be a later vector extension with greater support for fixed-point operations, and we don't want to add more to the vector spec before 1.0.

kasanovic, Jun 08 '21 15:06