ecmascript_simd Again, reconsider unsigned types (TC39 request)

trafficstars

It's a bit redundant with integer types, but TypedArrays have unsigned and signed variants, as well as Uint8Clamped, even though the same data can be represented with signed ints. At TC-39, a committee member asked why we don't have unsigned variants. While the current API design makes sense to me, we should also consider making the SIMD API symmetric with the existing language features. The cost is that there are a large number of functions that are basically duplicated (e.g., add), so I don't know the answer one way or the other, but this bug documents the question. In theory, with value types, it's not actually a big deal to make more types.

Jun 01 '15 20:06 littledan

Would separate signed and unsigned types imply the need for lots of conversions when you wanted to mix signed and unsigned operations?

It's not important right now because I don't think any (many?) operations are supported that behave differently on signed or unsigned data, but once operations like that appear it would be unpleasant having to qualify type conversions all over the place.

Jun 02 '15 00:06 ghost

I guess you could have bitcast operations, like the ones that exist right now between float and int types, right? I'm not sure when you'll be switching back and forth between signed and unsigned operations in a way that's annoying to use though--do you have a specific use case in mind?

Jun 02 '15 23:06 littledan

Well, contrast SSE's universal __m128 C data type with NEON's int8x16_t, uint8x16_t, ..., int32x4_t, uint32x4_t, etc., and the amount of vreinterpretq_xnn_ymm() casts that the latter demands. See examples from skia, webp, and vpx. Some uses of vreinterpret in those are data width aliasing, but I found them by searching for simple signed<->unsigned casts of the same width.

I'm not entirely sure how the implications fit into JavaScript, though. It's just my experience with C that makes me wary of starting something hard to finish. Conversions might not come up much at the moment because most of the operations give the same result if you use the wrong signedness, and 32-bit leaves a lot of room to not care. It's the narrower types and all the saturating/halving/widening/narrowing operations that go with those types where it gets hairy.

As simple example of the implications: should uint32x4.ShiftRightByScalar() be an implicit logical shift, and should sint32x4.ShiftRightLogicalByScalar() be eliminated?

Oh, sorry, there's no such method as ShiftRightByScalar()... so the question is simply wether or not uint32x4.ShiftRightArithmeticByScalar() and sint32x4.ShiftRightLogicalByScalar() make any sense.

Jun 03 '15 20:06 ghost

Well, here are some other cases where it comes up besides shift (maybe you do want to call out shift types with different names because they really are different; I wasn't thinking about shift before):

Construction and extractLane/replaceLane: This simple usage was enough to make TypedArrays decide to have multiple versions for signed/unsigned. With our current API, programmers have to convert negative numbers into positive ones if they want a scalar extraction of an unsigned vector, and this is the only thing that unsigned TypedArrays provide users. You could argue, though, that it was a mistake to do this for TypedArrays.
Comparison operations like greaterThan. The unsigned version can just be implemented in terms of the signed one by adding the right splatted constant first, and then doing the comparison, but the builtin version is signed.
Saturating add: The polyfill saturates at signed min/max. Some use cases might want an unsigned saturation.

Maybe SIMD programmers don't need as much babying as I'm imagining; I'm definitely not suggesting that we add functions for unsigned comparison, unsigned saturation, etc, but maybe if we had types for unsigned SIMD vectors, then it would be easier to see that an operation is signed-only because the function doesn't exist on the unsigned type.

Jun 03 '15 21:06 littledan

We had previously decided to introduce 'unsigned' int operations for when signedness matters (division, shift, and compares). This is how LLVM does it too. However, it's not a nice symmetrical solution like introducing an unsigned base type would be. The ugliness of having to do converts is a big con though. These are verbose and very detrimental to code readability. I think I prefer the 'special' unsigned operations over introducing a new base type, but can go either way.

Jun 09 '15 18:06 PeterJensen

I don't see any signed comparison operators in the polyfill. Is the intention to add them still? (I guess division doesn't matter since there's no integer division operations right now.)

Jun 09 '15 18:06 littledan

Yes, the intention is to add them, unless there's consensus to support unsigned types instead.

Jun 09 '15 19:06 sunfishcode

How about getters named .u and .s to make the conversions less noisy?

Jun 09 '15 21:06 ghost

In the call today, the consensus was to not add unsigned types, but to add the full set of missing unsigned operations. This means:

Unsigned comparisons
Unsigned saturating add and subtract

I guess we don't need unsigned extractLane/replaceLane, do we?

Jun 10 '15 05:06 littledan

Unsigned comparisons, saturating add and subtract, and extractLane are in a patch in #209.

Jul 02 '15 00:07 sunfishcode

@sunfishcode 's patch has been merged, and the spec has been updated in v0.7 1ba0295815 . I don't see any more unsigned operations that are missing.

Jul 06 '15 19:07 littledan

Sorry to reopen this, but Waldemar Horwat of TC39 has expressed that he still wants unsigned types. In particular, even if all the arithmetic operations have unsigned modes, there is still the issue where .toString() always prints out the value as signed integers.

Jul 29 '15 18:07 littledan

We discussed this at length in the meeting. I'm tempted to say that we should make this change to unsigned integer types. There's no real performance downside either way, it matches TypedArray precedent, and it also follows the majority precedent (with xmmintr.h the big huge exception). I looked around at some other SIMD interfaces to see what the consensus is. For a unified type, I found a couple low-level interfaces: xmmintr.h LLVM IR

For separate signed/unsigned types, I found many high-level languages: D Swift Factor Rust AltiVec C types NEON C types GCC platform-independent vector extensions LLVM user-level SIMD interface

On the downside, programmers will sometimes have to use an explicit cast when, based on their xmmintr.h and assembly experience, it'd be annoying to have to use that cast. And the additional duplicated functions (e.g. add) will take some memory (maybe we'll have 30% more functions or so).

This is the only remaining big issue with TC39. The people who will be reviewing the spec in detail, whose signoff we would need, are concerned about this. Does anyone object to making this change?

Jul 30 '15 17:07 littledan

The amount of extra functions that will need to be added is substantial. On Int32x4 there's around 30 or so sign-agnostic operations (arithmetic, logical, shift, load, store, initializers, compares). Assuming we need to have copies of those for the unsigned types as well it will add roughly 3x30 functions. In addition we'll need type casting operations between all the 128-bit types. Now there's 4 128-bit types (Int32x4, Int16x8, Int8x16, Float32x4), resulting in 12 (4x4-4) cast operations. With the addition of 3 more 128-bit types (UInt32x4, UInt16x8, UInt8x16) we'll need 7x7-7 = 42 cast operations total, that's an increase of 40, so all together we'll be adding around ~130 functions to the API.

Daniel did an exact computation and it came out to 256 functions before and 390 after. A ~50% increase.

Jul 30 '15 22:07 PeterJensen

I don't know Javascript at all, but could the functions not be relaxed in the set of types that they accept? Perhaps by implementing sign-agnostic operations under sign-agnostic types and using sign-agnostic-type-casting getters to duck-type the problem away?

Jul 30 '15 23:07 ghost

Also worth keeping in mind: When increasing the size of the API by ~50% we also need 50% more unit tests.

Jul 31 '15 03:07 PeterJensen

@simhos01 Not 100% sure I understand your suggestion. Do you suggest introducing SIMD.XIntAxB name spaces to hang the sign-agnostic operations on. It would be OK to use both IntAxB and UintAxB operands on those operations.

Jul 31 '15 03:07 PeterJensen

Sign-agnostic ops would be a little more difficult for a VM to implement efficiently since they are be polymorphic not only in input type but also in output type most of the time. They would probably have to dispatch to different code paths internally. And, as Peter points out, where do you put them? I don't think it's a very good idea.

In making this decision, I think the main thing is to compare the preponderance of other languages and TypedArrays (and maybe planning for future consistency with int64) as evidence against the 50% increased number of functions, which has implementation complexity and memory overhead burden.

Jul 31 '15 21:07 littledan

By the way, on another thread, Peter made the good point that TypedArrays are already not being taken for everything because of Uint8ClampedArray. But I think that was more of a one-off mistake and the rest of TypedArrays is decent.

Jul 31 '15 21:07 littledan

Thanks @littledan That comment should've been made here. My mistake.

Jul 31 '15 22:07 PeterJensen

@simhos01 's comment above regarding massive use of vreinterpretq_xnn_ymm() in example code from skia, webp, and vpx where vreinterpret calls are used for type casts is pretty important. It provides good evidence that it's not unlikely that type conversions between signed and unsigned SIMD values happens in real code. To convert two signed operands to unsigned in order to do an unsigned compare would result in code that looks like this

SIMD.Uint32x4.lessThan(SIMD.Uint32x4.fromInt32x4(a), SIMD.Uint32x4.fromInt32x4(b))

instead of simply saying:

SIMD.Int32x4.unsignedLessThan(a, b);

The latter being much easier to read.

Having to use a lot of these conversion can quickly result in hard to read (and maintain) code.

Jul 31 '15 22:07 PeterJensen

@PeterJensen I'm not sure I understand the point about vreinterpretq: it seems like it's reinterpreting bits for vectors of the same bit-width, but different element width. e.g. in the webp example:

static WEBP_INLINE void Load4x16(const uint8_t* src, int stride,
                                 uint8x16_t* const p1, uint8x16_t* const p0,
                                 uint8x16_t* const q0, uint8x16_t* const q1) {
  const uint32x4_t zero = vdupq_n_u32(0);
  uint32x4x4_t in;
  INIT_VECTOR4(in, zero, zero, zero, zero);
  src -= 2;
  LOADQ_LANE_32b(in.val[0], 0);
  LOADQ_LANE_32b(in.val[1], 0);
  LOADQ_LANE_32b(in.val[2], 0);
  LOADQ_LANE_32b(in.val[3], 0);
  LOADQ_LANE_32b(in.val[0], 1);
  LOADQ_LANE_32b(in.val[1], 1);
  LOADQ_LANE_32b(in.val[2], 1);
  LOADQ_LANE_32b(in.val[3], 1);
  LOADQ_LANE_32b(in.val[0], 2);
  LOADQ_LANE_32b(in.val[1], 2);
  LOADQ_LANE_32b(in.val[2], 2);
  LOADQ_LANE_32b(in.val[3], 2);
  LOADQ_LANE_32b(in.val[0], 3);
  LOADQ_LANE_32b(in.val[1], 3);
  LOADQ_LANE_32b(in.val[2], 3);
  LOADQ_LANE_32b(in.val[3], 3);
  // Transpose four 4x4 parts:
  {
    const uint8x16x2_t row01 = vtrnq_u8(vreinterpretq_u8_u32(in.val[0]),
                                        vreinterpretq_u8_u32(in.val[1]));
    const uint8x16x2_t row23 = vtrnq_u8(vreinterpretq_u8_u32(in.val[2]),
                                        vreinterpretq_u8_u32(in.val[3]));
    const uint16x8x2_t row02 = vtrnq_u16(vreinterpretq_u16_u8(row01.val[0]),
                                         vreinterpretq_u16_u8(row23.val[0]));
    const uint16x8x2_t row13 = vtrnq_u16(vreinterpretq_u16_u8(row01.val[1]),
                                         vreinterpretq_u16_u8(row23.val[1]));
    *p1 = vreinterpretq_u8_u16(row02.val[0]);
    *p0 = vreinterpretq_u8_u16(row13.val[0]);
    *q0 = vreinterpretq_u8_u16(row02.val[1]);
    *q1 = vreinterpretq_u8_u16(row13.val[1]);
  }
}

This isn't about sign as much as it is about number of elements.

There are some _s8_u8 and _s16_u16 conversions, but the majority AFAICT are just changing the number of elements.

Aug 01 '15 05:08 jfbastien

@jfbastien Yes the conversions that change the number of elements don't count.

My point was that the sign conversions do appear in real code, as witnessed by the _s8_u8 and s16_u16 conversions, and that the ergonomics of the JS code one would have to write to do this is pretty ugly.

Aug 01 '15 16:08 PeterJensen

@PeterJensen This example is a bit hard for me to understand. Do you have any other examples? Maybe some sample JavaScript code with and without the casts would make the point especially clear.

Aug 03 '15 17:08 littledan

The cast in d = SIMD.Int32x4.fromUint32x4(SIMD.Uint32x4.sub(a, b)) is inevitable in many realistic uses of unsigned types. Then you'll have to cast back to produce another unsigned result.

I think this might be an invisible problem in the desktop world because most 32-bit code copes by staying signed and assuming that overflow never happens rather than by being careful (promotion to 64-bit is generally preferred over being careful). 16-bit and 8-bit DSP code comes up against its range limits much more often but signedness clarity is necessary only for a few operations, while enforcing it in the type makes all operations pay the overhead.

Aug 03 '15 22:08 ghost

In both the signed and unsigned cases, overflow of sub is well-defined by two's complement arithmetic. sub would be is identical on signed and unsigned operations except for the names of the types taken as arguments and given as a return value.

Aug 03 '15 22:08 littledan

The problem isn't the data, but its type. (unsigned)x - (unsigned)y might have a negative result, implying that the result should be signed (though it could still be out of range), or it might have a positive result as large as UINT_MAX, meaning that the result needs to be unsigned. The programmer must know some additional constraints or they wouldn't be able to write useful code, but whichever the language chooses by default will be the wrong type in some cases, and then the result will need to be cast to the correct type if it's to be used in a comparison or a saturating operation.

Some might consider a cast necessary simply as a matter of being up-front about the proper interpretation of the data at that stage. If I subtract two unsigned values and store the result in an unsigned type, am I not implying that the result must be positive or zero? Now clarity (say what you mean, use a type appropriate to the meaning of the data) is coming into conflict with brevity (don't make things unnecessarily verbose with extra casts).

Aug 04 '15 00:08 ghost

OK, I see, you may want signed output from an unsigned operation like subtraction. And it's messy-looking to have an intermediate value (before casting it) which plainly shows the result of an overflow.

I think C scalars have the same problem. If you do (x + y) > z, then the signedness of x and y (and z) inform the signedness of the comparison. Although C has very terse syntax for doing the casts, this is still an issue.

So there are two ways to solve it: Make separate signed and unsigned operations, or expect programmers to do explicit (or C-like implicit) casts. I think most high-level languages go with the latter one, but if this comes up a lot, it could be an argument for the former. I don't know what SIMD code tends to look like. Does anyone have an example code snippet where this came up (in any language)?

Aug 04 '15 00:08 littledan

I think adding separate signed and unsigned subtract might be compounding the problem of exploding the number of operations required, which was, originally, an argument against separate types. Add will suffer the same fate (eg., adding a negative bias to an unsigned value), and others can too, but I don't want to fuel those flames. Would we really want to gain unsignedSubtract() as a consequence of changes which were expected to do away with unsignedLessThan()?

I just wanted to highlight that the implications of separating signed and unsigned types comes with a lot of expressiveness expectations which are actually hard to live up to. I think the troubles around C's unsigned promotions and comparisons also highlight what can happen if you promise too much in your type system.

Initially I thought I was ambivalent, but now I think it might genuinely be too hard to separate signed and unsigned types in a coherent and useful way. Separating them doesn't seem to add much benefit and it makes other cases misleading or unclear or ugly.

Aug 04 '15 01:08 ghost

I don't think promotions are on the table.

What do you mean by unsignedSubtract()? What is the "negative bias to an unsigned value" that you're talking about? I didn't know there were more function names that we would have to add to support split signed/unsigned types.

Aug 04 '15 17:08 littledan

ecmascript_simd ecmascript_simd copied to clipboard

Again, reconsider unsigned types (TC39 request)

ecmascript_simd
ecmascript_simd copied to clipboard