test262 Varied `TypedArray` helper

As we discussed last week, I drafted out what adding a test helper for the different sorts of backing arrays might look like. While I originally expected to be able to add the variations loop to the testWithTypedArrayConstructors itself and not need to intervene in the callbacks, I realized that would not work because we of course expect varied behavior, so the callback needs to access each. It could still be an argument to the callback, but that's a little bit of a silly round trip.

Anyways, here's a starting point!

Apr 19 '22 23:04 sarahghp

@jugglinmike this did lead me to wonder even more what this would look like as instructions to interpreters, since the behavior would vary based on the array setup.

Apr 19 '22 23:04 sarahghp

I was also hoping to avoid iteration in the callbacks because of the verbosity that you've highlighted. If we want distinctive failure messages, then it will be even more noisy than your sample tests suggests, e.g.

 [
-  fixedLengthWithOffset,
+  [fixedLengthWithOffset, 'fixed length with offset'],
-  lengthTrackingWithOffset
+  [lengthTrackingWithOffset, 'tracking length with offset']
-].forEach((a) => {
+].forEach(([array, desc]) => {
-  assert.sameValue(a.at(0), 3, 'a.at(2) must return 3')
+  assert.sameValue(a.at(0), 3, 'a.at(2) must return 3, ' + desc)
-  assert.sameValue(a.at(1), 4, 'a.at(3) must return 4')
+  assert.sameValue(a.at(1), 4, 'a.at(3) must return 4, ' + desc)
 })

Granted: distinctive error messages are a nicety that implementers can technically do without because the tests will fail regardless. It's just that folks will have to dig deeper to understand why they failed, and audiences without access to the implementation under test (e.g. visitors to https://test262.report) will be left in the dark.

If the "harness" code can visit each of these variants, then it can also take care of error differentiation (similar to what testWithTypedArrayConstructors does today).

In light of that, I hope you don't mind if I brainstorm a bit in this direction!

What if we gave each TypedArray its own buffer and observed the offset when setting the buffer values? If every Typed Array instance described a similar sequence, then maybe the tests that we have in mind could consider them to be equivalent.

function testWithTypedArrayVariants(TA, values, callback) {
  function rab() {
    return new ArrayBuffer(4 * TA.BYTES_PER_ELEMENT, { maxByteLength: 8 * TA.BYTES_PER_ELEMENT });
  }
  const cases = [
    ['non-resizable', new TA(values)],
    ['fixed length', new TA(rab(), 0, values.length)],
    ['length tracking', new TA(rab(), 0)],
    ['fixed length with offset', new TA(rab(), 2 * TA.BYTES_PER_ELEMENT, (values.length / 2))],
    ['length tracking with offset', new TA(rab(), 2 * TA.BYTES_PER_ELEMENT)]
  ];

  for (let i = 0; i < 5; i += 1) {
    try {
      // Writes data to the buffer backing the array
      for (let j = 0; j < values.length; j += 1) {
        cases[i][1][j] = values[j];
      }

      callback(cases[i][1]);
    } catch (error) {
      error.message += ' (' + cases[i][0] + ')';
      throw error;
    }
  }
}

That code's a bit rough, but do you think it has potential?

Apr 29 '22 23:04 jugglinmike

@jugglinmike — Thanks for your thoughts! I wanted to think it all through as deliberately as I could, since collaborating with folks you don't know well can lead to misundertandings, so please forgive the length of the response / stupid questions.

I was also hoping to avoid iteration in the callbacks because of the verbosity that you've highlighted. If we want distinctive failure messages, then it will be even more noisy than your sample tests suggests, e.g.
 [
-  fixedLengthWithOffset,
+  [fixedLengthWithOffset, 'fixed length with offset'],
-  lengthTrackingWithOffset
+  [lengthTrackingWithOffset, 'tracking length with offset']
-].forEach((a) => {
+].forEach(([array, desc]) => {
-  assert.sameValue(a.at(0), 3, 'a.at(2) must return 3')
+  assert.sameValue(a.at(0), 3, 'a.at(2) must return 3, ' + desc)
-  assert.sameValue(a.at(1), 4, 'a.at(3) must return 4')
+  assert.sameValue(a.at(1), 4, 'a.at(3) must return 4, ' + desc)
 })
Granted: distinctive error messages are a nicety that implementers can technically do without because the tests will fail regardless. It's just that folks will have to dig deeper to understand why they failed, and audiences without access to the implementation under test (e.g. visitors to https://test262.report) will be left in the dark.

You are 100% correct that distinct error messages are desirable; no convincing required here.

That said, what you call "noisy," I would perhaps call "verbose," and aesthetically, I am not opposed to verbose code, especially when that verbosity is communicative. (Though you could use template literals instead of + 'descString' above, which is perhaps a bit more expressive about what's going on.) Anyways! I am personally much more worried when I see code that includes cases[i][1][j] = values[j];, becuase I tend to find it simpler to understand code that doesn't expect me as a reader to pretend to be a compiler. 😜

If the "harness" code can visit each of these variants, then it can also take care of error differentiation (similar to what testWithTypedArrayConstructors does today).

I am not totally certain I understand what you mean by "harness" code here — I may be getting mixed up with the proposal to change instructions to runners. testWithTypedArrayConstructors is called by code in this repository — are you pointing out that by throwing the error in the helper, the callback does not need to create its own error messages?

If so, then this would be an accpetable recapitulation in my style? ⤵️

function testWithTypedArrayVariants(TA, values, callback) {

  // If we do this as assignment and not as a function ...
  const rab = new ArrayBuffer(4 * TA.BYTES_PER_ELEMENT, { maxByteLength: 8 * TA.BYTES_PER_ELEMENT });

  const cases = [
    ['non-resizable', new TA(values)],
    ['fixed length', new TA(rab, 0, values.length)],
    ['length tracking', new TA(rab, 0)],
    ['fixed length with offset', new TA(rab, 2 * TA.BYTES_PER_ELEMENT, (values.length / 2))],
    ['length tracking with offset', new TA(rab, 2 * TA.BYTES_PER_ELEMENT)]
  ];

  // ... then we only have to do this once 
  let ta_write = new TA(rab);
  for (let i = 0; i < values.length; ++i) {
    ta_write[i] = values[i];
  }
  
  cases.forEach(([name, case]) => {
    try {
      callback(case)
    } catch (error) {
      error.message += ' (' + name + ')';
      throw error;
    }
  })
}

Assuming yes, there are a few things I notice / wonder.

The first is in the comments above — your version is creating a separate backing ArrayBuffer each time and then filling per iteration. My version (based on @marjakh's v8 test code) creates a shared backing buffer and fills it once. Is there a reason we'd not want to do that? I don't know all the corners of the various test runners or array buffers, so perhaps I am missing a distinction here?
Somewhat more concerningly, this approach doesn't seem to address the biggest issue that drove my suggestion — that we expect the contents of the callback to differ by case. How would the example spec (test/built-ins/TypedArray/prototype/at/returns-item.js) be updated to use testWithTypedArrayVariants in the differing cases?

This is where I get stuck:

testWithTypedArrayVariants(TA => {
  assert.sameValue(typeof TA.prototype.at, 'function', 'The value of `typeof TA.prototype.at` is "function"');

  const values = [1, 2, 3, 4];

  /* 
    Works for non-offset ...
  */

  testWithTypedArrayVariants(TA, values, (case) => {
    /*
      I updated this to refer to the values array instead of the 
      explicit value because really they should have both been 
      written this way — it makes the connection clear.
    */
    assert.sameValue(case.at(0), values[0], 'a.at(0) must return 1');
    assert.sameValue(case.at(1), values[1], 'a.at(1) must return 2');
    assert.sameValue(case.at(2), values[2], 'a.at(2) must return 3');
    assert.sameValue(case.at(3), values[3], 'a.at(3) must return 4');
  });

  /* 
    ... and then what  do we do for the offsets?
  */
});

It seems like part of the answer is in here, but I am afraid I don't see it.

What if we gave each TypedArray its own buffer and observed the offset when setting the buffer values? If every Typed Array instance described a similar sequence, then maybe the tests that we have in mind could consider them to be equivalent.

I do have fairly grave concerns when I see suggestions about expecting the same sequence as part of a solution, because I find that to be an indication of implicit expectations that can trip people up and make writing tests for difficult because the expectations are brittle.

It also seems like a lot of roundabout trips — callbacks inside callbacks, generating the ArrayBuffer multiple times, observing the generator, relying on order — rather than the directness of a utility generator, which is values in, variations out, with the callback owning all the varied logic and error messages. The latter is definitely more explicit and allows writers a lot of flexibility.

So overall, I think I am in favor of the previous version of the helper. We could possibly provide default labels or generate them, but I honestly think it's a real expressiveness win to keep them in the test file itself.


function createTypedArrayVariations(TA, values) {
  const rab = new ArrayBuffer(4 * TA.BYTES_PER_ELEMENT, { maxByteLength: 8 * TA.BYTES_PER_ELEMENT });
  // ..
  let fixedLengthWithOffset = {
    name: 'fixed length with offset',
    contents: new TA(rab, 2 * TA.BYTES_PER_ELEMENT, (values.length / 2))
  };

  let lengthTrackingWithOffset = {
    name: 'length-tracking with offset',
    contents: new TA(rab, 2 * TA.BYTES_PER_ELEMENT)
  };
  
// ..

  return {
    // ..
    fixedLengthWithOffset,
    lengthTrackingWithOffset,
  }

  // in TA callback
  [
    fixedLengthWithOffset,
    lengthTrackingWithOffset
  ].forEach(( { contents, name } ) => {
    assert.sameValue(contents.at(0), 3, `contents(2) must return 3 in ${name}`)
    assert.sameValue(contents.at(1), 4, `contents(3) must return 4 in ${name}`)
  })

Finally one last question: neither of these addresses the alternate proposal to change the instructions to interpreters. Is that because we agree a helper change is a better approach, or do you still plan to do an example of that?

May 09 '22 15:05 sarahghp

@jugglinmike — Thanks for your thoughts! I wanted to think it all through as deliberately as I could, since collaborating with folks you don't know well can lead to misundertandings, so please forgive the length of the response / stupid questions.

Thank you! I very much appreciate the care you're taking with the design process (which necessarily involves asynchronous communication). And until the day that it goes without saying around here: there are no stupid questions :)

That said, what you call "noisy," I would perhaps call "verbose," and aesthetically, I am not opposed to verbose code, especially when that verbosity is communicative.

"Verbose" is more neutral, and I agree with your feelings about verbosity. "Noisy" is like "unhelpfully verbose," so maybe we should discuss whether or not the extra code is helpful.

My feeling is that string concatenation is not helpful. In this code:

assert.sameValue(a.at(0), 3, 'a.at(2) must return 3, ' + desc)

I don't think the + desc helps readers understand the intent of the test.

Separate from subjective appeals to utility, I expect that some contributors will disagree about the importance of this point (or may simply not recognize the need), and as maintainers, we will find ourselves either explicitly encouraging people to extend assertion messages or doing it ourselves in follow-up patches. If we can find a way to build that into the harness, then nobody will need to think about it again.

I am personally much more worried when I see code that includes cases[i][1][j] = values[j];, becuase I tend to find it simpler to understand code that doesn't expect me as a reader to pretend to be a compiler. stuck_out_tongue_winking_eye

Don't worry! We'll find a solution everyone's happy with :) That (ab)use of arrays is one of the things I had in mind when I called my strawperson "rough." In your examples building off of this one, you used expressive property names rather than arbitrary array indices, and that seems like a good idea to me.

I am not totally certain I understand what you mean by "harness" code here — I may be getting mixed up with the proposal to change instructions to runners. testWithTypedArrayConstructors is called by code in this repository — are you pointing out that by throwing the error in the helper, the callback does not need to create its own error messages?

By "'harness' code," I mean code in the harness/ directory. Sorry for the confusion; terms like "test harness" and "test runner" can be used in different ways, and we only recently tried to make the distinction more clear in Test262's documentation.

Anyway, yes: if the harness function catches the error and appends the distinguishing information, it relieves every individual test author of that responsibility.

The first is in the comments above — your version is creating a separate backing ArrayBuffer each time and then filling per iteration. My version (based on @marjakh's v8 test code) creates a shared backing buffer and fills it once. Is there a reason we'd not want to do that? I don't know all the corners of the various test runners or array buffers, so perhaps I am missing a distinction here?

Two reasons. First, I'm expecting that we'll be using this utility to test semantics which modify the data in the buffer. If that's right, then we'll need each array to have its own buffer so that they don't interfere with each other.

Second, I think this would let us use "non-offset" and "offset" TypedArrays interchangeably. Specifically: the new harness function could produce "offset" TypedArrays whose backing buffers were populated such that from the perspective of the array consumer, they appeared to describe the same set of data as the "non-offset" TypedArrays.

Maybe an ASCII diagram would help:

buffer #1               [ 1, 2, 3, 4 ]
"non-offset" TypedArray [            ]

buffer #2               [ 0, 0, 1, 2, 3, 4, 0, 0 ]
"offset" TypedArray            [           ]

With buffers set up like that, nonOffsetTypedArray.at(2) would return the same value as offsetTypedArray.at(2) (for example), so tests could use the same set of assertions for each array.

I do have fairly grave concerns when I see suggestions about expecting the same sequence as part of a solution, because I find that to be an indication of implicit expectations that can trip people up and make writing tests for difficult because the expectations are brittle.

I think we are on the same page, there. If the harness function hard-coded specific values (e.g. by setting [1, 2, 3, 4] for every test which used it), then I would share your concern.

That said, with the phrase "a similar sequence" I was (somewhat sloppily) trying to describe the relationship in the example above--that while the two buffers have different contents, their arrays present the same value for each element.

Finally one last question: neither of these addresses the alternate proposal to change the instructions to interpreters. Is that because we agree a helper change is a better approach, or do you still plan to do an example of that?

I'm not convinced that introducing a harness function is strictly better, but we stand to learn a lot about the relevant challenges by experimenting with either approach. I'm happy to run with the harness function.

May 14 '22 01:05 jugglinmike

Drive-by chiming in with my 2c.

First, I'm expecting that we'll be using this utility to test semantics which modify the data in the buffer. If that's right, then we'll need each array to have its own buffer so that they don't interfere with each other.

~~It remains valuable to test different kinds of TAs that are backed by the same resizable buffer IMO. All engines I know of will probably implement this proposal in a "pull" fashion, i.e. each TA will consult the backing buffer for length, etc each time that info is needed. But it's not inconceivable to have "push" implementations, whether partial or in whole, where backing buffers update their TA views' lengths (or cached lengths or something) when it gets resized. If all tests had a 1:1 TA:buffer relationship, we'd miss bugs of the latter implementation strategy.~~

I retract this suggestion -- the "push" implementations are hypothetical right now and time is better spent on landing tests for implementation techniques that we know are going to be in use, so let's not expand the scope of the helpers.

Second, I think this would let us use "non-offset" and "offset" TypedArrays interchangeably.

I might be missing something, why would we want to treat these interchangeably? There're extra tests you'd want to run on the offset TAs since it has offset computation builtin.

May 19 '22 23:05 syg

@syg

I might be missing something, why would we want to treat these interchangeably?

We would want to do so for one version of the helper design — if we go with what I will call the "Mike option" and make it so that it runs the same callback function regardless of kind (offset/non-offset), then we need to treat them as interchangeable to the extent that the expected behavior, such as the value at a certain index, is the same.

However, I am not certain this will be possible for the majority of cases — consider .slice, where we might want to observe some cases throwing, for instance where offset arrays can go out of bounds, but non-offset do not. Given this, I think we might be better off abandonning it as a design goal. The trade-off for this is that more needs to be in the test files themselves and test-writers might then need to be more careful to write themselves helpful test descriptions. (This would be the "Sarah option" — a utility generator vs. a higher-order callback.)

@jugglinmike, what do you think about this as a trade-off? I agree that it requires some effort from maintainers and test-writers, but luckily, I will be writing a lot of the initial tests and then the example of using the name from the utility generator will be in the code base (& this is just one corner).

I also know you say:

My feeling is that string concatenation is not helpful. In this code:

assert.sameValue(a.at(0), 3, 'a.at(2) must return 3, ' + desc)

I don't think the + desc helps readers understand the intent of the test.

but I do think the example I give, with the named variables, does make the inention clear:

assert.sameValue(contents.at(0), 3, contents(2) must return 3 in ${name})

More generally, I would suggest that giving test writers myriad small building blocks allows them to create meaning that matches the test intention with as little hoop jumping as possible, wheras making too many assumptions may not even save work in the end, since cases need to be formed to match it.

Other than the callback vs. utility approach, it seems the other outstanding question is about shared vs. separate backing buffers, and it appears that there is no universally applicable rule, so we could approach it a few ways: two helper functions; an argument indicating whether the buffer out to be shared; or adding an argument for the buffer to which test writers could pass a function returning a shared buffer or new instance. The latter is probably too fiddly, although it would let users do any crazy thing they wanted, so there's that.

May 30 '22 14:05 sarahghp

However, I am not certain this will be possible for the majority of cases — consider .slice, where we might want to observe some cases throwing, for instance where offset arrays can go out of bounds, but non-offset do not.

Yep, that matches my intuition. The error conditions definitely differ.

May 31 '22 16:05 syg

Other than the callback vs. utility approach, it seems the other outstanding question is about shared vs. separate backing buffers, and it appears that there is no universally applicable rule, so we could approach it a few ways: two helper functions; an argument indicating whether the buffer out to be shared; or adding an argument for the buffer to which test writers could pass a function returning a shared buffer or new instance. The latter is probably too fiddly, although it would let users do any crazy thing they wanted, so there's that.

I retracted my suggestion above. I thought it'd be helpful for a certain kind of implementation technique that AFAIK no engine uses right now. So let's not add another dimension to the combinations of these tests for now. There's no requirement that the initial set of tests ought to be exhaustive.

May 31 '22 16:05 syg

test262 test262 copied to clipboard

Varied `TypedArray` helper

test262
test262 copied to clipboard