proposal-iterator-helpers icon indicating copy to clipboard operation
proposal-iterator-helpers copied to clipboard

passing index to various iterator helper callbacks

Open michaelficarra opened this issue 2 years ago • 14 comments

In plenary, some argued for passing a second index parameter to the Array.prototype-like methods' callbacks. I'm not entirely opposed to this but I'm unconvinced at the moment that this is an improvement. Also, I am really uninterested in adding separate "indexed" versions of each helper, as was mentioned. Here's how I see the pros/cons of each approach.

Motivation for not passing an index

  • index is not useful for lookaround indexing pattern
    • besides, we will have better patterns for this that don't involve indexing, such as zipping
  • indices are relative to in-progress iteration, not some indexable structure
    • it's easy to change the base of these indices when introducing a drop/filter/etc earlier in the chain
  • avoids the common class of error that includes passing parseInt to Array.prototype.map
  • if we don't continue passing index for non-Array.prototype methods (like zipWith, takeWhile, tap), we create a weird divide
  • matches for-of not providing a counter
  • implementations don't need to keep a counter

Motivation for passing an index

  • familiarity from Array.prototype methods
    • though, no indexable third parameter passed, so easy to misuse for indexing an original indexable/iterable structure
  • convenient: don't need to add an indexed() to the chain
  • no intermediate tuples to create and GC
  • 1 less method on Iterator.prototype

/cc participants: @js-choi @waldemarhorwat @ljharb @bakkot @erights @rbuckton @littledan

michaelficarra avatar Jul 22 '22 20:07 michaelficarra

The motivations for passing an index seem stronger to me, when laid out like this. There is almost no downside, and avoiding a bunch of extra intermediate objects is a pretty significant benefit.

bakkot avatar Jul 22 '22 21:07 bakkot

To me this layout reinforces to me that there shouldn’t be one. Avoiding object allocations is nice, but indexes on an iterator is a category error.

ljharb avatar Jul 22 '22 21:07 ljharb

Let's look at the current methods that work with iterators / non-indexed collections, for example Array.from or Set.prototype.forEach - they don't pass indexes to callbacks. I'm against inconsistency.

Also, unlike indexed collections, the number of iterations is not limited by MAX_SAFE_INTEGER - what should be done after that?

zloirock avatar Jul 22 '22 21:07 zloirock

indexes on an iterator is a category error.

No? There is a first thing you get out of the iterator, a second thing you get out of the iterator, etc.

Also, unlike indexed collections, the number of iterations is not limited by MAX_SAFE_INTEGER - what should be done after that?

I feel confident in predicting this will genuinely never happen.

bakkot avatar Jul 22 '22 21:07 bakkot

I feel confident in predicting this will genuinely never happen.

However, this case should be covered by the spec. Or infinite iterators should be forbidden?

zloirock avatar Jul 22 '22 22:07 zloirock

However, this case should be covered by the spec

Sure. The behavior will be that the spec counts by math integers and then casts to a double, so once you get to MAX_SAFE_INTEGER you'll start seeing indices repeat and skip.

This is the same as what happens for indexed in the spec already (or will once https://github.com/tc39/proposal-iterator-helpers/pull/208 lands, anyway, whoops).

bakkot avatar Jul 22 '22 22:07 bakkot

matches for-of not providing a counter (and other existing and future methods consuming iterators)

This is the strongest argument I think. It's also much cleaner to treat iterators as a pure stream of values, not indexed values.

implementations don't need to keep a counter

Or worse, they would need to keep multiple counters. If I did iterator.filter(…).map(…).drop(1).forEach(…), there would need to be 3 different counters. Honestly I hope that engines will at some point be able to do stream fusion for builtin iterators and optimise away the creation of intermediate tuples from indexed if they are always consumed by destructuring, so I don't think allocation/gc pressure should be considered in the API design.

bergus avatar Jul 22 '22 22:07 bergus

It's also much cleaner to treat iterators as a pure stream of values, not indexed values.

You can certainly choose to do that if you want; nothing is forcing you to consume the second parameter.

Or worse, they would need to keep multiple counters.

Counters are really not that expensive.

Honestly I hope that engines will at some point be able to do stream fusion for builtin iterators and optimise away the creation of intermediate tuples from indexed if they are always consumed by destructuring, so I don't think allocation/gc pressure should be considered in the API design.

I would certainly not want to rely on this without first hearing from implementations that this would be feasible (and I don't really expect it to be).

Even then, very sophisticated optimizations like this usually take a while to kick in, so the cost is still paid at startup - i.e., during page load or when starting up a script. It's not just long-run performance on a maximally sophisticated optimizing interpreter which matters.

bakkot avatar Jul 22 '22 22:07 bakkot

No? There is a first thing you get out of the iterator, a second thing you get out of the iterator, etc.

Yes, but that’s a counter. An index is a key that you can use to arbitrarily look up an item in a list by order.

ljharb avatar Jul 22 '22 23:07 ljharb

That is not the definition of "index" I use, but in any case, nothing user-exposed will say this is specifically an "index" rather than a "counter".

bakkot avatar Jul 22 '22 23:07 bakkot

I am mildly on the side of including ordinal/counter integers in map.

[@ljharb] To me this layout reinforces to me that there shouldn’t be one. Avoiding object allocations is nice, but indexes on an iterator is a category error.

To avoid going in semantic circles, I’d like to eschew (at least in this conversation) the use of the ambiguous word “index” in favor of unambiguous phrases. We need to distinguish between “integers indicating the ordinals of a lazy sequence” and “integers for random access into a randomly accessible data structure”.

After all, we are not talking about random-access indexes; we are not considering giving the original iterator to iterator.map. We are talking about integers indicating the ordinals of a lazy sequence, just like what .indexed would return.

I would certainly agree that “integers for random access into a randomly accessible data structure” is a category error. But does that also apply to “integers indicating the ordinals of a lazy sequence”?

If “integers indicating the ordinals of a lazy sequence” is a category error, then that category error seemingly applies just as much to the currently proposed .indexed method. .indexed also returns “integers indicating the ordinals of a lazy sequence”.

So is there any consistent and compelling reason why .indexed giving lazy-sequence ordinal numbers is not a category error, but .map giving lazy-sequence ordinal numbers is?


For what it’s worth, Clojure is careful to distinguish the two categories: it has has an nth function for retrieving values lazy sequences’ ordinals in O(n) time, and it has a get function for randomly accessible vectors in near-O(k) time. The separation between its sequence API’s nomenclature and its randoly accessible vector API’s nomenclature is quite elegant.

Relatedly, Clojure does not have an indexed function but rather has map-indexed and keep-indexed, because Rich Hickey wanted to avoid a temporary entry for every incoming value. There’s, of course, a parallel with iter.map and iter.filter maybe passing ordinal-number arguments in JavaScript.


[@bergus] It’s also much cleaner to treat iterators as a pure stream of values, not indexed values.

As @bakkot points out, the ordinal numbers can be easily ignored, but it’s a strong enough use case that almost every lazy-sequence library I know includes the ability to include ordinal numbers. After all, line numbers are still useful for asynchronous file streams, etc.

If this is about how “prominent” we are making the ordinal numbers, well, I would not call them prominent even for arr.map, which maybe >95% of the time gets mapping functions that do not use their extra integer arguments.

[@bergus] Honestly I hope that engines will at some point be able to do stream fusion for builtin iterators and optimise away the creation of intermediate tuples from indexed if they are always consumed by destructuring, so I don’t think allocation/gc pressure should be considered in the API design.

I appreciate this point of view, but, as long as we’re not worrying about the memory of intermediate indexed entries, then we also don’t have to worry about the memory of unused ordinal numbers, right? Either way, the engines could theoretically optimize them away. In fact, arguably, it would be easier for engines to avoid allocating a counter by checking the mapping function’s length.


[@bakkot] […] but in any case, nothing user-exposed will say this is specifically an “index” rather than a “counter”.

In fact, it could be argued that the method indexed is itself misleading, and that neither an indexed method nor map with ordinal numbers should be included in iterator-helpers. I would not really agree with this. But the idea that the word “index” with sequences always implies random access (as opposed to lazy-sequence ordinals)…is consistent with the idea that iter.indexed() is itself a category error.

Any complaint towards the word “index” for ordinals on lazy sequences should also be directed towards the naming of the proposed .indexed method. If “index” is misleading with lazy sequences, then .indexed should also be renamed, perhaps to .counters or .ordinals or something. The category-error concern doesn’t make sense otherwise.

And even then, we could always document the mapping function of iter.map as (value, counter) => { … } or (value, ordinal) => { … }, using the word “counter” or “ordinal” instead of “index”, and developers probably would not be confused.

js-choi avatar Jul 23 '22 00:07 js-choi

In fact, arguably, it would be easier for engines to avoid allocating a counter by checking the mapping function’s length.

It's also necessary to check that arguments isn't used, possibly through eval-code, i.e. function f() { return eval("arguments[0]"); }. Nested arrow-functions may also access arguments: function f() { return (() => eval("arguments[0]"))(); }. And non-strict functions may always access the arguments through the non-standard Function.prototype.caller and Function.prototype.arguments properties: function g() { return g.caller.arguments[0]; } function f() { return g(); }. :smile:

anba avatar Jul 27 '22 16:07 anba

Another argument against passing the index: JS has already established the pattern of distinguishing between iterators without indices and iterators of tuples with indices by Array.prototype.values vs Array.prototype.entries. Also I'm strongly against dropping .indexed(). In a loop like for (const [index, value] of arr.values().filter(…).indexed()), it's much easier to use than arr.values().filter(…).map((v, i) => [i, v]), and arr.entries().filter(…) has a different result.

bergus avatar Sep 26 '22 23:09 bergus

JS has already established the pattern of distinguishing between iterators without indices and iterators of tuples with indices by Array.prototype.values vs Array.prototype.entries

I don't understand how that's an argument against passing an index parameter to the callbacks for these helpers. Yes, iterators of tuples with indices already exist in some contexts, but so do callback functions to methods named .filter and .map and so on, and the latter are what we're discussing here.

bakkot avatar Sep 26 '22 23:09 bakkot