
Unify String and Array maximum lengths

Open msaboff opened this issue 9 years ago • 17 comments

Currently, Strings are defined to consist of up to 2^53 - 1 elements (see 6.1.4 The String Type). Arrays, on the other hand, can have up to 2^32 - 1 indexed elements (see 6.1.7 The Object Type and the definition of array index). Given that there are functions that convert readily between Arrays and Strings, it makes sense for their limits to be the same. Another way of looking at this is that a String is an array of elements.
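For concreteness, the Array side of this gap can be observed from script. This is a minimal sketch: the constant and helper names are illustrative (not spec names), and it only probes the Array limit, since real engines cap string lengths far below the specified maximum.

```javascript
// Specified limits; the constant names are illustrative, not taken from the spec.
const MAX_ARRAY_LENGTH = 2 ** 32 - 1;       // 4294967295
const MAX_STRING_LENGTH_SPEC = 2 ** 53 - 1; // same value as Number.MAX_SAFE_INTEGER

// Try to construct an array of the given length and report the resulting
// length, or the name of the error that was thrown.
function tryArrayLength(len) {
  try {
    return new Array(len).length;
  } catch (e) {
    return e.name;
  }
}

console.log(tryArrayLength(MAX_ARRAY_LENGTH));     // 4294967295
console.log(tryArrayLength(MAX_ARRAY_LENGTH + 1)); // RangeError
console.log(MAX_STRING_LENGTH_SPEC === Number.MAX_SAFE_INTEGER); // true
```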

msaboff avatar Jul 21 '16 21:07 msaboff

Firefox appears to have a max length of 2 ** 28 - 1, Safari 2**31 - 1, Chrome around 2**27 - not sure about Edge or IE but this doesn't seem like it will conflict with any implementations.

ljharb avatar Jul 21 '16 22:07 ljharb

This should be an agenda item, as I'm sure implementers will want to discuss

rwaldron avatar Jul 21 '16 22:07 rwaldron

Particularly considering the change was requested by an implementer: https://github.com/rwaldron/tc39-notes/blob/master/es7/2015-07/july-28.md#612-stringprototypesplit-its-limit-argument-and-tolength-vs-touint32

rwaldron avatar Jul 21 '16 22:07 rwaldron

Just added it to the agenda.

msaboff avatar Jul 21 '16 22:07 msaboff

During development of ES6 a decision was made to extend the specified max length of Strings and Typed Arrays to 2^53-1. The rationale was that 4 gig strings and arrays were likely to become too limiting in a world where address spaces could grow to be thousands of gigabytes. We would have also extended the max length of Array except for the strange legacy 2^32-1 wrapping behavior of Array property indexing. The fear was that extending the max size of Array (and removing the wrapping) might break some existing code. However, we did decide that most of the Array.prototype methods that are written to operate on generic integer-indexed collections (not just Array instances) could safely be rewritten to not have dependencies upon the 2^32-1 limit or index wrapping. All of this and the trade-offs involved were extensively discussed at TC39 meetings and should be in the meeting notes (most likely 2011-2014 timeframe).

We really shouldn't want to revert the max size of strings or Typed Arrays/ArrayBuffer. Someday that is going to be a problem and the best way to fix it is to allow for it now, before implementations are running into it as an issue.

The real challenge is how to make it possible to extend Array beyond that limit. From that perspective I think we came to the wrong conclusion at the July 2015 meeting WRT split (https://github.com/rwaldron/tc39-notes/blob/master/es7/2015-07/july-28.md#conclusionresolution-11). Remember the only reason that the max length of Array wasn't increased was because of web breakage concerns. But note that part of the rationale TC39 used to justify removing the 2^32-1 limits within most Array prototype functions was that, as of that time, no implementation supported Arrays (or strings) of that size, so there could be no legacy usage of those functions that depended upon that size limit.
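The effect of removing the 2^32-1 dependency from the generic methods can be seen by calling one on an ordinary array-like object rather than on a real Array. A small sketch, assuming an ES2015-or-later engine; `arrayLike` is just an illustrative name:

```javascript
// An ordinary (non-Array) object whose length already sits at the Array limit.
const arrayLike = { length: 2 ** 32 - 1 };

// Array.prototype.push is generic: on a non-Array receiver it reads the length,
// stores the new element at that index, and writes back length + 1 with no
// 2^32 - 1 cap.
const newLength = Array.prototype.push.call(arrayLike, 'x');

console.log(newLength);              // 4294967296
console.log(arrayLike.length);       // 4294967296
console.log(arrayLike[2 ** 32 - 1]); // x
```

The same call on a real Array of length 2^32-1 is expected to fail with a RangeError instead, because the final length update is rejected by the Array's own length semantics.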

Generally, there is no need to legacy-protect code that didn't exist prior to the spec edition that first makes the change. In the case of split, there could be no valid legacy code that is dependent upon splitting a string whose length is > 2^32-1. So how should split be specified to behave in a future where some implementations allow such a length? The simple and immediate fix (rather than reverting to using ToUint32 on the length) would be to simply throw if split needs to create an array whose length is greater than 2^32-1. In future code, if it isn't possible to create an Array instance of the necessary size, throwing is the appropriate behavior rather than creating a too-short Array that doesn't match the expected results of split.
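The "throw rather than truncate" behavior could look roughly like the sketch below. `splitChecked` is a hypothetical helper, not a spec algorithm or an existing API; it handles only non-empty string separators, and its `maxLength` parameter exists purely so the overflow path can be exercised without a multi-gigabyte string.

```javascript
const MAX_ARRAY_LENGTH = 2 ** 32 - 1;

// Hypothetical helper: split `str` on the string `sep`, but throw a RangeError
// instead of returning a truncated result when there are too many pieces.
function splitChecked(str, sep, maxLength = MAX_ARRAY_LENGTH) {
  if (typeof sep !== 'string' || sep === '') {
    throw new TypeError('this sketch only handles non-empty string separators');
  }
  const parts = [];
  let start = 0;
  while (true) {
    const idx = str.indexOf(sep, start);
    const piece = idx === -1 ? str.slice(start) : str.slice(start, idx);
    if (parts.length >= maxLength) {
      throw new RangeError('split result exceeds the maximum array length');
    }
    parts.push(piece);
    if (idx === -1) return parts;
    start = idx + sep.length;
  }
}

console.log(splitChecked('a,b,c', ',')); // [ 'a', 'b', 'c' ]

try {
  splitChecked('a,b,c', ',', 2); // three pieces cannot fit in a limit of two
} catch (e) {
  console.log(e.name); // RangeError
}
```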

But I think there is possibly a better solution. Remember that the only reason we didn't extend the length limit for Array was the fear that removing the wrapping behavior of array indexing would break some legacy code. Maybe there is a way around that legacy trap. What if we allow for two variants of Array instances: instances that have the 2^32-1 length limit and which wrap indexing, and instances that have a 2^53-1 length limit and don't wrap indexing? (Note that both would be considered instances of the Array constructor, share the same prototype, etc. The different semantics would be imposed at the MOP level rather than at the class level.)

To distinguish the two kinds of Array instances, let's call them "legacy arrays" and "huge arrays". To maintain legacy compatibility with new Array(aValueGreaterThan2raisedTo32), new Array would have to continue to create legacy array instances. There would have to be some new mechanism for creating huge array instances. As a strawman, let's assume that could be done by something like:

   new Array.huge(len)  //len may be aValueGreaterThan2raisedTo32

It should also be possible to say things like:

Array.huge.from(anotherPossibleHugeCollection)

New ES code that wants to accommodate very large arrays (or use smaller arrays that don't wrap large indices) would use Array.huge-based construction. split and any other built-ins that need to create possibly huge arrays could then be specified to do the equivalent of new Array.huge when they encounter a length value > 2^32-1.
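To make the intended difference in semantics concrete, here is a rough user-land approximation. `Array.huge` does not exist; `HugeArraySketch` is a hypothetical stand-in that models only the length bookkeeping (it is not an Array instance and shares no prototype with Array, unlike the strawman, which would impose the semantics at the MOP level).

```javascript
// Hypothetical stand-in for a "huge array": sparse storage plus a length that
// keeps tracking the highest index past 2^32 - 1, up to 2^53 - 1.
class HugeArraySketch {
  constructor(length = 0) {
    if (!Number.isInteger(length) || length < 0 || length > Number.MAX_SAFE_INTEGER) {
      throw new RangeError('Invalid huge array length');
    }
    this.length = length;
    this.items = new Map(); // sparse storage keyed by integer index
  }

  set(index, value) {
    // Valid indices run from 0 to 2^53 - 2, so length never exceeds 2^53 - 1.
    if (!Number.isInteger(index) || index < 0 || index >= Number.MAX_SAFE_INTEGER) {
      throw new RangeError('Invalid huge array index');
    }
    this.items.set(index, value);
    if (index >= this.length) this.length = index + 1;
  }

  get(index) {
    return this.items.get(index);
  }
}

// Legacy array: 2^32 is not an array index, so length is left untouched.
const legacy = [];
legacy[2 ** 32] = 'x';
console.log(legacy.length); // 0

// Sketch of the "huge" behavior: length follows the index.
const huge = new HugeArraySketch();
huge.set(2 ** 32, 'x');
console.log(huge.length); // 4294967297
```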

allenwb avatar Jul 22 '16 19:07 allenwb

Is there actually any data on code intentionally relying on wrapping of indexes? Or is this just a theoretical concern?


concavelenz avatar Jul 23 '16 19:07 concavelenz

> Is there actually any data on code intentionally relying on wrapping

This was discussed at past TC39 meetings, but I don't recall whether any actual evidence was presented. Need to search past TC39 meeting notes.

allenwb avatar Jul 23 '16 20:07 allenwb

I’ve found the following old bug about removing the uint32 length restriction on arrays. But it was WONTFIXed with no explanation other than "we didn't do that":

https://bugs.ecmascript.org/show_bug.cgi?id=145

claudepache avatar Jul 25 '16 12:07 claudepache

I've tested various ways of attempting to construct an array of out-of-bound length. The results are in the table below. The most interesting cases are in the last two columns, namely the behaviour of the .concat() method.

In ECMA262, the RangeError is due to the final step in the respective algorithms, which attempts to set an illegal value as the length of the array. (That final step was accidentally removed from .concat() in ES5, see ecmascript:bug#129, but it was present in ES3 and again in ES6.) Without that final step, the length of the array would be capped at 2^32-1, if I read the spec correctly.

Relevant nontrivial section in the spec: the [[DefineOwnProperty]] internal method of Array exotic objects

(NB: 0xffffffff = 2^32-1 = 4294967295.)

| | [].length = -1 | [].length = 0x123456789 | Array(0xffffffff).push(42) | Array(0xffffffff).unshift(42) | Array(0xffffffff).splice(0xfffffffd, 0, 42) | Array(0xfffffffe).concat([42,43]).length | Array(0xfffffffe).concat(Array(2)).length |
|---|---|---|---|---|---|---|---|
| ECMA262 | RangeError | RangeError | RangeError | RangeError | RangeError | RangeError | RangeError |
| Firefox | RangeError | RangeError | RangeError | (stop responding) | RangeError | 0 | 0 |
| Chrome | RangeError | RangeError | RangeError | RangeError | RangeError | RangeError | 0xffffffff |
| Safari | RangeError | RangeError | RangeError | Error("Out of memory") | Error("Out of memory") | Error("Out of memory") | Error("Out of memory") |
| Edge | RangeError | RangeError | RangeError | RangeError | RangeError | 0xffffffff | 0xffffffff |
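A table like this can be produced with a small probing helper. The sketch below is an assumption about how such a test might be written, not the harness actually used; `probe` and `results` are illustrative names. It runs only the first two columns, since the remaining expressions can hang or exhaust memory in some engines, as the table itself shows.

```javascript
// Run a thunk and report either its return value or the name of the thrown error.
function probe(thunk) {
  try {
    return thunk();
  } catch (e) {
    return e instanceof Error ? e.name : String(e);
  }
}

// Only the cheap cases are exercised here; the push/unshift/splice/concat
// columns are left out on purpose.
const results = {
  '[].length = -1': probe(() => { [].length = -1; }),
  '[].length = 0x123456789': probe(() => { [].length = 0x123456789; }),
};

for (const [expression, outcome] of Object.entries(results)) {
  console.log(`${expression} -> ${outcome}`);
}
```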

So, I think it is feasible to extend array's max length to 2^53-1.

claudepache avatar Jul 25 '16 15:07 claudepache

Repeating the most interesting columns of the preceding table for easier reading:

| | Array(0xfffffffe).concat([42,43]).length | Array(0xfffffffe).concat(Array(2)).length |
|---|---|---|
| ECMA262 | RangeError | RangeError |
| Firefox | 0 | 0 |
| Chrome | RangeError | 0xffffffff |
| Safari | Error("Out of memory") | Error("Out of memory") |
| Edge | 0xffffffff | 0xffffffff |

claudepache avatar Jul 25 '16 16:07 claudepache

On Jul 25, 2016, at 8:53 AM, Claude Pache [email protected] wrote:

> I've tested various ways of attempting to construct an array of out-of-bound length. The results are in the table below. The most interesting cases are in the last two columns, namely the behaviour of the .concat() method.
>
> ...
>
> So, I think it is feasible to extend array's max length to 2^53-1.

The legacy concern wasn’t about the cases where such huge arrays might be created. The concern was about what happens when a value greater than 2^32-2 is used as an array index. In particular, consider:

    var a = new Array(0);
    console.log(`length: ${a.length} keys: ${Object.keys(a)}`);
    // length: 0 keys:

    a[Math.pow(2,32)-2] = "x";
    console.log(`length: ${a.length} keys: ${Object.keys(a)}`);
    // length: 4294967295 keys: 4294967294
    // note: length auto-updated

    a[Math.pow(2,32)] = "x";
    console.log(`length: ${a.length} keys: ${Object.keys(a)}`);
    // length: 4294967295 keys: 4294967294,4294967296
    // note: length not updated; property beyond length added

This can be happening today. Would changing this behavior (such that array lengths auto-update beyond length 2^32-1) break anything? Nobody knows.
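The cutoff in that example follows from the spec's definition of an array index: a property key whose canonical numeric value lies in the range 0 to 2^32-2. A minimal sketch of that test, where `isArrayIndex` is an illustrative helper name rather than a built-in:

```javascript
// Returns true when `key` would be treated as an array index, i.e. when
// ToString(ToUint32(key)) equals the key and ToUint32(key) is not 2^32 - 1.
function isArrayIndex(key) {
  const s = String(key);
  const n = Number(s) >>> 0; // ToUint32
  return String(n) === s && n !== 2 ** 32 - 1;
}

console.log(isArrayIndex(2 ** 32 - 2)); // true:  largest array index
console.log(isArrayIndex(2 ** 32 - 1)); // false: reserved so length fits in uint32
console.log(isArrayIndex(2 ** 32));     // false: becomes an ordinary property
console.log(isArrayIndex('01'));        // false: not in canonical form
```

Keys that fail this test are stored as ordinary properties and never affect length, which is the behavior that raising the limit would change.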

Allen

allenwb avatar Jul 25 '16 17:07 allenwb

> Would changing this behavior (such that array lengths auto-update beyond length 2^32-1) break anything? Nobody knows.

So, what is the risk concretely? I imagine that some rare broken script could get even more broken...

But sure, the only way to know is to try. It would be a shame if nobody wanted to try.

claudepache avatar Jul 25 '16 17:07 claudepache

Array wrapping sounds useful for tricking a nodejs/iojs server into writing to fields that do not expect user-provided data. Not sure though whether a higher or lower wrap index is more useful, probably depends on how that server reads or calculates the target index number.

mk-pmb avatar Aug 08 '16 16:08 mk-pmb

@msaboff should I leave this open or has this proposal been withdrawn since last TC39?

bterlson avatar Aug 09 '16 20:08 bterlson

@bterlson I still think it is an open issue.

msaboff avatar Aug 09 '16 20:08 msaboff

@msaboff What are the next steps for this proposal?

littledan avatar Mar 05 '18 10:03 littledan

No example of real code has been provided over the years. So, imho, all concerns in this thread about increasing the max size are purely hypothetical fears...

On the other hand, the longer this change waits, the higher the chance that such legacy code gets written and that something important actually comes to depend on this old limitation.

c69 avatar Mar 15 '18 21:03 c69