sh
sh copied to clipboard
expand: Variable cannot represent indexed arrays with gaps
A Bash arrays can start at any index, though it counts the number of set elements as part of the length.
For example, Bash says:
$ x=([1]=a); echo ${#x[@]}
1
gosh says:
$ x=([1]=a); echo ${#x[@]}
2
expand.Variable.List
being a []string
is not flexible enough to describe these bespoke arrays. A zero string is a valid expansion after all.
Yikes, I wasn't aware that Bash indexed arrays are... not actually arrays. Here's another example:
$ empty=("" "foo") missing=([1]="bar")
$ echo "${empty[@]}" "${#empty[@]}"
foo 2
$ echo "${missing[@]}" "${#missing[@]}"
bar 1
You can even have gaps anywhere in the array, not just the beginning:
$ middle=([1]="one" [3]="three")
$ echo "${middle[@]}" "${#middle[@]}"
one three 2
The question then becomes - how do we represent indexed arrays in an efficient way? We could give up entirely and use a map[int]string
or the existing map[string]string
directly, but as you can guess, maps have an inherent overhead when compared to a plain old slice.
Maybe the answer is just that indexed arrays in Bash are practically as costly as associative arrays, since both seem to behave like maps.
I'm trying to think if we could possibly fix this in v3 without breaking programs. I don't think so, because the Variable.List field is exposed directly as a field instead of a method.
There's one hack we could possibly do in v3. Fill the "gaps" with string values which are impossible to have in shell, such as a single null/zero byte:
$ a=$'\x00'
$ echo "${a} ${#a}"
0
Then, the length of an indexed array means looping over List to count the non-null elements. Similarly, expansion would have to skip over such gaps.
This is a hack, but I think it should be backwards compatible - any direct users of expand.Variable
would have their behavior changed for arrays with gaps, but those didn't work properly to begin with. So no breakages should happen in practice.
I also wonder if this hack is actually better in the long run than alternatives like map[int]string
or []struct{idx int; val string}
. Assuming that gaps are rare, and that one would rarely leave immense gaps like arr=([10000]=foo)
, then it makes sense to keep a contiguous slice representation.
C strings strike again. Null as a sentinel value makes sense.
The first data structure that came to mind is a skip list. It would take less memory but lookup and insertion is log n. Perhaps v4 can hide this detail where gapless arrays are a straightforward slice and sparse arrays are a skip list/something else.
With the assumption that gaps will be extremely rare, or at least not huge, I would rather avoid implementing a skip list entirely. Having two separate implementations would also add code, which I'd like to avoid.
With the assumption above, I think the "null" strings for the gaps should work okay. It could possibly cause memory issues for e.g. arr=([10000]=foo)
, but I doubt that actually happens in practice.
I also think that optimizing the storage of variables should be taken with a pinch of salt. Variable values in Bash are strings anyway, so there's always going to be an inherent overhead with variables and arrays.
Totally fair. No real complaints from me with only a null sentinel implementation. :)