problem-solving
problem-solving copied to clipboard
No literal syntax for sparse arrays
There is no syntax for representing sparse arrays without constructing them procedurally. Because of this, .raku completely discards :exists/:!exists information, generating a contiguous array full of default values which do exist at the holes, instead of a sparse array. Obviously this means round-trippability of .raku/EVAL is currently broken for sparse arrays.
#raku log from Apr 24, 2021:
16:46:19 raydiak | m: use MONKEY-SEE-NO-EVAL; my @a; @a[1] = 1; say @a.raku; say @a[0]:exists; my @b := EVAL @a.raku; say @b.raku; say @b[0]:exists; 16:46:19 camelia | rakudo-moar ea102883d: OUTPUT: «[Any, 1]False[Any, 1]True» 16:54:24 raydiak | not only does .raku throw away sparse array information, I don't think we even have a syntax which could properly represent it as an expression without a do block. list assignment to a slice could work to construct it, but as an expression it would return the list of values on the right of the assignment instead of the sparse array, so you'd still have to wrap it in a do block so you can name the array and 16:54:24 raydiak | return it without exposing the temporary name to the surrounding context 18:40:48 lizmat | raydiak: [,1] could be such a thing I guess 18:41:35 lizmat | m: dd "use nqp; [nqp::null,44]".EVAL.raku # alas, not the same 18:41:35 camelia | rakudo-moar ea102883d: OUTPUT: «"[Mu, 44]"» 18:48:16 lizmat | raydiak: care to create a problem-solving issue for that ? 18:55:45 raydiak | lizmat: sure thing 18:56:01 lizmat | I guess we need some special value of Mu to indicate a hole 18:56:15 lizmat | similar to IterationEnd 19:05:40 lizmat | Array.List won't make the entries immutable 19:05:58 raydiak | lizmat: that was the first thing I thought, but now I'm not sure an actual value is necessary. we already have :exists for logical tests, and your ",1" proposal gives us a syntax to indicate where something shouldn't :exist 19:06:02 lizmat | Tirifto[m] but why do you want to do that? To make it a value type ? 19:06:39 lizmat | raydiak: I'm pretty sure there have been huge discussions about ,, in lists 19:06:56 lizmat | and I think the consensus was that they should be a syntax error 19:07:10 lizmat | as there is ambiguity 19:07:40 raydiak | do you recall what sort of ambiguity? 19:07:50 lizmat | and lack of clarity, e.g. when the first comma is on the end of a line, and the next line starts with whitespace and a comma 19:08:01 lizmat | that's an easy mistake to make 19:10:11 raydiak | I suppose I could see that, fair enough. I still like ,, more than another value/type/thing, but I'm certainly no language designer 19:12:35 lizmat | it's one of the lessons learned from Perl, like not wanting sigil variance 19:12:54 lizmat | in Perl ,, is just the same as , if I recall correctly 19:14:30 lizmat | otoh, the result of such a discussion could be that existedness of an Array element is just not roundtrippable 19:15:14 lizmat | as supporting a special value would definitely affect parsing performance 19:15:32 lizmat | afk for half an hour or so 19:17:31 raydiak | I definitely agree that ,, shouldn't just collapse to , 19:19:24 raydiak | but I wonder if sparse arrays were considered in the previous discussions. I think using ,, to skip an element is totally reasonable (and probably does exactly that in some other languages, though I can't cite them off the top of my head) 19:23:09 raydiak | as far as just saying "sparse arrays can't round-trip"...I don't like that at all. afaik, .raku is definitely intended to be an EVALable thing to produce an identical data structure, and existence is definitely part of the data structure. otherwise we're basically conceding "Raku doesn't support sparse arrays. Use a hash instead."
Just came across https://docs.rakulang.site/type/Array#method_List which implies that Nil would be the value to use. I'm not sure if I like that, as Nil can be used to mean other things, and someone might want to actually store Nil in an array and later get Nil back out. This does, however, illustrate a need for an actual value to represent the hole, as proposed by @lizmat in the discussion above.
There exists a special constant Empty
which is similar to what you want.
my constant Empty = Slip.new; # The real code is nqp code
Of course it does almost exactly the opposite, but it's existence may be a point to start thinking about.
class Omit is Nil {} # perhaps Any? Mu?
# constant Omit = Any.new; # Mu.new // nqp::create(Nil)
[ 0, 1, Omit, Omit, 5, 6 ]
(Omit is a synonym for skip. We already have .skip
and I didn't want to name clash.)
This would also allow a shorter way to repeat omitted values when there are many of them.
[ 0, 1, |(Omit xx 10), 12, 13 ]
Don't we already have a shorter version than Omit
? AKA Mu
?
You can put instances of Mu, and Mu itself into an array
[ 1, 2, Mu.new, Mu, 3, 4 ]
How can you tell the difference between a value that doesn't exist, and one that happens to be Mu?
There is a difference.
.say for [ 1, 2, Mu.new, Mu, Omit, 3, 4 ]
# 1
# 2
# Mu.new
# Mu
# 3
# 4
The way to deal with that is the same or similar to how we dealt with IterationEnd
, Empty
, or Nil
.
This isn't a full solution, but might point the way towards one.
I've noticed that iterating through a sparse Array and dd
ing each value returns more info than say
ing the .raku
output. For example:
my @a; @a[1..2] = (Any, 42);
for @a { say .raku } # OUTPUT: «Any \n Any \n 42»
for @a { dd $_ } # OUTPUT: «Any @a[0] = Any \n Any @a = Any \n Int @a = 42»
As that shows, this output lets us tell the difference between an empty slot and one with Any
because the empty slots print their index. I checked the dd
source code, and this difference is caused by a call to .VAR.name
, which is different for empty slots vs Any
. Maybe that provides the hook that any changes to .raku
could build on?
@codesections there is no problem recognizing uninitialized slots in a List
. The problem is how to represent that in Raku source code.
I wonder whether using Nil
and special case that in the various .STORE
logics, would be a solution without introducing an additional special value in the language.
@lizmat if I get things right, sparse arrays are backed by List
implementation?
So far, as I consider this thread, the idea of a special type to represent non-existent slots doesn't appeal to me. Any value might be considered as a candidate for a list element.
Though, to be frank, neither I can think of a good syntax construct to solve the problem. It must support two slightly different scenarios one may need:
- 'skip
$num
elements, then insert data' - 'insert starting at position
$idx
'
In a pseudo-code, the first one is kinda like (1, 2, * xx $num, 3, 4)
. The second is more like (1, 2, [$idx]=3, 4)
.
So far, as I consider this thread, the idea of a special type to represent non-existent slots doesn't appeal to me. Any value might be considered as a candidate for a list element.
Yes you can put IterationEnd
as a list element.
It breaks a bunch of things, but you can put it as a list element.
If the main concern is to enable round-tripping, then I'd propose the raku
method:
- Goes through the values, checks if they exist, and if not, remembers that index and inserts some value (can be
Mu
) - If there are no deleted values, just produce the result as today. If there are, then produce something like
(given [1,2,Mu,4, Mu, 6] { .[2,4]:delete; $_ })
It's not a literal syntax, but I'm not sure it's common enough to need one.
I wonder whether using Nil and special case that in the various .STORE logics,
Folks may use Nil
in this context today in (quite reasonable) expectation of getting the default value.
Folks may use Nil in this context today in (quite reasonable) expectation of getting the default value.
But they would! The only difference with the current setup, is that the value of :exists
changes for the elements initialized with Nil
.
The only difference with the current setup, is that the value of
:exists
changes for the elements initialized withNil
.
As far as I understand, not only existence status would change but extra memory allocations would take place too.
It's not a literal syntax, but I'm not sure it's common enough to need one.
It may prove to be useful in implementing some maths. Things like sparse matrices, for example.
If we're focused on roundtripping, there is also the issue that default values don't roundtrip at all:
my @a is default(42); @a[2] = 2; say @a.raku # [42, 42, 2]
my @b = @a.raku.EVAL; @b[5] = 5; say @b.raku; # [42, 42, Any, Any, 5]
say @a.default, @b.default; # 42(Any)
(It's possible that this belongs in a separate issue, but I thought I'd mention it here since it seems relevant to how we solve this)
@codesections I think the handling of is default
in the .raku
representation is a different matter. Related, yes, but different in my view.
I have a proposal which may elegantly solve this problem.
Raku should provide a more generalized literal syntax for arrays etc which uses literal index/value pairs, something like [0:29,1:56,2:-3,100:42]
(illustrative, not my actual proposal) which is analogous to the syntax for hash literals in terms of key/value pairs.
When that syntax is used, it indicates the indexes for the values rather than the indexes being implied by ordinal position of the values in the literal. Then the only elements that exist are the ones explicitly given and any index between the minimum and maximum given which isn't specified doesn't have an element.
For ease of use I would probably say that there should be a common syntax that is a hybrid, a list of elements where each one may optionally specify an index, which is used if present, and the existing behaviour of assigning a sequential index is given to those that don't.
What to do in situations that have both explicit and implicit indexes that seem to overlap is a design decision to make, either implicit ones are all dense from index 0 and an explicit in that range is an error, or the implicit ones skip over explicit ones.
I have actually thought of this before as I have a similar feature in my Muldis Object Notation (MUON) literal syntax.
Following my prior comment for clarity:
-
What Muldis Object Notation has is not sparse arrays but rather dense arrays whose literals have an optional run-length encoding shorthand. So for example
{29,46,-3,0:97,42}
is MUON for a 101 element dense array where elements at indexes 3..99 are each the value zero, so the closest analogy to the suggested Raku example I gave above. And a savvy implementation can store it as such pairs too, so 101 elements takes up the space of 4 actually. Kind of like an ordered Bag can be, then. -
Muldis Object Notation also has tuples which are a hybrid of positional and named elements, for example
("hello",42,this:"that")
so that's 2 positional elements and 1 named element; this is in a sense a direct analogy to, and inspired by, a RakuCapture
; but MUON's tuple is actually entirely named elements and when one uses positional syntax this is shorthand for single character string names where the strings are the Unicode codepoint 0, code point 1, and so on, and in that case the resolution for when things are mixed is directly analogous to what I was proposing for Raku sparse arrays. Eg the above example is the same as(0c0:"hello",0c1:42,"this":"that")
.
An alternative proposal which on one hand seems possibly less elegant but may be a lot easier to implement, is that when one wants to represent a sparse array, they actually use a hash literal combined with a unary function call that takes the hash literal and results in the array such that the hash keys are taken as indexes and the hash values as the element values. Have .perl emit that combination when it has a sparse array. This doesn't require any syntax or parsing changes, only that this mapping function exists in the standard library and .perl uses it. Under this proposal it means the current array syntax is not closed, it can't represent every array, just common cases, which is why I consider it less elegant.
Generally speaking I strongly dislike any solution that uses some kind of special value to stand in for a missing element, as the rest of this thread has discussed. I feel the best way to represent sparse collections is that non-existing elements are ENTIRELY MISSING in literals for them. Same as how hash literal work. Same as how "exists" and "defined" on Perl at least differ in semantics where fetching an element and comparing it to undef loses that information, same as the proposals by others. There is no return value that works properly here. Only an existence test on the collection, or entirely missing elements in the literal, suffices here. Special values is the same as saying we don't REALLY have sparse arrays, just default values for unassigned elements.
Just speculating on general feasibility of the feature.
I traced the logic of creating an array from the grammar down to the Array
class. The good news: perhaps it wouldn't be that hard to have the thing implemented from the point of view of List
/Array
code. Basically, it all winds down to the compiler installing a call to &infix:<,>
and supplying the operator with a statement list as arguments. In its most generic form the operator doesn't care about the element values and binds them into List
's $!reified
as is. It means that in theory one or more of the values can be nqp::null
which is considered a hole by both List
and Array
.
What is not clear to me currently wether nqp::null
can bypass HLL-ization when passed into &infix:<,>
as an argument. Feels to me like it's rather unlikely to happen without special care taken about it.
Also I'm not really sure about List
behavior when it encounters a nqp::null
element in $!reified
. So far, setting one manually haven't resulted in a catastrophe as it gets HLL-ized into a Nil
, but a couple of lines of test code can't serve as a reliable proof.
Anyway, nqp::null
isn't a wanted guest in Raku land.
Another possible path which would get us around any trickery required to pass around nqp::null
in Raku code is to introduce a class akin to Slip
which would receive special treatment, as Slip
does. This approach would still make it possible to bind the class into a list or array elements, or into any other storage if necessary. But otherwise a hypothetical Hole
class would result in what its name suggest – a hole in list:
my @a = 1, 2, Hole.new(2), 5;
say @a[2]:exists; # False
say @a[3]:exists; # False
Perhaps we'd need two kind of holes: fixed length, and positional. The former one is shown above. The latter one would be created as, say, Hole.new(:until(10))
and would mean: create a hole from the current position up to 10th element.
Anyway, if it ever gets to the implementation stage, the syntax behind this functionality remains very much unclear.
As I was requested for a problem-solving ticket on IRC today, I decided to summarize what I have thought out about the Hole
class approach.
Hole
is an entity which, when used in a list context, produces a hole or holes in a list. Technically it means nqp::null
bound into the corresponding position or positions, depending on how Hole
is used. Also it means that in addition to sparse arrays we would get sparse lists too. Technically it is possible to create one now though this would require some mangling with List
implementation details.
Hole
can be used both instantiated or not. When uninstantiated it means just a single element hole. Instantiated it may have either count
or until
attribute set. count
means the number of sequential holes to be inserted into a list; until
means "make as many holes as needed for the next value to appear at until
position":
1, 2, Hole.new(count => 2), 3; # (1, 2, Hole, Hole, 3)
1, 2, Hole.new(until => 5), 3; # 1, 2, Hole, Hole, Hole, 3
Apparently, an array created from a sparse List
will have holes at the same positions as in the list, as shown in the previous comment.
Unless I miss something, the implementation would require modification of infix:<,>
and those parts of Array
responsible for initializing/assigning from lists. The latter concerns me the most because so far as I look into Array
implementation I see iterators are used a lot making it hard to preserve the information about hole positions.
What are the advantages of using Hole
:
- assignment to an array element semantics doesn't change
- memory allocations could be reduced to allocating a single primitive object.
Hole
is not even aList
; most likely it would be aMu
. I.e.Hole.new(count => 1000000)
wouldn't mean allocation of a million-elements longnqp::list
for$!reified
- holes would be easier to spot in explicit array assignments, especially if some good syntax will be invented for them
As to the syntax representation, I still can't find a good solution. So far, I played with something like:
multi prefix:<:*>(Int:D $n) {
Hole.new: count => $n
}
multi prefix:<:^>(Int:D $n) {
Hole.new: until => $n
}
It means that the following example could be used:
my @a = 1, 2, :*2, 3; # two holes
my @a = 1, 2, :^5, 3; # three holes and 3 in 5th position
But it isn't convincing even for myself.
Also, we could give special treatment of Hole
to xx
making Hole xx 1000000
be equivalent to Hole.new(count => 1000000)
.
FWIW, I have just managed to implement all of this using Nil
, except for the case of assigning Nil
to a Int:D
: that will not roundtrip because of the typecheck.
It requires a few tweaks in roast, mostly because they assumed that my @a = Nil
would result in [Any]
rather than [Nil]
.
I'm a bit too tired now to make a good commit message (which this needs), so I will wait until tomorrow before committing.
Even if we don't merge this, most of the code could be used for a Hole
solution, changing the necessary checks for Nil
to Hole
. :-)
See also https://github.com/rakudo/rakudo/issues?q=is%3Aissue+is%3Aopen+hole+in%3Atitle