problem-solving icon indicating copy to clipboard operation
problem-solving copied to clipboard

No literal syntax for sparse arrays

Open raydiak opened this issue 3 years ago • 24 comments

There is no syntax for representing sparse arrays without constructing them procedurally. Because of this, .raku completely discards :exists/:!exists information, generating a contiguous array full of default values which do exist at the holes, instead of a sparse array. Obviously this means round-trippability of .raku/EVAL is currently broken for sparse arrays.

raydiak avatar Apr 24 '21 19:04 raydiak

#raku log from Apr 24, 2021:

16:46:19 raydiak | m: use MONKEY-SEE-NO-EVAL; my @a; @a[1] = 1; say @a.raku; say @a[0]:exists; my @b := EVAL @a.raku; say @b.raku; say @b[0]:exists; 16:46:19 camelia | rakudo-moar ea102883d: OUTPUT: «[Any, 1]␤False␤[Any, 1]␤True␤» 16:54:24 raydiak | not only does .raku throw away sparse array information, I don't think we even have a syntax which could properly represent it as an expression without a do block. list assignment to a slice could work to construct it, but as an expression it would return the list of values on the right of the assignment instead of the sparse array, so you'd still have to wrap it in a do block so you can name the array and 16:54:24 raydiak | return it without exposing the temporary name to the surrounding context 18:40:48 lizmat | raydiak: [,1] could be such a thing I guess 18:41:35 lizmat | m: dd "use nqp; [nqp::null,44]".EVAL.raku # alas, not the same 18:41:35 camelia | rakudo-moar ea102883d: OUTPUT: «"[Mu, 44]"␤» 18:48:16 lizmat | raydiak: care to create a problem-solving issue for that ? 18:55:45 raydiak | lizmat: sure thing 18:56:01 lizmat | I guess we need some special value of Mu to indicate a hole 18:56:15 lizmat | similar to IterationEnd 19:05:40 lizmat | Array.List won't make the entries immutable 19:05:58 raydiak | lizmat: that was the first thing I thought, but now I'm not sure an actual value is necessary. we already have :exists for logical tests, and your ",1" proposal gives us a syntax to indicate where something shouldn't :exist 19:06:02 lizmat | Tirifto[m] but why do you want to do that? To make it a value type ? 19:06:39 lizmat | raydiak: I'm pretty sure there have been huge discussions about ,, in lists 19:06:56 lizmat | and I think the consensus was that they should be a syntax error 19:07:10 lizmat | as there is ambiguity 19:07:40 raydiak | do you recall what sort of ambiguity? 19:07:50 lizmat | and lack of clarity, e.g. when the first comma is on the end of a line, and the next line starts with whitespace and a comma 19:08:01 lizmat | that's an easy mistake to make 19:10:11 raydiak | I suppose I could see that, fair enough. I still like ,, more than another value/type/thing, but I'm certainly no language designer 19:12:35 lizmat | it's one of the lessons learned from Perl, like not wanting sigil variance 19:12:54 lizmat | in Perl ,, is just the same as , if I recall correctly 19:14:30 lizmat | otoh, the result of such a discussion could be that existedness of an Array element is just not roundtrippable 19:15:14 lizmat | as supporting a special value would definitely affect parsing performance 19:15:32 lizmat | afk for half an hour or so 19:17:31 raydiak | I definitely agree that ,, shouldn't just collapse to , 19:19:24 raydiak | but I wonder if sparse arrays were considered in the previous discussions. I think using ,, to skip an element is totally reasonable (and probably does exactly that in some other languages, though I can't cite them off the top of my head) 19:23:09 raydiak | as far as just saying "sparse arrays can't round-trip"...I don't like that at all. afaik, .raku is definitely intended to be an EVALable thing to produce an identical data structure, and existence is definitely part of the data structure. otherwise we're basically conceding "Raku doesn't support sparse arrays. Use a hash instead."

raydiak avatar Apr 24 '21 19:04 raydiak

Just came across https://docs.rakulang.site/type/Array#method_List which implies that Nil would be the value to use. I'm not sure if I like that, as Nil can be used to mean other things, and someone might want to actually store Nil in an array and later get Nil back out. This does, however, illustrate a need for an actual value to represent the hole, as proposed by @lizmat in the discussion above.

raydiak avatar Apr 24 '21 19:04 raydiak

There exists a special constant Empty which is similar to what you want.

my constant Empty = Slip.new; # The real code is nqp code

Of course it does almost exactly the opposite, but it's existence may be a point to start thinking about.

class Omit is Nil {} # perhaps Any? Mu?
# constant Omit = Any.new; # Mu.new // nqp::create(Nil)

[ 0, 1, Omit, Omit, 5, 6 ]

(Omit is a synonym for skip. We already have .skip and I didn't want to name clash.)

This would also allow a shorter way to repeat omitted values when there are many of them.

[ 0, 1, |(Omit xx 10), 12, 13 ]

b2gills avatar Jun 08 '21 03:06 b2gills

Don't we already have a shorter version than Omit? AKA Mu ?

lizmat avatar Jun 08 '21 08:06 lizmat

You can put instances of Mu, and Mu itself into an array

[ 1, 2, Mu.new, Mu, 3, 4 ]

How can you tell the difference between a value that doesn't exist, and one that happens to be Mu?
There is a difference.

.say for [ 1, 2, Mu.new, Mu, Omit, 3, 4 ]
# 1
# 2
# Mu.new
# Mu
# 3
# 4

The way to deal with that is the same or similar to how we dealt with IterationEnd, Empty, or Nil.

b2gills avatar Jun 08 '21 13:06 b2gills

This isn't a full solution, but might point the way towards one.

I've noticed that iterating through a sparse Array and dding each value returns more info than saying the .raku output. For example:

my @a; @a[1..2] = (Any, 42);
for @a { say .raku } # OUTPUT: «Any \n Any \n 42»
for @a { dd $_ }    # OUTPUT: «Any @a[0] = Any \n Any @a = Any \n Int @a = 42»

As that shows, this output lets us tell the difference between an empty slot and one with Any because the empty slots print their index. I checked the dd source code, and this difference is caused by a call to .VAR.name, which is different for empty slots vs Any. Maybe that provides the hook that any changes to .raku could build on?

codesections avatar Jun 08 '21 14:06 codesections

@codesections there is no problem recognizing uninitialized slots in a List. The problem is how to represent that in Raku source code.

lizmat avatar Jun 08 '21 14:06 lizmat

I wonder whether using Nil and special case that in the various .STORE logics, would be a solution without introducing an additional special value in the language.

lizmat avatar Jun 08 '21 14:06 lizmat

@lizmat if I get things right, sparse arrays are backed by List implementation?

So far, as I consider this thread, the idea of a special type to represent non-existent slots doesn't appeal to me. Any value might be considered as a candidate for a list element.

Though, to be frank, neither I can think of a good syntax construct to solve the problem. It must support two slightly different scenarios one may need:

  • 'skip $num elements, then insert data'
  • 'insert starting at position $idx'

In a pseudo-code, the first one is kinda like (1, 2, * xx $num, 3, 4). The second is more like (1, 2, [$idx]=3, 4).

vrurg avatar Jun 08 '21 15:06 vrurg

So far, as I consider this thread, the idea of a special type to represent non-existent slots doesn't appeal to me. Any value might be considered as a candidate for a list element.

Yes you can put IterationEnd as a list element.

It breaks a bunch of things, but you can put it as a list element.

b2gills avatar Jun 08 '21 16:06 b2gills

If the main concern is to enable round-tripping, then I'd propose the raku method:

  1. Goes through the values, checks if they exist, and if not, remembers that index and inserts some value (can be Mu)
  2. If there are no deleted values, just produce the result as today. If there are, then produce something like (given [1,2,Mu,4, Mu, 6] { .[2,4]:delete; $_ })

It's not a literal syntax, but I'm not sure it's common enough to need one.

jnthn avatar Jun 08 '21 16:06 jnthn

I wonder whether using Nil and special case that in the various .STORE logics,

Folks may use Nil in this context today in (quite reasonable) expectation of getting the default value.

jnthn avatar Jun 08 '21 16:06 jnthn

Folks may use Nil in this context today in (quite reasonable) expectation of getting the default value.

But they would! The only difference with the current setup, is that the value of :exists changes for the elements initialized with Nil.

lizmat avatar Jun 08 '21 19:06 lizmat

The only difference with the current setup, is that the value of :exists changes for the elements initialized with Nil.

As far as I understand, not only existence status would change but extra memory allocations would take place too.

It's not a literal syntax, but I'm not sure it's common enough to need one.

It may prove to be useful in implementing some maths. Things like sparse matrices, for example.

vrurg avatar Jun 08 '21 19:06 vrurg

If we're focused on roundtripping, there is also the issue that default values don't roundtrip at all:

my @a is default(42); @a[2] = 2; say @a.raku # [42, 42, 2]
my @b = @a.raku.EVAL; @b[5] = 5; say @b.raku; # [42, 42, Any, Any, 5]
say @a.default, @b.default; # 42(Any)

(It's possible that this belongs in a separate issue, but I thought I'd mention it here since it seems relevant to how we solve this)

codesections avatar Jun 08 '21 20:06 codesections

@codesections I think the handling of is default in the .raku representation is a different matter. Related, yes, but different in my view.

lizmat avatar Jun 08 '21 20:06 lizmat

I have a proposal which may elegantly solve this problem.

Raku should provide a more generalized literal syntax for arrays etc which uses literal index/value pairs, something like [0:29,1:56,2:-3,100:42] (illustrative, not my actual proposal) which is analogous to the syntax for hash literals in terms of key/value pairs.

When that syntax is used, it indicates the indexes for the values rather than the indexes being implied by ordinal position of the values in the literal. Then the only elements that exist are the ones explicitly given and any index between the minimum and maximum given which isn't specified doesn't have an element.

For ease of use I would probably say that there should be a common syntax that is a hybrid, a list of elements where each one may optionally specify an index, which is used if present, and the existing behaviour of assigning a sequential index is given to those that don't.

What to do in situations that have both explicit and implicit indexes that seem to overlap is a design decision to make, either implicit ones are all dense from index 0 and an explicit in that range is an error, or the implicit ones skip over explicit ones.

I have actually thought of this before as I have a similar feature in my Muldis Object Notation (MUON) literal syntax.

duncand avatar Jun 09 '21 08:06 duncand

Following my prior comment for clarity:

  1. What Muldis Object Notation has is not sparse arrays but rather dense arrays whose literals have an optional run-length encoding shorthand. So for example {29,46,-3,0:97,42} is MUON for a 101 element dense array where elements at indexes 3..99 are each the value zero, so the closest analogy to the suggested Raku example I gave above. And a savvy implementation can store it as such pairs too, so 101 elements takes up the space of 4 actually. Kind of like an ordered Bag can be, then.

  2. Muldis Object Notation also has tuples which are a hybrid of positional and named elements, for example ("hello",42,this:"that") so that's 2 positional elements and 1 named element; this is in a sense a direct analogy to, and inspired by, a Raku Capture; but MUON's tuple is actually entirely named elements and when one uses positional syntax this is shorthand for single character string names where the strings are the Unicode codepoint 0, code point 1, and so on, and in that case the resolution for when things are mixed is directly analogous to what I was proposing for Raku sparse arrays. Eg the above example is the same as (0c0:"hello",0c1:42,"this":"that").

duncand avatar Jun 09 '21 08:06 duncand

An alternative proposal which on one hand seems possibly less elegant but may be a lot easier to implement, is that when one wants to represent a sparse array, they actually use a hash literal combined with a unary function call that takes the hash literal and results in the array such that the hash keys are taken as indexes and the hash values as the element values. Have .perl emit that combination when it has a sparse array. This doesn't require any syntax or parsing changes, only that this mapping function exists in the standard library and .perl uses it. Under this proposal it means the current array syntax is not closed, it can't represent every array, just common cases, which is why I consider it less elegant.

duncand avatar Jun 09 '21 09:06 duncand

Generally speaking I strongly dislike any solution that uses some kind of special value to stand in for a missing element, as the rest of this thread has discussed. I feel the best way to represent sparse collections is that non-existing elements are ENTIRELY MISSING in literals for them. Same as how hash literal work. Same as how "exists" and "defined" on Perl at least differ in semantics where fetching an element and comparing it to undef loses that information, same as the proposals by others. There is no return value that works properly here. Only an existence test on the collection, or entirely missing elements in the literal, suffices here. Special values is the same as saying we don't REALLY have sparse arrays, just default values for unassigned elements.

duncand avatar Jun 09 '21 09:06 duncand

Just speculating on general feasibility of the feature.

I traced the logic of creating an array from the grammar down to the Array class. The good news: perhaps it wouldn't be that hard to have the thing implemented from the point of view of List/Array code. Basically, it all winds down to the compiler installing a call to &infix:<,> and supplying the operator with a statement list as arguments. In its most generic form the operator doesn't care about the element values and binds them into List's $!reified as is. It means that in theory one or more of the values can be nqp::null which is considered a hole by both List and Array.

What is not clear to me currently wether nqp::null can bypass HLL-ization when passed into &infix:<,> as an argument. Feels to me like it's rather unlikely to happen without special care taken about it.

Also I'm not really sure about List behavior when it encounters a nqp::null element in $!reified. So far, setting one manually haven't resulted in a catastrophe as it gets HLL-ized into a Nil, but a couple of lines of test code can't serve as a reliable proof.

Anyway, nqp::null isn't a wanted guest in Raku land.

Another possible path which would get us around any trickery required to pass around nqp::null in Raku code is to introduce a class akin to Slip which would receive special treatment, as Slip does. This approach would still make it possible to bind the class into a list or array elements, or into any other storage if necessary. But otherwise a hypothetical Hole class would result in what its name suggest – a hole in list:

my @a = 1, 2, Hole.new(2), 5;
say @a[2]:exists; # False
say @a[3]:exists; # False

Perhaps we'd need two kind of holes: fixed length, and positional. The former one is shown above. The latter one would be created as, say, Hole.new(:until(10)) and would mean: create a hole from the current position up to 10th element.

Anyway, if it ever gets to the implementation stage, the syntax behind this functionality remains very much unclear.

vrurg avatar Jun 10 '21 02:06 vrurg

As I was requested for a problem-solving ticket on IRC today, I decided to summarize what I have thought out about the Hole class approach.

Hole is an entity which, when used in a list context, produces a hole or holes in a list. Technically it means nqp::null bound into the corresponding position or positions, depending on how Hole is used. Also it means that in addition to sparse arrays we would get sparse lists too. Technically it is possible to create one now though this would require some mangling with List implementation details.

Hole can be used both instantiated or not. When uninstantiated it means just a single element hole. Instantiated it may have either count or until attribute set. count means the number of sequential holes to be inserted into a list; until means "make as many holes as needed for the next value to appear at until position":

1, 2, Hole.new(count => 2), 3; # (1, 2, Hole, Hole, 3)
1, 2, Hole.new(until => 5), 3; # 1, 2, Hole, Hole, Hole, 3

Apparently, an array created from a sparse List will have holes at the same positions as in the list, as shown in the previous comment.

Unless I miss something, the implementation would require modification of infix:<,> and those parts of Array responsible for initializing/assigning from lists. The latter concerns me the most because so far as I look into Array implementation I see iterators are used a lot making it hard to preserve the information about hole positions.

What are the advantages of using Hole:

  • assignment to an array element semantics doesn't change
  • memory allocations could be reduced to allocating a single primitive object. Hole is not even a List; most likely it would be a Mu. I.e. Hole.new(count => 1000000) wouldn't mean allocation of a million-elements long nqp::list for $!reified
  • holes would be easier to spot in explicit array assignments, especially if some good syntax will be invented for them

As to the syntax representation, I still can't find a good solution. So far, I played with something like:

multi prefix:<:*>(Int:D $n) {
    Hole.new: count => $n
}
multi prefix:<:^>(Int:D $n) {
    Hole.new: until => $n
}

It means that the following example could be used:

my @a = 1, 2, :*2, 3; # two holes
my @a = 1, 2, :^5, 3; # three holes and 3 in 5th position

But it isn't convincing even for myself.

Also, we could give special treatment of Hole to xx making Hole xx 1000000 be equivalent to Hole.new(count => 1000000).

vrurg avatar Jun 14 '21 21:06 vrurg

FWIW, I have just managed to implement all of this using Nil, except for the case of assigning Nil to a Int:D: that will not roundtrip because of the typecheck.

It requires a few tweaks in roast, mostly because they assumed that my @a = Nil would result in [Any] rather than [Nil].

I'm a bit too tired now to make a good commit message (which this needs), so I will wait until tomorrow before committing.

Even if we don't merge this, most of the code could be used for a Hole solution, changing the necessary checks for Nil to Hole. :-)

lizmat avatar Jun 14 '21 21:06 lizmat

See also https://github.com/rakudo/rakudo/issues?q=is%3Aissue+is%3Aopen+hole+in%3Atitle

raiph avatar Aug 22 '21 22:08 raiph