Behavior of type-constrained parameters is surprising given other behavior
Minimal Example
sub foo(Int @a) { dd @a; }
foo [1, 2, 3];
# Type check failed in binding to parameter '@a'; expected Positional[Int] but got Array ([1, 2, 3])
Why This is Surprising
Other behavior in Raku would lead one to believe that type constraints apply not to the container (variable) as a whole, but rather to the element(s) that can be put into or pulled out of the container, the exact meaning of which depends on the type of said container.
Specifically, we do not constrain an array to only hold Ints with Array[Int] @a, rather we do Int @a. We do not constrain a hash to hold only Ints with Hash[Int] %h, rather we do Int %h. We do not constrain a callable to return only Ints with Callable[Int] &c, rather we do Int &c. All this implies to the newly-learning Raku developer that constraints apply "intelligently" based on container shape, rather than "dumbly" to the container as a whole. In other words, the user doesn't have to worry about typing whole containers as long as the contained elements satisfy the constraint.
This is further reinforced by the fact that literals with no explicit type specified can be assigned into type-constrained containers, e.g. my Int @ints = [1, 2, 3]. Nowhere does the programmer explicitly say that [1, 2, 3] is Array[Int], yet the language accepts it into an Int-constrained @-sigiled container. Why, then, should the Minimal Example fail?
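A minimal runnable sketch of that contrast (assignment checks the elements; binding checks the container as a whole):
my Int @ints = [1, 2, 3];          # assignment: each element is type-checked; works
say @ints;                         # [1 2 3]
try { my Int @nope := [1, 2, 3] }; # binding the same literal to the same variable type dies
say $!.message if $!;              # expected Positional[Int] but got Array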
Should This Behavior Change?
More experienced developers understand that the underlying difference is that between assignment (=) and binding (:=), and what happens when calling a code block is binding to parameters, not assigning. While that suffices as a technical explanation, it does not feel sufficient as a philosophical justification for the surprising behavior. I (and I believe others too) would like to be able to apply the "intelligent constraint" principle consistently, or at least in the absence of something "unusual" like my @a is Array[Foo].
Furthermore, it seems to violate the concept of optional, gradual typing. Going back and adding type constraints to your quickly-prototyped code should not break it if that constraint was always satisfied anyway. It feels wrong that the Minimal Example could be made to work, with equivalent function, by removing the type constraint on foo's parameter.
Practical Considerations
Making parameter binding behave as naïvely expected is easy enough for cases where the container's values are all known at compile time. However, in the general case, we would likely have to scan through arrays and hashes, and I'm not even sure what sort of black magic would be necessary for callables (perhaps those are an acceptable exception where the user must explicitly call out the return type).
here's my proposal:
let's extend the way that the array literal works
[1,2,3].Int gives Array[Int]
sub foo(Int @a) { dd @a; }
foo [1, 2, 3].Int; # works
foo [1, 2, 3]; # fails
also we would need
my @a = [1,2,3].Int;
@a ~~ Array[Int]; #True
... then we will need to update the docs and error messages so that this is obvious to a newbie
Edit:
and the hash literals
{:a(1),:b(2),:c(3)}.Int gives Hash[Int]
%(:a(1),:b(2),:c(3)).Int gives Hash[Int]
@librasteve couldn't that break code relying on the convention that the numification of an array is its number of elements? Granted, numification and intification are two different things, but I think it would be surprising for +[4, 5, 6] to yield 3 while [4, 5, 6].Int yields Array[Int].new(4, 5, 6)
yes - I don't see anything here that could be done as a non-breaking change
the benefit of my proposal is that this is only for the Array and Hash literals - while there possibly is code out there with [1,2,3].Int and so on, I don't see that idiom as being widely used
would definitely go for a non-breaking change if one can be found
Sigils are shortcuts for type constraints (and default values). @ means Positional (default value Array). Int @ means Positional[Int] (default value Array[Int]). They are roughly equivalent. Thus sub foo(Int @a) means pretty much the same as sub foo(Positional[Int] $a). Thus while at first glance you may think that Int @a is a type constraint on an array's values, it really is a type constraint on the array variable itself. It means that @a holds an object that does the Positional role, parameterized by Int or in other words: an array-like thing that by definition can only hold Ints. We thus can rely on that object to have done any appropriate type checks on its members when they get stored into it. I.e. I don't have to check whether a Positional[Int] really only contains Ints.
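A minimal sketch of the equivalence described here (sub names are illustrative):
sub foo(Int @a)             { @a.sum }
sub bar(Positional[Int] $a) { $a.sum }
my Int @ints = 1, 2, 3;
say foo(@ints);                 # 6
say bar(@ints);                 # 6, same constraint in its $-sigiled spelling
say @ints ~~ Positional[Int];   # True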
An important difference between assignment and binding is that when assigning containers, we always use copy-semantics. Remember, we put the responsibility of the type checks on the container when elements are stored and my Int @a = [...] is really a call to @a.STORE([...]). Thus @a has to type-check all supplied values. The flip side is that for copying we have to touch all those elements anyway and thus the type checking is almost free.
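To make that concrete, a small sketch using the Array.STORE method that list assignment is described above as calling:
my Int @a;
@a.STORE([1, 2, 3]);        # what `@a = 1, 2, 3` does under the hood
say @a;                     # [1 2 3]
try @a.STORE([1, 'x']);     # the per-element check fires during the copy
say $!.message if $!;       # expected Int but got Str ("x")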
Binding on the other hand is performance sensitive. You probably and I for sure wouldn't want Raku to iterate over all arrays passed to any function. That would be hugely expensive. Thus we have to rely on the container type to avoid that cost.
Lastly if we want to extend the simplification we have for variable initializers to routine calls, that would have severe restrictions. We would only be able to do this for subs, not method calls as the latter are late-bound, i.e. we cannot know at compile time, which code object will be called for that method. Thus we cannot know its signature and will not know whether we need to apply the special case or not. This will fix one surprise by introducing another. Suddenly something that works just fine for subs will fail when you turn that sub into a method.
As to the suggestion of adding special method-like syntax for postfix type-specifications: the example with Int doesn't look too bad and you already acknowledged that this would be a breaking change as .Int right now would give you the number of elements. However you would have to do this not just for Int but for any type, at least those types that can be used for literals. [1, 2, 3].Bool would suddenly no longer return True. Even worse, [1, 2, 3].Str would no longer be 1 2 3. Except of course when you first put it into a variable and then call .Str which is even more surprising.
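For reference, the current behavior that such postfix coercers would collide with:
say [1, 2, 3].Int;    # 3, the number of elements
say [1, 2, 3].Bool;   # True, non-empty
say [1, 2, 3].Str;    # 1 2 3
say +[4, 5, 6];       # 3, numification as noted earlier in the thread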
Compared to all that, is Array[Int](1, 2, 3) really all that bad?
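That is, both of these already pass the type check today, with no language change (a quick sketch):
sub foo(Int @a) { dd @a }
foo Array[Int](1, 2, 3);       # coercion-call form
foo Array[Int].new(1, 2, 3);   # ordinary constructor form, same result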
I realize that from a technical standpoint, everything you said in the first paragraph is true, but something about it just doesn't sit right with me conceptually. Maybe it's because I'm still clutching my Perls, so to speak, but it seems like sigils ought to be more than pure syntax sugar. Regardless, being unable to pass what is clearly an array/hash/etc. of Foos into a routine that says it takes an array/hash/etc. of Foos is unexpected, frustrating, and discourages use of the optional typing feature.
Would type-checking each element be the end of the world performance-wise? I genuinely don't know, but I'm sure there are optimizations that could help skip such a check in many cases (e.g. containers can keep up with what's been put in them, to whatever degree of accuracy yields the best benefit-to-annoyance ratio). If performance becomes an issue, the user can specify types on the calling side to avoid that iterative check. This is what we do with decimal numbers - the additional performance of floating points is opt-in because rationals DWIM better.
I'm not sure I follow on the issue of special-casing for subs vs. methods. Why does this need to be a special case? Wouldn't making it so that e.g. [1, 2, 3] ~~ Array[Int] is True be sufficient? Already we can't even know which sub candidate will be called at compile time because of where clauses and subsets.
@niner - thanks for your feedback, I can see that my proposal will not fly
TIL that Array[Int](1,2,3) is a thing
I am now drawn to [1,2,3].typed as a synonym for Array[*.are](1,2,3) [pseudocode]
There is a problem with binding and assignment with @ and %-sigiled containers. Please consider the following code:
sub foo(Int() @ints) { dd @ints };
foo [1,2,3];
This should DWIM but is NYI.
In Raku we use type constraints to limit the consideration of the implementor of a Routine. This is not the default because it limits composability. And we don't like that. I too made the observation that Raku-beginners don't understand the syntax of type constraints on collection containers. Most of them appear to reuse lessons they learned with statically typed languages. Raku does not want to be one of those languages.
Indeed, type constraints on collections were meant to allow optimisations. Rakudo does not take advantage of that right now. But be careful what you wish for. Modern C++ provides excellent support for typed containers. If you consider having to spend 3 lines of code to specify a single argument's type as excellent.
@gfldex could you explain what you mean by limiting composability?
I'm also not sure how replacing Int with Int() changes anything related to this problem. If the compiler rejects passing an untyped array/list consisting of Ints into a Positional[Int] parameter, why would it allow passing it into a Positional[Int()] parameter? I would expect putting a coercion type on an @-sigiled parameter to allow individual elements to coerce, which is not the same thing as truly restricting the type of each element, nor the same thing as having the whole container coerce (i.e. Positional[Int]()). In other words, I would expect the following...
multi foo(Int @bar) { say 'Candidate I'; }
multi foo(Int() @bar) { say 'Candidate II'; }
multi foo(@bar) { say 'Candidate III'; }
class NoIntCoerce { }
foo (1, 2, 3); # Candidate I
foo ('1', '2', '3'); # Candidate II (Strs coerce to Ints)
foo (NoIntCoerce.new xx 3); # Candidate III
Regardless, being unable to pass what is clearly an array/hash/etc. of Foos into a routine that says it takes an array/hash/etc. of Foos is unexpected, frustrating, and discourages use of the optional typing feature.
I am actually pretty n00b at using Raku's type constraints, but I've spent most of my life in statically typed languages like Go and Rust. I have experienced this "discouragement" first hand, maybe even with an example exactly like this.
I want to be able to start with something untyped and change the function signature later on to add constraints. Here is a contrived TypeScript example.
// Untyped
function sumTheNums(nums) {
  let sum = 0;
  for (let i = 0; i < nums.length; i++) {
    sum += nums[i];
  }
  return sum;
}
// Okay it's an array of numbers
function sumTheNums2(nums: Array<number>) {
  return sumTheNums(nums);
}
// Well, sometimes i want to concatenate strings for some reason, too
function sumTheNums3(nums: Array<number | string>) {
  return sumTheNums(nums);
}
let result = sumTheNums( [1, 2, 3, 4]);
result = sumTheNums2([1, 2, 3, 4]);
result = sumTheNums3(['1', '2', '3', '4']);
console.log(result);
I don't want to change the call sites. There could be a lot of those.
Anyways, to me it feels like the original example should work
expected Positional[Int] but got Array ([1, 2, 3]).
If the docs are up to date, then Array is List and List does Positional. I don't know if this is how roles are supposed to work, but to me - coming from these other languages - the Array is a "narrower" type and we should be able to pass it in places like this.
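A quick check shows where the narrowing intuition holds and where it breaks down; the class chain is fine, it's the missing parameterization that fails:
say Array.^mro;                      # ((Array) (List) (Cool) (Any) (Mu))
say Array ~~ Positional;             # True
say Array[Int] ~~ Positional[Int];   # True
say Array ~~ Positional[Int];        # False: a plain Array makes no promise about its elements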
While ideally the
sub foo(Int() @bar) { ... }
will get implemented in a way to allow it to be called with foo [1,2,3], an alternative in the meantime could potentially be an (abuse) of traits. No doubt it hasn't been implemented yet because exactly how it will function is still a bit tbd. I know at one point in time I had done something akin to
multi sub foo( Int @bar ) { ... }
multi sub foo( @bar where *.all ~~ Int ) {
    samewith Array[Int].new: @bar
}
The advantage of the multi approach is you get the speedy if you've properly typed the container, and fall back to molasses if not but without breakage (and the thing would be tightened from the rest of the call chain). I'd imagine via traits somehow we could manage to capture such instances of containers with typed elements and rinse and repeat in some way. Maybe I'll play around with that.
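A runnable version of that sketch (block-form where clause; everything here is illustrative, not a committed design):
multi sub foo(Int @bar) { say "fast path: @bar[]" }
multi sub foo(@bar where { .all ~~ Int }) {
    samewith Array[Int].new: @bar   # copy into a typed array, then re-dispatch
}
foo( my Int @ = [1, 2, 3] );   # fast path: 1 2 3
foo [4, 5, 6];                 # molasses: copies, then fast path: 4 5 6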
@alabamenhu , I think what you are saying is we check items by default and require a trait to vouch that the checking has been delegated, like this:
multi sub foo( Int @bar is vouched ) { #<== same as :( Array[Int] $bar )
    ...
}
multi sub foo( Int @bar ) { #<== used to be :( @bar where *.all ~~ Int )
    samewith Array[Int].new: @bar
}
multi sub foo( Int() @bar ) { #<== same as :( @bar where ! *.all ~~ Int )
    samewith @bar
}
- where a new trait e.g. is vouched can be applied if you want the fast version - your coercion example would match the third case in the multi
- this is not a breaking change, but may negatively impact performance for larger Arrays
if so, I support it
An array cannot be constrained to Int in Raku.
AUTHOR Mea Culpa: @raiph refutes the above assertion and he appears to be correct. Early on I tried 'Typing' arrays constrained to Int and failed, so I assumed it couldn't be done (maybe I tripped across the same issue as @landyacht ?). Cheers.
TL;DR Summary of vector/Types in the R-programming language, below:
In other languages, strict Type-specificity applies. For example, the R-programming language has no scalars but only vectors (what other languages call a scalar is simply a vector of length == 1 in R).
However vectors in R MUST consist of a single Type, i.e. all character (i.e. string) elements, or all Integer elements, or all Numeric (i.e. Real) elements, or all Factors (memory-saving mechanism in R for dealing with Grouped variables). But never a mixture of Types (sometimes referred to as mode() or storage.mode() in R).
This is consistent with R's use as a statistical language: you'll have columns of experimental parameters (e.g. name, date, age) and columns of data (i.e. numeric). A user may manually convert a vector with something like as.numeric and/or as.character. To facilitate this mechanism, there are canonical coercion mechanisms which are invoked automatically, for example when a string is combined/added-to a vector of Ints:
https://www.r-bloggers.com/2013/09/type-conversion-and-you-or-and-r/
[ If you need a mixture of Types in your data-object, you advance to the next-most-complex data-object, which is a List in the R-programming language. In fact, two-dimensional dataframes in R are nothing more than Lists constrained to consist of equal-length vector elements ( i.e. rows with equal number of columns) ].
Raku decided that Arrays can contain multiple Types. So you need to either:
- Set up a mechanism whereby all array elements can be constrained to a single Type (e.g. Int), OR
- Set up a mechanism whereby positions can be "mapped-over" (e.g. using rotor/batch) with a user-supplied list of Types. For example, a repeating pattern of Int, Str, Num can map over repeating segments of the @a array (analogous to a three column table; see the sketch after this list). OR
- BOTH.
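A hypothetical sketch of that second mechanism, checking a repeating Int, Str, Num pattern over an ordinary array (the sub name is invented for illustration):
# hypothetical helper, not core Raku
sub conforms-to-pattern(@a, @pattern) {
    so @a.kv.map(-> $i, $v { $v ~~ @pattern[$i % @pattern.elems] }).all;
}
my @rows = 1, 'a', 1.5e0, 2, 'b', 2.5e0;
say conforms-to-pattern(@rows, [Int, Str, Num]);   # True
say conforms-to-pattern(@rows, [Int, Str]);        # False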
@thoughtstream @pmichaud @TimToady
I personally like librasteve's latest proposal. IMO, the Raku Way is that user convenience should be default, and performance that interferes with convenience should be opt-in. The is vouched (I'm not sure about the verbiage, but I can't think of anything better of a reasonable length) trait providing a way to still have the @ sigil on something that would otherwise be a Positional inside a Scalar container is something I hadn't thought of, and I definitely agree with the underlying implication that having a typed Scalar is the right way to indicate the user's intent for stricter typing (i.e. to get what is currently the behavior of Type @foo).
Regarding performance, I believe we can cover the vast majority of likely cases with some relatively simple optimizations. The trickiest part would be a good "tightest common type" operator that handles smileys (:D, :U), special types like Failure, and the fact that types can do multiple roles. Maybe two related operators would be better: one to walk up the class chain (returning a single value) and another up the role chain (returning one or more). At any rate, my idea is that containers like Array, List, Hash, etc. can keep up with what's been put in them; or, more specifically, the tightest common type of what's been put in so far.
The above approach alone would lose specificity (but not correctness) if elements get removed, but I think we could partially combat that while staying in O(1) space by tracking a bit more information. I won't overplay my hand here, though...
The odds of something being both a large and highly heterogeneous array are very small IMO, so I think this relatively simple optimization will go a long way to preventing full scans of giant arrays and hashes.
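For the class-chain half, a sketch of what such a helper could look like (the name and the approach are hypothetical; it ignores roles and smileys entirely):
# hypothetical helper: keep only those ancestors of the first type
# that every other type also matches, then take the narrowest
sub tightest-common(*@types) {
    my @common = @types[0].^mro;
    for @types[1..*] -> \t {
        @common .= grep(-> \c { t ~~ c });
    }
    @common[0]
}
say tightest-common(Int, Int);        # (Int)
say tightest-common(Int, Rat);        # (Cool)
say tightest-common(Int, Str, Sub);   # (Any)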
@landyacht I appreciate your commentary on Typescript. It's important to look at other languages to see what's currently resonating with programmers.
Someone named Larry Wall stated (paraphrasing) with regards to the Perl6/Raku re-write, "We've decided that it's better to change the language than change the user."
@jubilatious1
An array cannot be constrained to Int in Raku.
That's... quite a misunderstanding, to put it mildly, and needs to be addressed in case anyone who doesn't know Raku thinks your statement is correct. I'll try to ground understanding via code anyone can run to see that Raku very definitely allows arrays to be constrained to Int:
# Here's an @array whose elements are constrained to `Int`:
my Int @IntArray;
# Readers of the code know it, but so does Raku:
say @IntArray .of; # (Int)
# Here's an _assignment_ throwing an exception due to violating the constraint:
(try @IntArray = 1, 2, '3') // say $!; # Error ... expected Int but got Str ("3")
# Here's an assignment _succeeding_ by adhering to the constraint:
(try @IntArray = 1, 2, 3) andthen .say; # [1 2 3]
# Here's a function taking an array whose elements are constrained to `Int`:
sub foo (Int @IntArray) { 'successfully bound' }
# Here's a _binding_ that succeeds by adhering to the constraint:
(try foo @IntArray) andthen .say; # successfully bound
# Here's a _binding_ that throws an exception due to violating the constraint:
(try foo [1,2,3]) // say $!; # expected Positional[Int] but got Array ([1, 2, 3])
Presumably all of this code has surprised you but it's been as above since day one.
PS. Imo the error message in the last one gets to the heart of what is really great about Raku related to this, and what sucks.
@raiph states:
That's... quite a misunderstanding ... .
Not sure then, why the example of @librasteve fails?
sub foo(Int @a) { dd @a; }
foo [1, 2, 3].Int; # works
foo [1, 2, 3]; # fails
Actually, I see both of @librasteve 's examples failing, but maybe because I'm on an older Rakudo (2023.05)?
~$ raku -e 'sub foo(Int @a) { dd @a; }; foo [1, 2, 3].Int;'
Type check failed in binding to parameter '@a'; expected Positional[Int] but got Int (3)
in sub foo at -e line 1
in block <unit> at -e line 1
hi @jubilatious1
today, this is a proposal, and it does not work
sub foo(Int @a) { dd @a; }
foo [1, 2, 3].Int; # fails
foo [1, 2, 3]; # fails
My first proposal was to add syntax so that .Int on an Array will coerce it to an Array[Int]. I did not say what to do about the elems, but presumably this would scan the Array and coerce each elem to Int or fail. I still quite like this proposal, but it is unrelated to my latest proposal ;-)
It is interesting to hear your description of R - since imo raku is trying to do all the things, it can have Array[Int] to strictly control the contents of your array and it can have Array which is a dynamic ragbag of any type you like. Array is a Positional set of Scalar containers, so in the dynamic case, each element's container knows the type of its contents (if any). In a typed Array all the types of all the Scalar containers of all the elements must match the Array type.
So, I think the debate is about when and where the conformance of the Scalar container elements is checked. Today that is when elements are added to the Array, and we trust a typed Array like Array[Int] to police its elements and only allow Int elements in. Which, as I understand it, is pretty R-like.
I think that Rog is saying that we should also allow an untyped Array to be passed to a sub with a signature like :( Int @a ) provided that its elements are all Ints at that time (ie a runtime check).
@landyacht --- yeah, maybe is delegated beats is vouched verbiage-wise
@librasteve thanks.
This doesn't work either, although your DWIM and my DWIM may not be in full agreement:
~$ raku -e 'sub foo( @a[Int,Str] ) { dd @a; }; foo [1, "A"];'
Constraint type check failed in binding to parameter '@a'; expected anonymous constraint to be met but got Array ([1, "A"])
in sub foo at -e line 1
in block <unit> at -e line 1
sub foo( Int $i, Str $s ) { 'yo' };
foo |[1, "A"]; #yo
^^ to start with, this does work (ie a signature can be defined to decompose an array and control the types)
at the moment, raku does not consider repeating patterns of types in a signature (afaik)
but we do have itemization and multi-dimensional arrays, which feature prominently and often folks ask why this "overhead" --- that mechanism is there to recursively unpack patterns in any hierarchy of list or map or item
This also works:
sub foo( [ Int $i, Str $s ] ) { 'yo' };
foo [1, "A"]; #yo
sub foo( @a[Int,Str] ) { dd @a; }; foo [1, "A"]
If you add a space between the @a and the [...], you kinda get what you want (if I got it right) https://glot.io/snippets/gvymdn83cz
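Spelled out, the space makes the brackets a sub-signature that destructures the parameter instead of indexing it:
sub foo( @a [Int $i, Str $s] ) { dd @a }
foo [1, "A"];   # binds: @a is the whole array, $i and $s are its destructured elements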
But I also think maybe it would make sense to extend Lizmat's Tuple to work like that (Tuple[Int, Str]). But I'm only thinking out loud...
Thanks @FCO, for trying it out. Really...I was only off-by-one-space? Amazing.
Also, I tried doing Capture and Signature to get more info, but no dice:
~$ raku -e 'sub foo( Int $i, Str $s ) { dd .Capture }; foo |[1, "A"];'
\()
~$ raku -e 'sub foo( Int $i, Str $s ) { dd .Signature }; foo |[1, "A"];'
No such method 'Signature' for invocant of type 'Any'
in sub foo at -e line 1
in block <unit> at -e line 1
~$ raku -e 'sub foo( Int $i, Str $s ) { say .Capture }; foo |[1, "A"];'
\()
Possibly relevant:
"Inconsistensy of container descriptor default value type for nominalizable types." #3
"Make subsets validate against their constraints same way as definites do." https://github.com/rakudo/rakudo/pull/2946
The more I think about this, the more this feels like an ENODOC instead of an ENODESIGN.
my Int @a = Array[Int].new: 1,2,3,Int;
dd @a;
my @b := Array[Int].new: 1,2,3;
dd @b;
sub foo(Int @c, Int @d) { dd @c }
foo @a, @b;
This just works and is quite readable.
However, there is a Rakudobug because the following also works.
my Int @a = Array[Int].new: 1,2,3,Int;
sub foo(Int:D @b) { }
foo @a;
Whenever I need to pass in a typed array what I usually do is:
foo( my Int @ = [1,2,3] );
It's a bit cumbersome, but considering that it is needed once in thousands of lines of code – no big deal. But I do understand that it could be used more often with math algorithms.
Let's have a look at this from another perspective. I didn't have time to thoroughly read through the entire discussion here, but so far nobody mentioned hashes. And, yet, the same WAT exists for them too. Declaring a typed hash is even a bit more complicated than declaring an array.
Interesting that strictly typed languages were mentioned but nobody thought about as syntax as a possible option. It's an idea I got a few minutes ago, therefore it's very raw and poorly thought-out. But something like this might be an option:
foo [1,2] as Array[Int];
foo [1,2] as Int @;
bar { 12 => 1.2, 43 => 4 } as Hash[Rat, Int];
bar { 12 => 1.2, 43 => 4 } as Rat %{Int};
What's good about it is that it allows the compiler to create a correct constant object at compile time, avoiding run-time overheads. For non-constant objects it would wind down to a run-time coercion case which is less interesting and only makes sense for providing DWIM behavior.
Apparently, the syntax would also make it possible to avoid run-time coercion with any other, potentially constant, object.
@raiph:
my Int @a;
@a.push: "cat";
say @a;
# Errors:
Type check failed for an element of @a; expected Int but got Str ("cat")
in block <unit> at <unknown file> line 1
But:
my Int @b;
@b.push: Int;
say @b;
# Returns:
[(Int)]
my Int @c;
@c.push: Int;
@c.push: Nil;
say @c;
# Returns:
[(Int) (Int)]
I guess that makes sense, considering the design of the language. A pushed Nil will be coerced to a placeholder for an Int.
But back-up a second: why are multi-element objects allowed to accept a single Type element at the head? In other languages there's a distinction between atomic/primitive, as opposed to aggregates of such atomic/primitive elements.
Not so in Perl/Raku?
The Int constraint accepts both type objects and instances. Perhaps you meant:
$ raku -e 'my Int:D @b; @b.push: Int'
Type check failed for an element of @b; expected Int:D but got Int (Int) (perhaps Nil was assigned to a :D which had no default?)
Putting the value Nil in a container will assume the default value for that container. Which in the case of my Int @a would be Int. But it can also be more precise:
$ raku -e 'my Int:D @b is default(42); @b.push: Nil; dd @b'
Int:D @b = Array[Int:D].new(42)
@jubilatious1
why are multi-element objects allowed to accept a single Type element at the head? ... In other languages there's a distinction between atomic/primitive, as opposed to aggregates of such atomic/primitive elements.
Not so in Perl/Raku?
It's the same in Raku as it is in other languages.
As liz has explained, an Int is an Int but there's the distinction between "instance objects" and "type objects". Many PLs use precisely the same distinction using precisely the same terminology; Python, for example.
Interesting that strictly typed languages were mentioned but nobody thought about as syntax as a possible option. It's an idea I got a few minutes ago, therefore it's very raw and poorly thought-out. But something like this might be an option:
foo [1,2] as Array[Int];
foo [1,2] as Int @;
bar { 12 => 1.2, 43 => 4 } as Hash[Rat, Int];
bar { 12 => 1.2, 43 => 4 } as Rat %{Int};
What's good about it is that it allows the compiler to create a correct constant object at compile time, avoiding run-time overheads. For non-constant objects it would wind down to a run-time coercion case which is less interesting and only makes sense for providing DWIM behavior.
Apparently, the syntax would also make it possible to avoid run-time coercion with any other, potentially constant, object.
Actually, even better than this, this can be done today, in a module.
multi sub infix:<as>(Positional \source, Positional:U \target) {
    return source ~~ target ?? source !! target.new(source);
    CATCH {
        die "Cannot coerce {source.WHAT.^name} into {target.WHAT.^name} using 'as'.\n"
          ~ "Perhaps one element didn't match the element type?";
    }
}
foo [1,2,3]; # Type check failed in binding to parameter '@a'; expected Positional[Int] but got Array ($[1, 2, 3])
foo [1,2,3] as Array[Int]; # works
foo [1,2,'c']; # Cannot coerce Array into Array[Int] using 'as'. Perhaps one element didn't match the element type?
I wrote that up in like two minutes. Obviously, more care would be needed for nested types like Array[Array[Int]] or for Associative types, but still, would not require a lot of work. This could be seen as a more "English like" way to do coercion, which has historically been a method call.
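For the Associative case, a similarly rough companion candidate could copy pairs into a freshly parameterized hash and let its store-side checks do the work (an untested sketch, same caveats as above):
# hypothetical companion to the Positional candidate above
multi sub infix:<as>(Associative \source, Associative:U \target) {
    return source if source ~~ target;
    my \out = target.new;
    out{.key} = .value for source.pairs;   # each store is type-checked by the target hash
    out
}
say ({ a => 1.5, b => 2.5 } as Hash[Rat]).of;   # (Rat)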