crystal
crystal copied to clipboard
[RFC] Don't unify types under the same hierarchy as a base virtual type
Given:
class Foo
end
class Bar < Foo
end
class Baz < Foo
end
We have:
var = rand < 0.5 ? Bar.new : Baz.new
typeof(var) #=> Foo
That is, the type of the variable becomes Foo
. Internally the compiler sees this as Foo+
, which means "Foo
or any of its subtypes".
Maybe the compiler shouldn't do this. Before 0.16.0 this was kind of needed because otherwise instance variables would end up with huge unions. Now one has to specify their type, and so one could choose to do either of these, depending on the use case:
# Foo or any of its subclasses
@foo : Foo
# I know foo can only be Bar or Baz (or any of its subclasses), but not other subclasses of Foo
@foo : Bar | Baz
This needs to be tested in the compiler to see if there's nothing obvious that would break, so this RFC is actually more a "needs to try this and see what happens" and then take a decision. Big union types could still appear for local variables, and there's also the thing that [Bar.new, Baz.new]
would be typed as Array(Bar | Baz)
instead of Array(Foo)
, where one might expect that to be typed as the later.
100+ for precise type unions :-)
#triage This seems to have changed since the RFC was started.
The output of the original scenario was never typeof(var) #=> Foo
as far as I can tell - I went as far back as Crystal 0.7.1 https://play.crystal-lang.org/#/r/131y (current: https://play.crystal-lang.org/#/r/131s)
Is the original description unclear as to what typeof
is meant to return?
@miketheman there was a missing .new
call, I fixed it, thanks!
I tried to do this in this branch. It makes compiling the compiler 2~3 slower and also consumes more memory. The main problem is that the parser has many methods that create some AST nodes, and when combined lead to many different small unions where previously there were all of types ASTNode+
.
This is actually the reason why we chose to unify types under the same hierarchy, so I guess this won't be implemented soon (maybe never).
I think we should smash our heads together and come up with the sensible solution (TM) here ;-) (I think this is the solution @asterite - we just need to make it faster. Kudos!)
:-D
One idea is to combine types under the same hierarchy if a union type has more than N elements in it. But adding an heuristic to the language might not be a good idea. Or maybe give a compile error if such union appears, where all types are in the same hierarchy, forcing you to add some annotations to level-up the types. So in that case the parser would have some annotations to generate ASTNodes instead of these small unions... still not sure about this.
I was thinking the exact same thing - and the same critic :-D For me, once again, we touch upon the warnings-kinda-bastard-thing. Which... is dirty in a way. But finally we end up there whatever we do - iff we want to do good expressible shit, with the power to do what is needed, at you fingertips. Either it's gonna be a '--warnings' or defaulted unless '--warnings-suppressed-muddafxx'. In the real world we need guidance to do the right thing (TM...) - or maybe it's just me.
@asterite, do you know off the bat where the majority of time is spent? Is it because of codegen details (long cond branches for picking method-instance)? Or having to ensure methods exists for each type at each call site in semantics, instead of just checking for the virtual type? Or something else? In any case, I'll try and dig around and see if I can help in any way - I do believe exact unions is the cleanest way of handling types (for the user).
The problem is that a method is instantiated for each union. For example:
def node_location(node)
puts node.location
end
Now, if node
is Var | NilLiteral
a method node_location(node : Var | NilLiteral)
is instantiated. If node
is Var | Call
, a new method is instantiated for that. Now imagine node_location
passes node
around, a lot of duplicate methods will be typed and generated... and they all do the same. And in the compiler a lot of these small unions were created, and with a type hierarchy as big as ASTNode
it turns out to be a combinatorial explosion.
I didn't time this or profile this, but this is what my intuition tells me.
Alright. On a side note: does this mean the functions become even more specific in their multiple dispatching, with potentially less branching? I'm gonna try and wrap my head around it.
My suggestion: if the developper specifies the type of something, there should be no virtual type of the type hierarchy.
class Parent; end
class C1 < Parent; end
class C2 < Parent; end
class C3 < Parent; end
# This makes sense, we can accept the compiler being more general
a = [C1.new, C2.new]
puts typeof(a) #=> Array(Parent)
# I'm being very specific here, I could put Parent if that's what I wanted, there should be no unification
b = [] of C1 | C2
puts typeof(b) #=> Array(Parent) # Should be Array(C1 | C2)
# This is wrong! I specifically choose C1 | C2
b << C3.new #=> works
Normally, the programmer won't specify 100 types. So when the compiler is infering stuff itself, it can still optimize with the unification, but when the developper gives an exact order, the compiler has to follow them.
This causes silent breakage in type-based when clauses. See #4303.
Is this going to remain with the current behaviour? Because I really think this is wrong.
I think that it's wrong too. Voting for correct behaviour!
Another example. I have a Message
class, which may have some attributes. This is supposed to work:
def perform(telegram_file : ::Tele::Types::Audio | ::Tele::Types::Document | ::Tele::Types::Voice)
end
if file = message.audio || message.voice || message.document
perform(file)
end
But it crashes with this goddamn error: no overload matches '#perform' with types Tele::Type+
I even don't know how to overcome this bug :cry:
This is stupid! file.as(Tele::Types::Audio | Tele::Types::Voice | Tele::Types::Document).file_id
magically becomes... undefined method 'file_id' for Tele::Types::User (compile-time type is Tele::Type+)
!
Maybe the compiler could stick to the current behaviour by default and summarizes subclasses as a Parent+ type for efficiency. But when it encounters a missing method on such an inferred type, it should be able to switch to a more exact resolution and try to use the actual union type for this.
@asterite Would you consider reopening this? It seems this issue has some serious implications and it would be nice to keep the discussion alive and perhaps find improvements.
@straight-shoota No. Compile times will dramatically increase with this change, and they are already super high.
I'm also not developing the compiler anymore, I'll just comment on issues that I find interesting/fun, check some PRs and send some too, as long as I find them fun for me. I might send a few simple features to the compiler if I find the time, and if I find it more or less easy to do, but that's it 😊
Yeah, I was just addressing you because you were the one who closed this ;)
I'm not suggesting to follow the proposed implementation but I find it worth to keep a reminder and thinking about this might perhaps lead to different ways to improve code like the examples provided by @vladfaust.
I even don't know how to overcome this bug
You can add something like Performable
to type heirarchy - a parent class for every Type
that can be performed
. Of course it doesn't works if there are many places each limited to a different subset of types.
Another idea is to don't inherit types from Type
- it doesn't seems having any fields. Instead, include it as a module.
The compiler should probably give an error when writing a union type like B | C
where that would lead to A+
and say "Unsupported union: B | C. Use A".
I'll reopen this issue if you want, maybe someone will fix it.
As for the example above, just don't put a type restriction, it's not necessary.
Thank a lot, @asterite ♥️
@asterite, What is now with this error? #6024 is merged, but #4303 still exist.
@alsoijw Sorry, I don't understand what you mean
abstract class RootClass; end
class Class1 < RootClass; end
class Class2 < RootClass; end
class Class3 < RootClass; end
o = Class3.new
case o
when Class1 | Class2
puts "o is Class3, this shouldn't match"
when Class3
puts "ok"
end
@asterite, crystal 0.25 choose second condition, but in newest 1.0 it choose first condition. Are there some reason for changing this behaviour, or it's regression?
@alsoijw Someone else can probably confirm this, but I'm pretty sure that the union Class1 | Class2
is being converted into like RootClass+
, which would match all children, hence why it matches the first condition. If you want to have a single condition handle both, you should do when Class1, Class2
which would work as you're thinking.
Yes, that's exactly it. There's an issue tracked for that already but I can't find it 😞
@alsoijw Someone else can probably confirm this, but I'm pretty sure that the union Class1 | Class2 is being converted into like RootClass+
@Blacksmoke16, that's right.
If you want to have a single condition handle both, you should do when Class1, Class2 which would work as you're thinking.
This does not solve all possible problems.
abstract class RootClass; end
class Class1 < RootClass; end
class Class2 < RootClass; end
class Class3 < RootClass; end
class Example
@val : Nil | Class1 | Class2
property val
end
o = Class3.new
e = Example.new
e.val = o
Crystal sees this:
@val : Nil | Class1 | Class2
as this:
@val : Nil | RootClass
It's the same issue. Whenever you write A | B
and they both inherit a parent type T
, it's the same as writing T
.
I actually fixes this particular issue in https://github.com/crystal-lang/crystal/pull/9052 but it was eventually rejected. I'd still like to change the language to behave more correctly when there are type annotations, we'll see if others change their mind.