crystal [RFC] Don't unify types under the same hierarchy as a base virtual type

Given:

class Foo
end

class Bar < Foo
end

class Baz < Foo
end

We have:

var = rand < 0.5 ? Bar.new : Baz.new
typeof(var) #=> Foo

That is, the type of the variable becomes Foo. Internally the compiler sees this as Foo+, which means "Foo or any of its subtypes".

Maybe the compiler shouldn't do this. Before 0.16.0 this was kind of needed because otherwise instance variables would end up with huge unions. Now one has to specify their type, and so one could choose to do either of these, depending on the use case:

# Foo or any of its subclasses
@foo : Foo

# I know foo can only be Bar or Baz (or any of its subclasses), but not other subclasses of Foo
@foo : Bar | Baz

This needs to be tested in the compiler to see if there's nothing obvious that would break, so this RFC is actually more a "needs to try this and see what happens" and then take a decision. Big union types could still appear for local variables, and there's also the thing that [Bar.new, Baz.new] would be typed as Array(Bar | Baz) instead of Array(Foo), where one might expect that to be typed as the later.

May 27 '16 15:05 asterite

100+ for precise type unions :-)

Jun 02 '16 19:06 ozra

#triage This seems to have changed since the RFC was started.

The output of the original scenario was never typeof(var) #=> Foo as far as I can tell - I went as far back as Crystal 0.7.1 https://play.crystal-lang.org/#/r/131y (current: https://play.crystal-lang.org/#/r/131s)

Is the original description unclear as to what typeof is meant to return?

Jul 04 '16 18:07 miketheman

@miketheman there was a missing .new call, I fixed it, thanks!

Jul 04 '16 18:07 asterite

I tried to do this in this branch. It makes compiling the compiler 2~3 slower and also consumes more memory. The main problem is that the parser has many methods that create some AST nodes, and when combined lead to many different small unions where previously there were all of types ASTNode+.

This is actually the reason why we chose to unify types under the same hierarchy, so I guess this won't be implemented soon (maybe never).

Aug 05 '16 22:08 asterite

I think we should smash our heads together and come up with the sensible solution (TM) here ;-) (I think this is the solution @asterite - we just need to make it faster. Kudos!)

Aug 08 '16 19:08 ozra

:-D

One idea is to combine types under the same hierarchy if a union type has more than N elements in it. But adding an heuristic to the language might not be a good idea. Or maybe give a compile error if such union appears, where all types are in the same hierarchy, forcing you to add some annotations to level-up the types. So in that case the parser would have some annotations to generate ASTNodes instead of these small unions... still not sure about this.

Aug 08 '16 19:08 asterite

I was thinking the exact same thing - and the same critic :-D For me, once again, we touch upon the warnings-kinda-bastard-thing. Which... is dirty in a way. But finally we end up there whatever we do - iff we want to do good expressible shit, with the power to do what is needed, at you fingertips. Either it's gonna be a '--warnings' or defaulted unless '--warnings-suppressed-muddafxx'. In the real world we need guidance to do the right thing (TM...) - or maybe it's just me.

Aug 08 '16 19:08 ozra

@asterite, do you know off the bat where the majority of time is spent? Is it because of codegen details (long cond branches for picking method-instance)? Or having to ensure methods exists for each type at each call site in semantics, instead of just checking for the virtual type? Or something else? In any case, I'll try and dig around and see if I can help in any way - I do believe exact unions is the cleanest way of handling types (for the user).

Aug 11 '16 19:08 ozra

The problem is that a method is instantiated for each union. For example:

def node_location(node)
  puts node.location
end

Now, if node is Var | NilLiteral a method node_location(node : Var | NilLiteral) is instantiated. If node is Var | Call, a new method is instantiated for that. Now imagine node_location passes node around, a lot of duplicate methods will be typed and generated... and they all do the same. And in the compiler a lot of these small unions were created, and with a type hierarchy as big as ASTNode it turns out to be a combinatorial explosion.

I didn't time this or profile this, but this is what my intuition tells me.

Aug 11 '16 21:08 asterite

Alright. On a side note: does this mean the functions become even more specific in their multiple dispatching, with potentially less branching? I'm gonna try and wrap my head around it.

Aug 13 '16 05:08 ozra

My suggestion: if the developper specifies the type of something, there should be no virtual type of the type hierarchy.

class Parent; end
class C1 < Parent; end
class C2 < Parent; end
class C3 < Parent; end

# This makes sense, we can accept the compiler being more general
a = [C1.new, C2.new]
puts typeof(a) #=> Array(Parent)

# I'm being very specific here, I could put Parent if that's what I wanted, there should be no unification
b = [] of C1 | C2
puts typeof(b) #=> Array(Parent) # Should be Array(C1 | C2)
# This is wrong! I specifically choose C1 | C2
b << C3.new #=> works

Normally, the programmer won't specify 100 types. So when the compiler is infering stuff itself, it can still optimize with the unification, but when the developper gives an exact order, the compiler has to follow them.

Jan 10 '17 01:01 MaxLap

This causes silent breakage in type-based when clauses. See #4303.

May 20 '17 16:05 davidgiven

Is this going to remain with the current behaviour? Because I really think this is wrong.

May 24 '17 18:05 davidgiven

I think that it's wrong too. Voting for correct behaviour!

Aug 07 '17 14:08 vladfaust

Another example. I have a Message class, which may have some attributes. This is supposed to work:

def perform(telegram_file : ::Tele::Types::Audio | ::Tele::Types::Document | ::Tele::Types::Voice)
end

if file = message.audio || message.voice || message.document
  perform(file)
end

But it crashes with this goddamn error: no overload matches '#perform' with types Tele::Type+

I even don't know how to overcome this bug :cry:

Aug 17 '17 11:08 vladfaust

This is stupid! file.as(Tele::Types::Audio | Tele::Types::Voice | Tele::Types::Document).file_id magically becomes... undefined method 'file_id' for Tele::Types::User (compile-time type is Tele::Type+)!

Aug 17 '17 22:08 vladfaust

Maybe the compiler could stick to the current behaviour by default and summarizes subclasses as a Parent+ type for efficiency. But when it encounters a missing method on such an inferred type, it should be able to switch to a more exact resolution and try to use the actual union type for this.

Aug 18 '17 07:08 straight-shoota

@asterite Would you consider reopening this? It seems this issue has some serious implications and it would be nice to keep the discussion alive and perhaps find improvements.

Aug 18 '17 08:08 straight-shoota

@straight-shoota No. Compile times will dramatically increase with this change, and they are already super high.

I'm also not developing the compiler anymore, I'll just comment on issues that I find interesting/fun, check some PRs and send some too, as long as I find them fun for me. I might send a few simple features to the compiler if I find the time, and if I find it more or less easy to do, but that's it 😊

Aug 18 '17 10:08 asterite

Yeah, I was just addressing you because you were the one who closed this ;)

I'm not suggesting to follow the proposed implementation but I find it worth to keep a reminder and thinking about this might perhaps lead to different ways to improve code like the examples provided by @vladfaust.

Aug 18 '17 10:08 straight-shoota

I even don't know how to overcome this bug

You can add something like Performable to type heirarchy - a parent class for every Type that can be performed. Of course it doesn't works if there are many places each limited to a different subset of types. Another idea is to don't inherit types from Type - it doesn't seems having any fields. Instead, include it as a module.

Aug 18 '17 11:08 konovod

The compiler should probably give an error when writing a union type like B | C where that would lead to A+ and say "Unsupported union: B | C. Use A".

I'll reopen this issue if you want, maybe someone will fix it.

As for the example above, just don't put a type restriction, it's not necessary.

Aug 18 '17 13:08 asterite

Thank a lot, @asterite ♥️

Apr 30 '18 09:04 vladfaust

@asterite, What is now with this error? #6024 is merged, but #4303 still exist.

Jul 09 '21 15:07 alsoijw

@alsoijw Sorry, I don't understand what you mean

Jul 09 '21 15:07 asterite

abstract class RootClass; end
class Class1 < RootClass; end
class Class2 < RootClass; end
class Class3 < RootClass; end

o = Class3.new
case o
when Class1 | Class2
  puts "o is Class3, this shouldn't match"
when Class3
  puts "ok"
end

@asterite, crystal 0.25 choose second condition, but in newest 1.0 it choose first condition. Are there some reason for changing this behaviour, or it's regression?

Jul 09 '21 15:07 alsoijw

@alsoijw Someone else can probably confirm this, but I'm pretty sure that the union Class1 | Class2 is being converted into like RootClass+, which would match all children, hence why it matches the first condition. If you want to have a single condition handle both, you should do when Class1, Class2 which would work as you're thinking.

Jul 09 '21 16:07 Blacksmoke16

Yes, that's exactly it. There's an issue tracked for that already but I can't find it 😞

Jul 09 '21 16:07 asterite

@alsoijw Someone else can probably confirm this, but I'm pretty sure that the union Class1 | Class2 is being converted into like RootClass+

@Blacksmoke16, that's right.

If you want to have a single condition handle both, you should do when Class1, Class2 which would work as you're thinking.

This does not solve all possible problems.

abstract class RootClass; end
class Class1 < RootClass; end
class Class2 < RootClass; end
class Class3 < RootClass; end

class Example
  @val : Nil | Class1 | Class2

  property val
end
o = Class3.new
e = Example.new
e.val = o

Jul 09 '21 16:07 alsoijw

Crystal sees this:

@val : Nil | Class1 | Class2

as this:

@val : Nil | RootClass

It's the same issue. Whenever you write A | B and they both inherit a parent type T, it's the same as writing T.

I actually fixes this particular issue in https://github.com/crystal-lang/crystal/pull/9052 but it was eventually rejected. I'd still like to change the language to behave more correctly when there are type annotations, we'll see if others change their mind.

Jul 09 '21 16:07 asterite

crystal crystal copied to clipboard

[RFC] Don't unify types under the same hierarchy as a base virtual type

crystal
crystal copied to clipboard