Typeclass objects
Typeclass objects offer dynamic (i.e. run-time and open modularity1) polymorphism by binding the typeclass bound to the static (compile-time) type of the reference and tagging the data type instances that the reference can contain, so that the correct dictionary can be dispatched dynamically at run-time.
They are named "trait objects" in Rust and "existential quantification" (e.g. forall a
) or "existential typeclass" in Haskell.
1 Means we can't compile the universe, so we can't know all the data types required in dependency injection scenarios. Dynamic polymorphism is not just a run-time artifact; it is also a fact of the real world, where total orders can't exist (cf. the Second Law of Thermodynamics, the Halting Problem, the Incompleteness Theorem, Russell's Paradox, the Byzantine Generals Problem, Byzantine fault-tolerance with the CAP theorem, the impossibility of existence without a quantified speed-of-light, etc.). Likewise, there is no total order of trustless decentralization either.
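For readers, a minimal Haskell sketch of such a typeclass object (the wrapper name Showable and the list xs are illustrative, not part of any proposal here): the Show dictionary is captured when the value is wrapped, so the concrete type is erased and the correct dictionary is dispatched dynamically, which also permits a heterogeneous collection.

{-# LANGUAGE ExistentialQuantification #-}

-- "Typeclass object" as an existential wrapper: the Show dictionary is
-- captured at construction time and dispatched dynamically when used.
data Showable = forall a. Show a => Showable a

instance Show Showable where
  show (Showable a) = show a

-- A heterogeneous collection: each element may be a different data type.
xs :: [Showable]
xs = [Showable (1 :: Int), Showable "hello", Showable True]

main :: IO ()
main = mapM_ print xs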
Expression Problem
They have the benefit w.r.t. Wadler's Expression Problem of delaying the binding of interface to data until construction of the typeclass object, as opposed to OOP subclassing (~and Scheme/JavaScript prototypes2~) which binds the interface to all instances of the subclassed type (on instantiation of the data). So they solve the Expression Problem up to the point of instantiation of the typeclass object; beyond that point, while they allow adding new data types (via the aforementioned dynamic dispatch), they don't allow extending new operations after the instantiation of the typeclass object, which binds the allowed typeclass interface(s).

Typeclass objects enable heterogeneous collections with dynamic extension of the collection of data types without requiring an invariant collections data structure (i.e. they can be the element type of an array). Whereas typeclass bounds on function definitions delay binding the interface until the function call site and thus solve the Expression Problem up to that point, but they don't enable heterogeneous collections with dynamic extension of the collection of data types except via unions. Unions can both enable heterogeneous collections with dynamic extension of the collection of data types and delay binding of the interface until the function call site, but they require an invariant collections data structure such as a list.
2 Note that prototype programming can add new operations (but forces all instances of a prototype to take those new interfaces) and add new data types, but it can't add different interfaces to instances of the same prototype. ~Whereas, type classing can differentiate.~ (Edit: correction)
Syntax and Peculiarities
@shelby3 wrote:
Well obviously that is useless and not what I was referring to. I mean for example, I have a collection of typeclass objects and I want to sort or find them, so I need a relation.
A relation only applies to two objects of the same type.
Exactly. So thus Eq<A, A> is useless for what I wrote above.

One answer is to provide a mapping from each object type to a partial order (or total order). So akin to subsumption to a subclass, except open to delayed definition. That might work for sort.

Or another alternative is Eq<A, B> accepts any type B, but always returns false if B is not A. So find will work.

Okay I see how maybe typeclass objects are still useful. Thanks.
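For readers, a minimal Haskell sketch of that last idea (the wrapper name Obj and the use of Typeable are my own illustration, not part of the discussion above): equality across wrapped types defaults to false when the concrete types differ, so find works over a heterogeneous collection.

{-# LANGUAGE ExistentialQuantification #-}
import Data.Typeable (Typeable, cast)

-- Wrapped values are equal only if they hold the same concrete type and
-- compare equal at that type; otherwise the comparison is False.
data Obj = forall a. (Typeable a, Eq a) => Obj a

instance Eq Obj where
  Obj x == Obj y = case cast y of
    Just y' -> x == y'
    Nothing -> False

main :: IO ()
main = do
  print (Obj (1 :: Int) == Obj (1 :: Int))  -- True
  print (Obj (1 :: Int) == Obj "one")       -- False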
@shelby3 wrote:
@keean wrote:
data Variant = for<A, B> Variant(A) requires A : Eq<B>
data Variant = Variant(A | Any) where A: Eq<Any>
That seems to be a general solution and I like it very much, unless someone can point out flaws.
So typeclass objects are always conceptually with | Any, e.g.:
let x: A where Show<A | Any> = MyType(1, 2)
let y: A where Eq<A | Any, Any> = MyType(1, 2)
Or in my (and Rust's) preferred OOPish style:
let x: A where (A | Any): Show = MyType(1, 2)
let y: A where (A | Any): Eq<Any> = MyType(1, 2)
But I prefer we can write that shorthand (and so we only need where clauses on function types and definitions):
let x: Show = MyType(1, 2)
let y: Eq<Any> = MyType(1, 2)
Hope you are also starting to see another reason why I favored an OOPish style: it is also more consistent with typeclass objects (and I actually did intuitively anticipate/expect this, but not with the complete solution worked out).
P.S. my understanding is that any function defined for a typeclass bound would also accept typeclass objects with that bound. The compiler would create a dynamic dispatch version of said function on demand, to be used whenever a reference to it is passed around. Seems to me that only functions called nominally can be monomorphized.
@keean wrote:
This callback can only show the type A which is passed to f, the requirements should be at the top level.
f(callback: for<B> (B) => ()) where B: Show
Where the callback can be called on any type. In the first example the callback is monomorphic with respect to f, in the second polymorphic.
I am instead proposing that be written the same as any typeclass bounded function without any for<B>:
f(callback: (B) => ()) where B: Show
I wrote in the OP:
P.S. my understanding is that any function defined for a typeclass bound would also accept typeclass objects with that bound. The compiler would create a dynamic dispatch version of said function on demand, to be used whenever a reference to it is passed around. Seems to me that only functions called nominally can be monomorphized.
This:
f(callback: (B) => ()) where B: Show
Doesn't work because you are not specifying the scope for 'B'; you could mean:
for<B> f(callback: (B) => ()) where B: Show
Or you could mean:
f(callback: for<B> (B) where B: Show => ())
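For readers, a rough Haskell rendering of the two readings (using explicit forall and RankNTypes; the names fOuter and fInner are illustrative, not the proposed syntax):

{-# LANGUAGE RankNTypes #-}

-- Outer scope: the caller fixes B at the call site, so the callback is
-- monomorphic with respect to f (the extra argument is only there so the
-- callback has something to be applied to).
fOuter :: Show b => (b -> IO ()) -> b -> IO ()
fOuter callback x = callback x

-- Inner scope: f receives a polymorphic function and may apply it to
-- values of different types within its own body.
fInner :: (forall b. Show b => b -> IO ()) -> IO ()
fInner callback = do
  callback (42 :: Int)
  callback "hello"

main :: IO ()
main = do
  fOuter print (1 :: Int)
  fInner print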
@keean wrote:
Doesn't work because you are not specifying the scope for B; you could mean:
I don't understand why that is important. Please explain.
I don't understand why that is important. Please explain.
- With the for<B> in the outer scope the callback is monomorphic with respect to f so the callback can only be called on one type, that is fixed at the time f is called.
- With the for<B> in the inner scope we are passing a polymorphic function that can be called on any B
@keean wrote:
- With the for<B> in the outer scope the callback is monomorphic with respect to f so the callback can only be called on one type, that is fixed at the time f is called.
Okay what you mean is that when f is called, then at the call site the types of the callback are monomorphized. This means f is only calling the callback with the said typeclass bounded instances and not with concrete data instances.

I don't think the for<B> is necessary. The scope can be determined by whether the where clause appears at the top-level scope or not! So non-top-level indicates a typeclass object; whereas, top-level indicates monomorphization at the call-site of f. So now, you provided another more important reason that we need non-top-level where clauses. I hate that ugly/noisy for<B>.
- With the for<B> in the inner scope we are passing a polymorphic function that can be called on any B
Right, that is what I wrote elsewhere: it doesn't make any sense for f to request a typeclass bounded callback (which isn't monomorphized at the call site for f) unless it is for dynamic dispatch typeclass object(s).
Yes but I would rather make the programmer declare a datatype for a polymorphic callback function. This keeps the function syntax simpler, and is called first-class-polymorphism.
@keean wrote in another thread:
I think it's a bit more of a problem. It's not the typeclass object that is needed. Consider:
f(callback : MyCallbackFn) where ...
We want to create the where clause when declaring f but we don't yet know the callback function that will be passed. We don't know what typeclass bounds the callback will have.

If we want the callback to print it may need a Show constraint, we cannot know at the point f is declared what typeclass bounds callback will have, and to define them would be to restrict what the callback is allowed to do, which does not seem right.
I am pleased you raised this point.
First re-read my prior comment and click the link in it to remember what you pointed out to me about the scoping of the callback.
If the callback is scoped by the call site to function f, then the caller chooses the typeclass bound(s), which satisfies your point.

And if the callback is scoped by the function f, then f decides the typeclass bound(s) of the typeclass object(s) of the callback.
What needs to happen is we need to infer the typeclass bounds on f when f is called with a specific callback which has known typeclass bounds.

I think the callback example needs type-class bounds inference at the callsite of f in order to be useful.
If you are not referring to augmentation of type-class bounds, that would only be in the case when the callback is scoped by the call-site of f, except it is generally impossible. Let's clarify what kind of inference we are talking about.

The function f may call some operations on the callback and/or it may store the callback in a data structure imperatively, and thus we have no way of knowing which future operations on the callback will be allowed if f does not declare them. The compiler will force f to declare them. You could, I suppose, infer the operations that f internally requires in its function body using principal typings, but we don't know yet to what extent we can support principal typings, as generally they are undecidable in an open system.

So f probably must declare its typeclass bound(s) requirements (at least we decided at a minimum, it must be explicit at module boundaries).
If we want the callback to print it may need a Show constraint, we cannot know at the point f is declared what typeclass bounds callback will have, and to define them would be to restrict what the callback is allowed to do, which does not seem right.
But you are making a different point. You are talking about the caller of f being able to augment (not remove the typeclass bounds f requires) the available typeclass bounds that the callback requires in the case where the callback is scoped at the call-site of f. In this case, I agree with you, since you want to delay unification of the type variables until the call-site.

So what you are saying is that f can require operations on the type variables of the callback and the caller can add some more required operations.
The caller can add more where requirements (either inferred or explicit) at the call site of f. The explicit form at the call site is a cast, either top-level:
f(callback): (MyCallback) => () [B: Print+Show]
Or non-top-level:
f(callback: (B) => () [B: Print+Show])
Note above that either will apply to the typeclass object or monomorphization cases, because per my proposal, f's scoping of its where clause determines which case it is.
@keean wrote:
Yes but I would rather make the programmer declare a datatype for a polymorphic callback function. This keeps the function syntax simpler, and is called first-class-polymorphism.
But that forces structural typing, otherwise we can't reuse functions which have the same signature but different data type names. That is why I had suggested type aliases instead.

And type aliases must be expandable to any general types, so we won't be able to force (make) the programmer use type aliases for non-top-level where clauses (as they can use the general types anywhere a type signature is required).
Edit: more discussion.
@keean wrote:
@shelby3 wrote:
@keean wrote:
@shelby3 wrote:
@keean wrote:
By using first class polymorphism we can assume all function signatures are monomorphic which avoids the need to keep track of the nesting where type variables are brought into scope
Can you please provide an example or more detailed explanation so I can grok this? (Also for readers' benefit)
, and allows us to standardise on a single top level where clause for all datatypes and function definitions. We can still construct arbitrary higher ranked types by using datatypes.

See this paper, I think it explains it quite well. http://web.cecs.pdx.edu/~mpj/pubs/fcp.html
@keean wrote:
However I think we should use first class polymorphism so we don't have such complex function signatures. With first class polymorphism we can declare a data-type:
data MyCallback = (B) => () requires B : Show
f(callback : MyCallback)
Please show me example syntax for instantiating an instance of MyCallback?

Shouldn't that be instead:
data MyCallback = MyCallback((B) => () requires B : Show)
Yes, you are right, we require the constructor for the type to have () brackets. Basically it would be the normal monomorphic function declaration inside a datatype, and any type variable that is free on the right hand side gets automatically universally quantified.

As the paper says, it keeps the syntax simpler and is easier to implement, but has the same expressive power as free annotation of functions. I think it will make for cleaner APIs too, as you can use the named type to check you have the right type for your callback (so less typing when using libraries).
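For readers, a minimal Haskell sketch of this first-class-polymorphism pattern, including the instantiation syntax asked about above (the Haskell rendering is my own; the proposed language's syntax would differ):

{-# LANGUAGE RankNTypes #-}

-- The quantifier lives inside the data constructor, so the signature of f
-- stays monomorphic; any free type variable on the right-hand side is
-- universally quantified.
newtype MyCallback = MyCallback (forall b. Show b => b -> IO ())

f :: MyCallback -> IO ()
f (MyCallback callback) = do
  callback (1 :: Int)   -- f may apply the callback at any Show type
  callback "two"

main :: IO ()
main = f (MyCallback print)  -- instantiating MyCallback wraps a polymorphic function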
Okay so the research paper you cited is all about achieving type inference.
So as I wrote in my prior comment in this thread, you are advocating abandoning the ability to reuse functions (which contain a type variable) based on their structural type and instead forcing said functions to be instantiated nominally (thus making equivalent structural function types non-composable because they have different nominal constructors), all because you want to be able to use principal typings on these said functions. I've already warned and feared that you are going to make your principal typings goal into a weapon to defeat degrees-of-freedom of expression. @skaller has also warned similarly that type inference is the enemy of expressiveness.
I don't think it is often going to be the case that we want to have a function taking a callback, where both said function and the callback function are not annotated with quantified (aka concrete) types. And for those cases where we might have wanted to do that, we won't. We will have to annotate. For myself and also for readers, please provide me a convincing example showing this annotation overhead (in this specific case of type variables we are discussing) is a significant burden justifying abandoning the ability to compose functions generally?
Edit: also nothing stops you from using the method you prefer in your code and achieving type inference. I am asking why we must force everyone to abandon function composition generality in all code, when they can choose to annotate.
I don't think I like complex code without type annotations. Raises cognitive load due to lack of explicit local reasoning. Type inference is useful where the reader can easily figure out in their head what the type annotations would have been. This is one of my big complaints that makes Haskell code so obtuse to read IMO.
I don't think it is often going to be the case that we want to have a function taking a callback, where both said function and the callback function are not annotated with quantified (aka concrete) types. And for those cases where we might have wanted to do that, we won't. We will have to annotate. For myself and also for readers, please provide me a convincing example showing this annotation overhead (in this specific case of type variables we are discussing) is a significant burden justifying abandoning the ability to compose functions generally?
There is no loss in expressive power (it even says so in the paper), and it will make the type signatures shorter and more readable.
There is no loss in composition, and I am not sure why you would suggest there is? Function type signatures don't compose, unless you are talking about copy-and-paste composition? Copy-and-paste just leads to stupid errors. Why bother with named functions at all, why not just copy and paste the raw-code? The same argument applies to types, it is better to have a named type you can re-use (like you call functions) rather than have people copy-and-paste huge complex and unintelligible type signatures.
Okay so the research paper you cited is all about achieving type inference.
I disagree, it is all about avoiding having to have higher ranked type annotations. In Haskell (and all other languages I know of except those that use FCP) you must annotate higher ranked types because they cannot be inferred. Haskell cannot infer them, no language I know of can infer them. So this is not about inference. This is about how you achieve the annotation. We already need first-class-polymorphism to correctly handle type-class dictionaries, so we already will have the mechanism for it.
We can keep the language simpler and keep the full expressive power, how could anyone not want that?
@keean wrote:
There is no loss in expressive power (it even says so in the paper),
I guess that is correct. I think I also used the term 'composition' and so it is really boilerplate that is the issue.
and it will make the type signatures shorter and more readable.
At the cost of causing other boilerplate...
There is no loss in composition, and I am not sure why you would suggest there is?
I can't reuse a MyCallback everywhere I can use a MyCallback2 (and vice versa) even if the signatures of the functions are otherwise identical.
Granted I can destructure those constructors and re-construct a compatible instance, but this is boilerplate, and additionally afaics if I have these in a typeclass object, then I will need a different implementation for each one. Afaics, FCP is increasing boilerplate of composition in exchange for the hope of not having to write type annotations for higher-ranked types (and I say hope because I don't know if FCP inference will integrate with our unions and other features, and I don't know if we won't end up needing higher-ranked types anyway).
I disagree, it is all about avoiding having to have higher ranked type annotations.
Which is precisely all about achieving (a similar level of expressiveness with) type inference. But the entire point of type inference is to not have boilerplate. I hope you didn't really expect that FCP is some Holy Grail without trade-offs.
So this is not about inference. This is about how you achieve the annotation.
It is about how we support type variables with minimal annotation and boilerplate, with the goal of enabling type inference with a boilerplate wrapping versus losing type inference with no boilerplate wrapping.
We can keep the language simpler and keep the full expressive power, how could anyone not want that?
How is it simpler? You trade off one thing for another.
The same argument applies to types, it is better to have a named type you can re-use (like you call functions) rather than have people copy-and-paste huge complex and unintelligible type signatures.
The difference is we don't normally expect structurally equivalent types to be equivalent (excluding our planned unions and intersections). But functions we normally do expect to be composable on structural equivalence (which is why we allow anonymous functions).
Note you did not address my comment here:
@shelby3 wrote:
Edit: also nothing stops you from using the method you prefer in your code and achieving type inference. I am asking why we must force everyone to abandon function composition generality in all code, when they can choose to annotate.
How is it simpler? You trade off one thing for another.
You avoid nested where clauses in the syntax. The grammar will be simpler (and this will be visible in the simple EBNF for the language).
The difference is we don't normally expect structurally equivalent types to be equivalent. But functions we normally do expect to be composable on structural equivalence.
We don't normally expect any equivalence operator to work on functions, so I am not sure this is important. Remember top-level functions can be fully polymorphic, so you only need to wrap the argument in the datatype at the call site. Like this:
data MyCallback = MyCallback((A) => A)
id(x : A) : A => x
fn(callback : MyCallback)
fn(MyCallback(id))
What is wrong with type annotations?
They’re great! They add to the ability to reason about code in principle. They allow error messages to make sense.
The downside of annotation is verbosity which makes code harder to read. The programmer should have flexibility choosing the tradeoff.
— john skaller [email protected] http://felix-lang.org
@keean wrote:
We dont normally expect any equivalence operator to work on functions, so I am not sure this is important.
Disagree. We expect assignment of functions to work on structural equivalence.
fn(MyCallback(id))
And so you are right back to type annotations again.
How is it simpler? You trade off one thing for another.
You avoid nested where clauses in the syntax. The grammar will be simpler (and this will be visible in the simple EBNF for the language).
We avoid that but add the boilerplate back in other places. So it isn't necessarily a net gain in brevity. It also loses some local reasoning.
The nested where clauses can be avoided without forcing FCP, by offering a type alias as I mentioned twice before.
And you didn't address my comment:
@shelby3 wrote:
and additionally afaics if I have these in a typeclass object, then I will have need a different implementation for each one
Generally wrapping interacts badly because it becomes deeply nested!!!
Which is why we want a first-class anonymous union instead of massive case-tree logic to destructure Either. OTOH, first-class without wrapping makes inference undecidable.
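For readers, a small Haskell illustration of the nesting complaint (the three-way alternative is my own example): without an anonymous union, alternatives pile up as nested Eithers and nested case analysis.

-- A three-way alternative encoded with nested Either, and the nested
-- case-tree needed to destructure it.
type IntOrStringOrBool = Either Int (Either String Bool)

describe :: IntOrStringOrBool -> String
describe v = case v of
  Left n          -> "int: " ++ show n
  Right (Left s)  -> "string: " ++ s
  Right (Right b) -> "bool: " ++ show b

main :: IO ()
main = putStrLn (describe (Right (Left "hi")))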
Edit: also nothing stops you from using the method you prefer in your code and achieving type inference. I am asking why we must force everyone to abandon function composition generality in all code, when they can choose to annotate.
Because it adds complexity to the implementation and the grammar, and complexity is the enemy of good software. Expressive power is achieving the most with the least.
What is wrong with type annotations?
Nothing is wrong with writing types, but the more complex the nesting of types, especially higher-ranked types, the more confusing it is. If we are aiming to make this easy for people coming from OO to pick up then we will want to make the type signatures as easy to understand as possible.
As I said above, this has nothing to do with inference, you have to write the types in either case, but the main difference is that with FCP all the quantifiers are implicit and all the where clauses are at the top level, simplifying the syntax, for no loss in expressive power. It makes the code cleaner and easier to read.
and additionally afaics if I have these in a typeclass object, then I will have need a different implementation for each one
No a single record or typeclass object encapsulates all the polymorphic functions so you do not need this.
For example:
data MyCallback = MyCallback {
  fna (A -> A) : A
  fnb (A -> A -> A) : A
}
Works exactly the same as the single argument tuple datatype posted before.
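For readers, a rough Haskell rendering of that record form (renamed MyCallbacks here to avoid clashing with the earlier single-function sketch; the field types are assumptions based on the example above): each field carries its own universally quantified function, so one wrapper holds several polymorphic callbacks.

{-# LANGUAGE RankNTypes #-}

data MyCallbacks = MyCallbacks
  { fna :: forall a. a -> a
  , fnb :: forall a. a -> a -> a
  }

callbacks :: MyCallbacks
callbacks = MyCallbacks { fna = id, fnb = const }

main :: IO ()
main = do
  print (fna callbacks (3 :: Int))  -- 3
  print (fnb callbacks "x" "y")     -- "x"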
Generally wrapping interacts badly, which is why we want a first-class anonymous union instead of massive case-tree logic to destructure Either. OTOH, first-class without wrapping makes inference undecidable.
It has nothing to do with inference as you have to provide the type signature in either case, it is just where and how that signature is written that changes. FCP uses the same mechanism as type-class dictionaries.
We avoid that but add the boilerplate back in other places. So isn't necessarily a net gain in brevity. It also loses some local reasoning.
Callbacks are not local in any case, as the callback could happen from anywhere. Consider async-IO, you call write with a callback, and flow immediately continues with the next statement. The callback will be called by some non-local code in the IO complete event handler. In JavaScript this can only happen after the current "program" has finished, returning control to the main event-loop.
For readers to understand what we are discussing now, I think it is helpful to understand this way of emulating higher-ranked types, which was introduced to me at the Rust language forum:
use std::fs::File;
use std::io::{Cursor, Read};

// Emulating a higher-ranked bound: the quantifier for A lives inside the
// trait method, not on f, so f's body may call it at two different types.
trait Foo {
    fn call<A: Read>(&self, a: A);
}

fn f<G: Foo>(g: G, pred: bool) {
    if pred {
        g.call(File::open("foo").unwrap());
    } else {
        g.call(Cursor::new(b"foo"));
    }
}
By giving a name Foo to the type constructor for the function call, we are able to delay quantification of type variable <A: Read> until the body of the function f. Compare that to the example which failed to compile:
fn f<A: Read, G: Fn(A) -> ()>(g: G, pred: bool) {
    if pred {
        g(File::open("foo").unwrap());
    } else {
        g(Cursor::new(b"foo"));
    }
}
The above won't compile without higher-ranked types (which Rust doesn't support), because the quantification of <A: Read> is at the call-site for function f. So even if we did not place the : Read bound on A, the inference engine could not infer the correct type for A, which happens to be an intersection of two quantified functions where A is a File or a Cursor.
In general, inference of function application over unions and intersections diverges into infinite recursion in the inference and is undecidable. @keean showed me an example of that using expansion variables in private discussions last May. The two higher-order features are like mirrors facing each other, thus enabling mutual recursion.
Generally wrapping interacts badly because it becomes deeply nested!!!
It is still better than the alternative of a huge multi-line type signature all in one block.
@shelby3 looking at the direction you want to go, we may be better off ditching quantified types and type-classes altogether and going for a full intersection and union type system with ad-hoc overloading?
@keean I haven't made any decision. I am trying to analyze the trade-offs.
@keean wrote:
Generally wrapping interacts badly because it becomes deeply nested!!!
It is still better than the alternative of a huge multi-line type signature all in one block.
Disagree! The deep nesting totally breaks any reasonable composition.
Also we don't need to force FCP in order to get type aliases:
@shelby3 wrote:
The nested where clauses can be avoided without forcing FCP, by offering a type alias as I mentioned twice before.
@shelby3 I feel you are making statements about a style you have no experience of coding, so I am not sure how much weight to give your objections.
I haven't made any decision. I am trying to analyze the trade-offs.
Intersection types allow functions to be like a type-class where a single function can have different signature that apply to different types. You no longer need the boilerplate of declaring a type-class you just define two functions with different type signatures that have the same name.
@keean wrote:
and additionally afaics if I have these in a typeclass object, then I will have need a different implementation for each one
No a single record or typeclass object encapsulates all the polymorphic functions so you do not need this.
You misunderstood. The context of my point was MyCallback and MyCallback2 both having the same structural type, but requiring separate implementations for a typeclass bound, because they have different (nominal) types.
And then factor in deep nesting and you are in implementation proliferation hell.
The context of my point was MyCallback and MyCallback2 both having the same structural type
Why would you declare two of them then? That's like saying add and add2 both have different names despite both adding numbers exactly the same way. You don't need someone to stop you defining the add function more than once, so why would you need someone to stop you defining "MyCallback" more than once? It's something you just wouldn't do.
As I said above, this has nothing to do with inference, you have to write the types in either case, but the main difference is that with FCP all the quantifiers are implicit and all the where clauses are at the top level, simplifying the syntax, for no loss in expressive power. It makes the code cleaner and easier to read.
Implicit quantifiers do NOT make code easier to read. It's like most “simplifications”: they seem like a good idea at the time, they work well for simple cases, but it's a disaster precisely in the cases where you think it would serve best, the complex cases.
It's a known fact that human information transmission requires redundancy. Perhaps less for smarter people, or people familiar with the domain, more for others. But for language, psychologists have actually MEASURED the amount of redundancy that gives the best comprehension; it's somewhere between 15 and 30% I believe (in linear text).
Computer programming languages may require different redundancy, but inferring types everywhere clearly defeats the whole point of having type annotations: to provide redundant information for validation and confirmation.
— john skaller [email protected] http://felix-lang.org
@keean wrote:
The context of my point was MyCallback and MyCallback2 both having the same structural type
Why would you declare two of them then? That's like saying add and add2 both have different names despite both adding numbers exactly the same way.
You have this big blind spot that repeats over and over even though I've mentioned to you 5 or more times that total orders don't exist in our universe.
In open modularity, you can't control the proliferation of duplication. It is impossible to maintain tight control over anything in a decentralized context.