Carp
Carp copied to clipboard
feat: Protocols
This PR adds initial support for Protocols, a sort of lightweight form of typeclasses. Given a collection of interfaces, protocols allow users to enforce type restrictions around the implementation of those interfaces. Before we get into the boring minutiae, it might be easier to see an example of how they work.
To define a protocol, use defprotocol
;; to define a new protocol, we use `defprotocol`
;; defprotocol takes a name and one or more interfaces
(defprotocol Num + -)
:i Num
=> Num : (Protocol Num)
Defined at line 1, column 14 in 'REPL'
Required Definitions: +, -
Once we have a protocol defined, we can mark types as members of it using instance
(instance Num Int)
:i Num
=> Num : (Protocol Num: Int)
Defined at line 1, column 14 in 'REPL'
Required Definitions: +, -
Of course, if the type doesn't implement the required interfaces, it won't be added to the protocol:
(instance Num Char)
=> The type Char does not implement the following interfaces: +, - which are required by the protocol. at Core Primitives:0:0.
Traceback:
(instance Num Char) at REPL:5:1.
We can reference protocols in types to limit the acceptable types in otherwise fully polymorphic positions. To refer to a protocol in a type signature, prefix it with !. For example:
(sig math (Fn [!Num !Num] !Num))
(defn math [x y] (- x (+ x y)))
Protocols leverage the same polymorphism resolution as our regular old variables, so nothing is resolved until we call this function.
Let's try it with a valid Num:
(math 2 3)
=> -3
and what about a type that doesn't implement the protocol, even though it implements the interfaces?
(math 3.0 2.0)
=> I can’t match the types `Double` and `(Protocol Num: Int)`. within `(math 3.0 2.0)`
3.0 : Double
At line 12, column 7 in 'REPL'
Expected first argument to 'math' : (Protocol Num: Int)
At line 12, column 2 in 'REPL' at REPL:12:1.
oops. Lucky for us, Double already implements our interfaces:
(instance Num Double)
(math 3.0 2.0)
=> -2
Finally, protocols can also be used for arbitrary "type tagging" or type unions simply by omitting any interfaces. One can use this to constrain the polymorphism of functions where desired without requiring additional interface implementations:
(defprotocol Foo)
(sig foo (Fn [!Foo] String))
(defn foo [_] @"foo")
(instance Foo Int)
:i Foo
=> Foo : (Protocol Foo: Int)
Defined at line 15, column 14 in 'REPL'
Required Definitions:
(foo 3)
=> "foo"
(foo \w)
=> I can’t match the types `Char` and `(Protocol Foo: Int)`. within `(foo \w)`
\w : Char
At line 22, column 6 in 'REPL'
Expected first argument to 'foo' : (Protocol Foo: Int)
At line 22, column 2 in 'REPL' at REPL:22:1.
Now that we know how it works, here are a couple of important implementation notes:
- Protocols have two representations, XObjs, and a corresponding Type. Only the XObj contains a list of the interfaces required, the type only contains a list of types known to be members of the protocol.
- The implementation of the protocols interfaces for a given type is checked at
instancecall time. If an implementation of a required interface doesn't yet exist in the environment wheninstanceis called, adding the type to the protocol will fail. - Interface implementations are checked by simple subtyping. One can't yet for instance, specify the required position of the type in the interface function implementation. Likewise, protocols are more like unions than typeclasses insofar as they stand in for a variable, rather than take one as an argument.
- Protocols are kept track of via the state of the type environment. When a new instance is added, the corresponding protocol is updated in the type env. Unfortunately this means parts of the code base that care about protocols need to call some functions
resolveProtocolsandupdateProtocols(depending on whether or not the have access to a full context or just the type environment) in order to ensure they have the latest version of the protocol's memberships. - They don't yet work for higher kinded types. I'll need to add support for this.
- Protocols are always added to the top level of the type environment.
- In type signatures protocols must always be differentiated using
!. They can share names with structs. When the!is absent, the type system will try to find a struct. Think of this as similar to haskell's=>delimiter for type classes vs types.
Very interesting PR! Some feedback:
Being able to constraint parameters is really nice and important. This should be enabled for regular old interfaces too though, or you'd have to define an interface and a protocol each time you want to constraint a parameter of a function. Maybe this should be solved with a "typeclass" macro that does it all in one go? Having to think about whether to use an interface or a protocol seems like it's making the language harder to use, and a bit less elegant. One way around that is to have the user only interact with the protocol construct and let interface be more of an implementation detail, if that makes sense?
Having to have two concepts could also be a sign that the current interface design is a bit too simplistic and the correct solution is to redesign them so that they actually support the things that this PR adds onto protocol (mainly constraints and grouping of functions). Keeping things separated has some appeal though, and there's probably some implementation insight that I'm missing.
My other concern is regarding the syntax and semantics of constraining type variables. First, I didn't catch why the ! is needed. Second, there seems to be no way to constraint a type variable that appears multiple times in a signature? (e.g. Num a => a -> a -> a in Haskell). Without that it seems hard to make use of the constraints in the implementation of the function since each parameter could belong to a different instance of the protocol.
Scott coming in with the hot features, I dig it 🏅
I’ll take a look at it over the weekend, I’m impressed by how little code this is!
Similarly to Erik, I was asking myself whether for the unification algorithm, !Num has to be unified to the same concrete type in t he context of a type definition (so like one variable) or whether it just has to be any type that inhabits the protocol (so like one variable per occurence of !Num). I think both of these would make sense, and both of these should exist. Maybe something like binding a protocol to a variable might therefore be necessary?
One additional thing: I think that constraining a single type variable to multiple interfaces/protocols is also an important feature. And if we have that, protocols is very similar to a shorthand for constraining the type variable to a set of interfaces. So that's something to consider for making this feature as simple as possible (conceptually)
Some great points about variable constraints. I was just thinking about this last night actually, and an alternative way to implement this is to do something like:
(defprotocol Stringable-Addable [a b] (str (Fn [a] String)) (add [a a] a))
(instance Stringable-Addable [Int] Int.str Int.+)
Which is kind of along the lines of what you mentioned, Erik, where the protocol and interfaces are kept separate while still allowing us greater control over polymorphism. One downside to this approach is that it kind of "reimplements" interfaces in some ways. One nice thing about the current approach is that you can just enforce subtyping based on pre-existing implementations of the interfaces you care about.
Another downside to this approach is that we'd need to call special protocol functions instead of interfaces in function bodies: e.g.
(sig foo (Fn [Stringable-Addable Stringable-Addable] (Stringable-Addable)))
(defn foo [x y] (Stringable-Addable.add))
So perhaps it really is best to align closer with Haskell for maximum control. At which point we'd instead have something like:
(defprotocol Stringable str)
(defprotocol Fooable)
(sig with-foo [(protocol Stringable Fooable a) ...] (Fn [a] String))
(defn with-foo [x] (concat (str x) "foo"))
One advantage to this approach is that it totally does away with the need for an instance statement, and it keeps our old implementation of highly flexible interfaces.
First, I didn't catch why the ! is needed.
This is an unfortunate side-effect of the way xobjToTy is implemented. By default it translates capitalized symbols to structs, so we need some way to differentiate those from protocols in type signatures, similar to how we check if the first letter is lowercase to cast to Var:
xobjToTy (XObj (Sym spath@(SymPath _ s@(firstLetter : rest)) _) _ _)
| isLower firstLetter = Just (VarTy s)
| firstLetter == '!' =
if (not (null rest))
then Just (ProtocolTy (SymPath [] rest) [])
else Nothing
First of all this is awesome, I've been wanting this in the language for a while.
(sig with-foo [(protocol Stringable Fooable a) ...] (Fn [a] String))
For what it's worth I like this syntax:
(sig my-fun [(where a Stringable Fooable)] (Fn [a] String)
I agree with Erik that having both interfaces and protocol might be a bit confusing, after all we could still keep the current interfaces short names by declaring generic functions:
(sig str [(where a Stringable)] (Fn [a] String)
(defn str [a] (Stringable.str a))
Nice, I like the where style syntax too and you make a great point about generics.
Before I go any further, maybe we can all agree on the best implementation, it sounds like maybe we want to do the following:
- replace
interfacewith a new formdefprotocolwhich might look like the following:
(defprotocol (<Name> <params>)
(<fn-name> <signature>)
)
;;e .g.
(defprotocol (Monoid a)
(combine (Fn [a a] a))
(zero a)
)
- add an
instanceform that designates implementations of a protocol for some type t
(instance (Monoid Int)
(combine [x x] (+ x x))
(zero 0)
)
- add a
whereform that allows users to constrain type variables to protocols:
(sig plus-zero (where (Monoid a)) (Fn [a] a))
(defn plus-zero [x] (Monoid.combine x (Monoid.zero)))
Or with multiple forms:
(defprotocol (Foo a))
(sig plus-zero (where (Monoid a) (Foo a)) (Fn [a] a))
And supporting multi-parameter protocols should probably be a thing too
(defprotocol (Natrual a b)
(transform (Fn [a] b))
)
(sig transform-then-plus (where (Natural a b) (Monoid b)) (Fn [a b] b))
And we should also support parameterizing over higher kinds application like haskell does too:
(defprotocol (Functor f)
(fmap (Fn [(Fn [a] b) (f a)] (f b)))
)
(instance (Functor Maybe)
(fmap [f m] (match m) (Nothing) (Nothing) (Just x) (Just (f x)))
)
(sig fmap-and-plus (where (Functor f) (Monoid b)) (Fn [(Fn [a] b) (f a)] b))
Sound good? This would be a pretty big breaking change so we'll have to try to be certain we can provide those generics for all existing interfaces.
@scolsen Looks great 👍
Hi! I usually hang out more in PureScript land than here, but Veit got my attention on this PR, so I'll leave a few throughts (of course feel free to ignore all of this if not relevant at all :blush:):
- AFAIK Carp doesn't have typeclasses, but already has interfaces - why not go to typeclasses instead of protocols? It doesn't feel that far from what you have here, and if adding them at a later time would be hard as they'd be a third way to do almost the same thing (polymorphism)
- something that I don't see mentioned here but that is the single most controversial point about typeclass implementations is the "orphan instance" issue: if you define a
class Fooin moduleA, and define aninstance Foo Barin moduleBandinstance Foo Barin moduleC, then buildingA+Bmight yield a different result from buildingA+C, and buildingA+B+Cis not possible at all. Clojure-styleprotocols do allow orphan instances, so you might want to do the same - however, allowing them doesn't play well with things like build determinism and a package ecosystem, and in fact some languages (such as PureScript) entirely disable them. I'd warmly recommend reading what I think is the best resource on typeclasses in language design out there to best inform the design of all of this. (and if you do, I'd be curious to hear what your thoughts on this are! It's a hard problem :slightly_smiling_face:)
@f-f thanks a lot for the feedback! I think this might be more of a naming issue than anything else – maybe we should stick with the "interface" name to avoid the confusion (or switch to "type class").
@f-f Thanks for the pointers and checking this out! There are definitely some bits to mull over. I'll let you know where we end up!
@f-f thanks a lot for the feedback! I think this might be more of a naming issue than anything else – maybe we should stick with the "interface" name to avoid the confusion (or switch to "type class").
That's not a bad idea. We could just keep interface and change the implementation to this more "typeclass-esque" style where the interface denotes a set of functions (rather than just one function).
I won't get a chance to take a stab at this until Aug 27th or so, so if anyone wants to try implementing it before then, please do!
Would this support composing abstractions in a way rust does:
trait A: B { // A supersets trait B
fn method(arg: C) -> void // Variable arg must implement trait C but can be absolutely whatever in terms of memory layout and other things
}
where trait is roughly = protocol This is a form of functional composition constraints that made rust a predictable language
Has the thinking around ad-hoc polymorphism in Carp changed since this PR or is it still relevant?
Has the thinking around ad-hoc polymorphism in Carp changed since this PR or is it still relevant?
Not in any significant way, we haven't touched this topic in a while.
I have experimented with making the type system more closely resemble GHC's approach to polymorphism, kinds, and type classes (at least as it stood when "Typing Haskell in Haskell" was written) . After that experiment I think I'd at least like to land somewhere in between the GHC approach and the approach we have in Carp currently. Not sure how others feel about it.
Has the thinking around ad-hoc polymorphism in Carp changed since this PR or is it still relevant?
Not in any significant way, we haven't touched this topic in a while.
I have experimented with making the type system more closely resemble GHC's approach to polymorphism, kinds, and type classes (at least as it stood when "Typing Haskell in Haskell" was written) . After that experiment I think I'd at least like to land somewhere in between the GHC approach and the approach we have in Carp currently. Not sure how others feel about it.
Sounds fascinating— thanks.