zenscript icon indicating copy to clipboard operation
zenscript copied to clipboard

Language Discussion

Open keean opened this issue 8 years ago • 116 comments

The idea is to create a simple, compact, symmetrical, typed language for generic programming that compiles to JavaScript. The initial reference is "Elements of Programming" by Alexander Stepanov and Paul McJones.

The core concept is to have parametric types, type-classes and type-families, along with maybe higher-ranked-types, higher-kinded-types, to enable type-directed generic programming.

Inspiration is from C, C++ Concepts (type safe templates), Haskell, Rust, Eff, and others.

This is different from TypeScript, because it is based on type-classes, not classical inheritance, different from PureScript because it allows imperitivity and state.

This issue is for the initial design of the language, with other issues being forked as necessary for specific topics like syntax, semantics, type system etc.

keean avatar Sep 17 '16 12:09 keean

I'd like to chime in that Elements of Programming is one of my favorite tech books, it's got a very prominent space on my book shelf, next to Luca Cardelli's A Theory of Objects and Bird and Wadler's Introduction to Functional Programming (1st Edition). If a language could express the ideas in those three books combined, it might very well be my favorite language.

What will your objects look like? Have you studied Self? I highly recommend giving this short paper a read: http://sauerbraten.org/lee/ecoop.pdf

What will the solution for this problem look like in your language? (I think it's a great litmus test for languages) It shows why just type classes has some problems solved by delegation with differential inheritance (a more powerful superset of subclassing).

SimonMeskens avatar Sep 17 '16 12:09 SimonMeskens

@SimonMeskens I'm not familiar with "A Theory of Objects" but a while back I contributed to this paper https://arxiv.org/pdf/cs/0509027.pdf which shows how object types fit into a formal type system.

With objects there is a problem with dispatch in that "a.f(b, c, d)" gives special treatment to the receiving object 'a', and only does dispatch on this. With type-classes we start with a language that has no overloading. If we want to allow overloading we use a type class, and a type class can dispatch on multiple type parameters.

I will take a look at the examples in that paper, thanks.

keean avatar Sep 17 '16 15:09 keean

@SimonMeskens I have implemented an example from the Shark paper in Rust, as it is a similar language (imperative and has traits):

trait Animal {
    fn swim_away(&self) {
        println!("swim away");
    }
}

trait Encounter<A> : Animal where A : Animal {
    fn encounter(&mut self, other : &A);
}

struct Fish {}
struct Shark {
    healthy : bool
}

impl Shark {
    fn swallow<A>(&self, _ : &A) where A : Animal {
        println!("swallow");
    }
    fn fight<A>(&mut self, _ : &A) where A : Animal {
        self.healthy = false;
    }
}

impl Animal for Fish {}
impl Animal for Shark {}

impl Encounter<Fish> for Fish {
    fn encounter(&mut self, _ : &Fish) {}
}

impl Encounter<Shark> for Fish {
    fn encounter(&mut self, other : &Shark) where Self : Animal {
        if other.healthy {
             self.swim_away()
        }
    }
}

impl Encounter<Shark> for Shark {
    fn encounter(&mut self, other : &Shark) where Self : Animal {
        if self.healthy {
            self.fight(other);
        } else {
            self.swim_away();
        }
    }
}

impl Encounter<Fish> for Shark {
    fn encounter(&mut self, other : &Fish) where Self : Animal {
        if self.healthy {
            self.swallow(other)
        } else {
            self.swim_away();
        }
    }
}

I don't intend to use the same syntax as Rust, but it gives an idea of the structure in a language with type-classes. One key difference with JS as the target is that we have garbage collection, so we don't need to have any lifetimes, which simplifies things (although Rust correctly infers all the lifetimes in this example without any annotation), and also because of this we don't need affine types, and JS always passes by value (objects are values which are references to the object).

keean avatar Sep 17 '16 16:09 keean

First design questions are how much type inference should there be, and what are the consequences for syntax. Considering the simplest polymorphic function, in a Rust-like style:

fn id<A>(x : A) -> A { x }

Or Haskell style:

id :: A -> A
id x = x

The Haskell style has the advantage of working with type inference, and the Rust style has the advantage of automatically introducing scoped type variables.

Obviously there will need to be type signatures in type-class definitions, and records / module definitions.

keean avatar Sep 17 '16 22:09 keean

I think that if the goal of the language is to provide a very simple, but very expressive language, you should consider going for a very simple, but expressive syntax. In that case, I think I like the Haskell syntax more than the Rust syntax. I've never been a fan of how dense the Haskell syntax gets though, so I'd aim to sit somewhere in between. I think Haskell syntax, with words instead of symbols, would work best. If you want, I could give some samples of syntax that might be fun to play around with. I think I'm going to get out Elements of Programming and look up a few good examples and try to create a syntax that is made explicitly to express those problems.

Thanks for providing the example in Rust. I think there's a possibility to introduce some concepts to make it more powerful and concise. I assume you'd prefer me to open an issue with a proposal of any specific features? Btw, in the paper I linked, multiple dispatch is more or less equal to type classes, so, as the paper shows, introducing a concept of delegation improves upon the code semantics. I've been playing around with the idea of delegation in Haskell and I think I know where to go with it. I'll move that talk to the other issue then.

SimonMeskens avatar Sep 17 '16 23:09 SimonMeskens

@keean the type signatures of all exported interfaces must be explicitly declared and not inferred, else modularity (separate compilation) is impossible which is critically needed if targeting JavaScript and its dynamic runtime module importing. This also makes the interaction of certain higher-order first-class type features decidable when moving away from the origin on the Lambda Cube, as well enables to implement typeclasses in more than one way (which Haskell can't do due to its global inference).

It also enables decidability with first-class anonymous (structural) unions (disjunctions), which is a mandatory feature of the language else I won't contribute. TypeScript, N4JS, and Ceylon (which emits JavaScript) have this feature. Scala 3 (Dotty) also has it (and there is a Scala.JS compiler for Scala 2). Ceylon also doesn't support typeclasses, but Scala 2 (and presumably Scala 3) does. Inferred structural unions are useful for many paradigms and is necessary to support the JavaScript || logical operator which doesn't always return a boolean.

Declaration of type signatures within functions (also methods) should be optional and inferred if not specified. Declaration of non-exported function type signatures could perhaps also be optional and inferred.

shelby3 avatar Sep 18 '16 01:09 shelby3

@keean to add to your opening comment of this issue thread, TypeScript is intentionally unsound typing and PureScript lacks first-class anonymous structural unions. N4JS appears to be sound typing, but also lacks typeclasses.

shelby3 avatar Sep 18 '16 02:09 shelby3

@SimonMeskens I am interested to see what the code looks like with delegates. I provided the sample in Rust because I can syntax check and run the example, there is always the problem of unchecked errors when posting hypothetical code samples.

keean avatar Sep 18 '16 06:09 keean

@shelby3 I agree all exported interfaces must have type signatures. For higher-order features I quite like first-class-polymorphism. I think a reasonable assumption is that everything inferred is monomorphic. If non-exported function signatures can be inferred, then a syntax with optional type annotations would seem better.

keean avatar Sep 18 '16 06:09 keean

@shelby3 regarding first class unions, I think one of the key features is supporting the kind of programming you want to do. I do want to suggest an idea though, and see what the consequences are. The idea is to have no concrete types at all, so all types are expressed by type variables constrained by type classes. Consider the identity function using a cross between Haskell and TypeScript notation:

id(x : a) : a = x

'A' can be any type, so we could constrain it to be a type-class using a Prolog like notation:

id(x : a) : a :- Integer a = x

This is already the union of all types that implement Integer, but we can go further and have a type-level or operator:

id(x : a) : a :- Integer a | String a = x

This has a nice synergy with JavaScript and duck-typing as we are saying that we don't care what type is passed, as long as it provides the correct interface. We can type check this nominally in TraitScript, but it reflects the underlying JavaScript that allows any object to be passed as long as it provides the right methods and properties.

The question is, does that give all the functionality that you get with union-types?

keean avatar Sep 18 '16 06:09 keean

@shelby3 I find myself re-thinking in the morning and finding I don't agree with what I posted yesterday. An or operator for traits does not make any sense, as it is saying the object may have either method A or method B. What we want to know inside a function is if we can safely call method A for example.

I don't think there is any need for an 'or' operator, or first class unions outside of dynamic types (aka trait-objects). I think I need to revisit the original expression problem with type-classes.

Starting with a collection of trait-objects (existential types):

c : [a] :- Integer(a) = [new A(), new B()]

Requires all objects inserted into 'c' to implement the Integer trait. Now lets say we want to print this as well, we somehow need to inject the 'Show(a)' trait into 'c'. This is actually an 'and' operation, but it occurs in an 'open' way, somewhere else in the source code, something like:

show(c : Coll) {
    foreach(c, a => {
        print a;
    })
}

negate(c : Coll) {
    foreach(c, a => {
        a = -a
    })
}

This requires anything calling 'show' has to be able to prove all elements in collection 'c' provide show. This to me suggests that what is needed is some kind of constraint inference. We can easily infer the set of constraints on each element of 'c' as being (Show a, Negate a) or something like that. The question is whether 'c' is heterogeneous or homogeneous, there are two possible type signatures:

data Coll = Coll [a]
data Coll = forall a . (Show a, Negate a) => Coll [a]

In the former case 'show' and 'negate' would need to each have the constraints on '[a]' in their type signatures directly (although they can be inferred). One option would be to allow:

data Coll = forall a => Coll [a]

To automatically accumulate the inferred type classes. But what to do on module boundaries?

keean avatar Sep 18 '16 07:09 keean

@keean I explained to N4JS's @yoshiba why Joose and Object Algebras do not solve Wadler's Expression Problem and are not as orthogonally extensible as typeclasses.

shelby3 avatar Sep 18 '16 12:09 shelby3

@keean wrote:

@SimonMeskens wrote:

What will your objects look like? Have you studied Self? I highly recommend giving this short paper a read: http://sauerbraten.org/lee/ecoop.pdf

With objects there is a problem with dispatch in that "a.f(b, c, d)" gives special treatment to the receiving object 'a', and only does dispatch on this. With type-classes we start with a language that has no overloading. If we want to allow overloading we use a type class, and a type class can dispatch on multiple type parameters.

Also remember we discussed in May, that typeclasses (not virtualized typeclasses, i.e. Rust's trait objects) can be monomorphic at the function application use-site because the data type is not lost (as in subclassing where it is assigned to a supertype), thus avoiding the virtual method jump table lookup which otherwise stalls the CPU pipeline reducing performance by as much as to 5 - 6 times slower.

shelby3 avatar Sep 18 '16 15:09 shelby3

@shelby3 I have created a second issue here https://github.com/keean/traitscript/issues/2 to discuss union types.

I am fully aware of monomorphisation, which is when the type is statically determined. There is also runtime polymorphism (which it what haskell uses existential types for, and Rust uses Trait objects. I prefer the existential approach for the type system, as it is actually sound as a logic).

keean avatar Sep 18 '16 16:09 keean

Also note the mult-method dispatch version of the Sharks code makes some other changes, introducing new types for HealthyShark and DyingShark, I will update the Rust code to reflect this way of implementing it.

keean avatar Sep 18 '16 16:09 keean

@keean wrote:

JS always passes by value (objects are values which are references to the object).

ASM.js passes by value unboxed. For functions operating only on numbers, afaics we could support this seamlessly. Emulating other types (e.g. strings) in ArrayBuffers would require more complexity to compile and interopt, and might require more C-like syntax. We could keep in mind these sort of optimizations and whether we want and could reasonably provide this without forcing the programmer to resort to a FFI.

No need to discuss this ASM.js tangent right now.

shelby3 avatar Sep 18 '16 18:09 shelby3

I think you can write delegation as a monadic structure, which might be the real ticket. I also look forward to your new implementation. I think it's a cool example as it covers such a broad range of language capabilities.

SimonMeskens avatar Sep 18 '16 18:09 SimonMeskens

@SimonMeskens wrote:

Thanks for providing the example in Rust. I think there's a possibility to introduce some concepts to make it more powerful and concise. I assume you'd prefer me to open an issue with a proposal of any specific features? Btw, in the paper I linked, multiple dispatch is more or less equal to type classes, so, as the paper shows,

The huge salient difference is that multi-methods is dynamically dispatching to the implementation of the subclasses of Animal at run-time, thus stalling the CPU pipeline and adversely impacting performance.

Wheres, the Rust type-class example shows that type-classes can be resolved monomorphically at compile-time, so there is no jump table lookup.

Edit: another huge difference is that type-classes don't bind the multiple interfaces in the class inheritance and instead keep interfaces and classes orthogonal.

introducing a concept of delegation improves upon the code semantics. I've been playing around with the idea of delegation in Haskell and I think I know where to go with it. I'll move that talk to the other issue then.

I think you can write delegation as a monadic structure, which might be the real ticket. I also look forward to your new implementation. I think it's a cool example as it covers such a broad range of language capabilities.

@keean wrote:

Also note the mult-method dispatch version of the Sharks code makes some other changes, introducing new types for HealthyShark and DyingShark, I will update the Rust code to reflect this way of implementing it.

Changing the interface based on the state of data type, conflates the instance of the data type with the interface. This makes orthogonal extension with for example collections of instances inefficient and boilerplate prone.

I don't think this is a paradigm that should be used.

P.S. if we do our job exceedingly well (as in demonstrate that type-classes and sound typing kick ass on subclassing), there is a remote possibility we could be designing the future replacement for EMCAScript. So performance should be high on our list of priorities, because performance is very difficult to achieve in a dynamically typed language and virtual inheritance of subclassing messes up performance. Google is looking for inspiration for their SoundScript concept of a soundly typed JS-like language in the browser. The guy leading that effort (Andreas Rossberg) is familiar with Scala, ML, etc.. Andreas commented that N4JS is "very interesting".

But I don't know if fast and incremental compilation will be achieved in the language we are contemplating. We can think about this in our design decisions if we want to.

shelby3 avatar Sep 18 '16 18:09 shelby3

SoundScript is described here: http://www.2ality.com/2015/02/soundscript.html which sounds like JavaScript plus enforced classical objects and inheritance, and simple function typing. There's nothing about generics or code generation.

keean avatar Sep 18 '16 18:09 keean

SoundScript's goal is to actually be somewhat compatible with TypeScript, I assume Google intends to make it so we don't have to compile TypeScript for Chrome anymore.

SimonMeskens avatar Sep 18 '16 18:09 SimonMeskens

@keean wrote:

which sounds like JavaScript plus enforced classical objects and inheritance

Until they figure out that virtual inheritance stalls the CPU pipeline and type-classes don't. I think we have a very tiny (almost nil) chance to influence the future. Any way, we should design for performance assuming one day our language will have its own VM and not just compiled to JS, if that doesn't incur any huge trade-offs in our design constraints. Also we can think about optimizing some parts with ASM.js. I am not sure how this thought process will play out. I am trying to keep my mind open and see where the design constraints lead us. I agree the priority is on having a simple, clean language without proliferation of duplicated paradigms.

Edit: battery life (i.e. performance) on mobile is important. I could see our language potentially becoming very popular for mobile apps.

There's nothing about generics or code generation.

In the link I provided, they mentioned TypeScript as a possible starting point, but aiming to make it sound. I think they will (or maybe already have) discovered that retaining interoperability with TypeScript code while also making TypeScript sound is probably not practical. Of course I could be wrong about this.

shelby3 avatar Sep 18 '16 18:09 shelby3

From now on, in all discussion where I mean to use type-classes, I will use the word trait to indicate the interface of the type-class, since that is the name Rust chose. We might choose a different keyword or not, yet until then I will use the term trait. And not the same thing as trait objects.

@keean wrote:

I agree all exported interfaces must have type signatures. For higher-order features I quite like first-class-polymorphism.

So I assume by "first-class-polymorphism", you mean we can declare a (function, method, or class) type parameter constrain to a trait (or union of traits) bound, and then I can call functions and methods without knowing the actual type of the data which implements the trait(s)?

I presume by "first-class-polymorphism", you don't mean higher-ranked types.

I think a reasonable assumption is that everything inferred is monomorphic.

Unpacking this, I presume you mean that inference will never subsume to a subclassing subtype (with virtual inheritance) nor to a virtualized trait object (which also incurs dynamic dispatch and stalls the CPU pipeline)? And this is why we need anonymous structural union types, e.g.:

if (x) true
else 3

The inferred type is Boolean | Number.

I am presuming we will choose an "everything is an expression" grammar. Note that the following expression (not the return type of the containing function) has a type of undefined (or the bottom type):

if (true) return 1

If non-exported function signatures can be inferred, then a syntax with optional type annotations would seem better.

Agreed. Thus we need the suffixed : type and not the C/Java prefixed type declaration.

shelby3 avatar Sep 18 '16 19:09 shelby3

@SimonMeskens wrote:

I think that if the goal of the language is to provide a very simple, but very expressive language, you should consider going for a very simple, but expressive syntax. In that case, I think I like the Haskell syntax more than the Rust syntax. I've never been a fan of how dense the Haskell syntax gets though, so I'd aim to sit somewhere in between. I think Haskell syntax, with words instead of symbols, would work best.

I think I'd prefer to stay as close to JavaScript as possible in expression syntax while everything becomes an expression (optionally a statement as well). The point is to produce a language that is very easy for everyone to learn.

However, I hate noisy code (unless necessary to provide reasonable cognitive load), so I am open to adopting some function level grammar ideas from Haskell, such as:

  • mandatory block indenting instead of curly braces (Python is very popular)
  • function name overloading with default arguments (multiple versions of a function with same name)
  • everything is an expression (but some can also be used as statements with side-effects)
  • pattern matching with switch or match within function, but not on function arguments
  • guards within function, but not on function arguments
  • semicolons not allowed (don't enable to cram multiple statements on one LOC as it is not readable)

I prefer consistency of coding style because we are in the open source reality now, where code should be readable the same to everyone. So I would prefer to enforce a chosen style.

I realize we might not agree on all of these. Let's discuss. But syntax may not be the most important discussion to have now and first.

shelby3 avatar Sep 18 '16 20:09 shelby3

I agree with all of those list items :+1:

SimonMeskens avatar Sep 18 '16 20:09 SimonMeskens

@SimonMeskens cool we finally agree on something :heart_eyes:

shelby3 avatar Sep 18 '16 20:09 shelby3

Yeah, sorry if I acted like an ass on the TypeScript place, not sure why I get defensive there, I'm a pretty nice guy usually. I also don't have personal beef with you, life is far too short for that :smile:

SimonMeskens avatar Sep 18 '16 20:09 SimonMeskens

I agree with most of that, with some restrictions.

  • function name overloading, I would want all overloads to have the same type, so the overloads are on data constructors.
  • as an imperative language with mutation, i think statements make more sense, but in general everything that makes sense should be an expression too.
  • function overloading on constructors implies pattern matching in function arguments, so I think these two are mutually exclusive. You cannot have both. I would rather have pattern matching and overloading in arguments as it leads to less indentation.
  • I quite like function guards on arguments.

Couple of neat examples.

Pattern matching in arguments:

append x Nil = x
append x (Cons h t) = Cons h (append x t)

Guards on arguments:

find_zero Nil = false
find_zero (Cons h t) | h == 0 = true
find_zero (Cons h t) | h /= 0 = find_zero t

keean avatar Sep 18 '16 20:09 keean

@keean regarding patterns and guards on arguments, if we are targeting JavaScript, then I am thinking we should try to have a close correspondence between our source code and the emitted JavaScript. I think this is a reasonably high priority goal, given one is testing in the browser in the wild and doesn't have source maps.

I agree your examples do provide a slight reduction in verbosity, but...

Mucking with the way functions are declared will cause cognitive dissonance for programmers who only know C/C++/Java/JavaScript languages. The first time I saw pattern matching on function argument declarations, it looked like gobbledygook. You want to make this a popular language if possible? If I go adopting this language for a mobile apps platform ecosystem I have in mind, then I may receive complaints about the programming language being too weird/obtuse. Also the pattern matching syntax looks even noisier if we match the way functions are declared with the way they are applied with mandatory parenthesis:

append<A>(x: A, list: Nil): A = x
append<A>(x: A, List<A>(head, tail)): List<A> = new List(head, append(x, tail))

Alternative:

append<A>(x: A, List(head: A, tail)): List<A> = new List(head, append(x, tail))

Compare to:

append<A>(x: A, list: List<A>): List<A> = new List(list.head, append(x, list.tail))

Is it really worth complicating it for newbies and adding another paradigm?

Alternatively we could add where but it adds another keyword (and paradigm):

append<A>(x: A, list: List<A>): List<A> = new List(head, append(x, tail))
  where head = list.head, tail = list.tail

But that is unnecessary because if the function will be multiple LOC, then we can just reuse val head = list.head instead (val being same as JavaScript const):

append<A>(x: A, list: List<A>): List<A> =
  val head = list.head, tail = list.tail
  new List(head, append(x, tail))

I think I prefer let instead of val and var instead of let (but the emitted transposed JavaScript may be confusing). JavaScript really mucked this up badly because they needed to alter the semantics of mutable var and so needed a new keyword for mutables and they choose let to represent a variable but let should be immutable. And these keywords are referring to mutability of reassignment and not to whether what they reference is immutable.

Edit: also patterned guards are going to create name mangling in the emitted JS code if you are resolving these statically.

P.S. note I prefer to write that with an inferred return (result) type:

append<A>(x: A, list: List<A>) = new List(list.head, append(x, list.tail))

shelby3 avatar Sep 18 '16 21:09 shelby3

@keean wrote:

function name overloading, I would want all overloads to have the same type, so the overloads are on data constructors.

function overloading on constructors implies pattern matching in function arguments, so I think these two are mutually exclusive. You cannot have both. I would rather have pattern matching and overloading in arguments as it leads to less indentation.

Can you unpack that? I don't understand.

c : [a] :- Integer(a)

Gobbledygook to my eyes. What does that mean?

Looks like a pattern similar to c > [a] >= Integer(a).

Please no symbol soup :-. Is that a smiley face without the smile.

shelby3 avatar Sep 18 '16 21:09 shelby3

In the append case both arguments are lists.

It's not just the reduction in verbosity, but the lack of indenting, keeping everything lined up to the left makes for more readable code. I am not concerned about keeping close to JS, especially if native compilation or web-assembly are future targets. However I do think you have a point about people not used to the style.

If you don't have pattern match in arguments, then you can only have one function declaration for each function, no ad-hoc overloading, because you need to use a type-class to overload different argument types.

I think let for immutable assignment, and I wonder is mutable assignment is necessary, as objects are a reference you can mutate the object referenced, even if the reference itself is immutable. Perhaps we could start with only single immutable assignment and see how it goes?

keean avatar Sep 18 '16 21:09 keean