gluon Allow type signatures to be placed above declarations, ala-Haskell

As discussed on /r/rust, I've always found that typed declarations in MLs get long fast. Leads to people writing code that relies too much on global type inference, and is therefore is hard to figure out later. :(

let my_long_function : LongType -> LongerType -> LongestType -> Return
    my_long_function x y z = ...

Jul 17 '16 08:07 brendanzab

Agreed on needing a better way of writing type signatures. From reading the OCaml source code the lack of types can make it rather hard to follow at times. That said its difficult to judge how much of that is due to where type declarations are placed as OCaml also have interface files where types usually reside.

I am mostly on board with Haskell style declarations but I want to take some time to consider if there are any options which are also able to tackle the duplication of the name and the extra complexity in parsing (though I consider these problems to be minor in the grand scheme of things).

Jul 17 '16 15:07 Marwes

(Disclosure: I don't know much about how Haskell does typing) Are there any benefits to Haskell type signatures/ML type declarations as opposed to type declarations inside the functions themselves in the way that C does it?

int foo () {
    return bar + baz
}

declaring types as you go as with C would avoid name duplication during parsing, and is immediately a lot simpler.

Jul 17 '16 17:07 theblacksquid

@theblacksquid For C declarations specifically they have the problem that they are notoriously difficult to parse but I think your question is more about 'inline' type declarations.

First of all inline style type declarations have the same problem as ML declarations in that they get long very fast. This is arguably less of a problem in C or Rust which have parentheses to enclose the arguments and commas to separate which makes it relatively clean to separate each argument onto a separate line but for ML style languages I don't find that solution as clean.

// Not sure about the placement of `Return`
let my_long_function (x : LongType) (y : LongerType) (z : LongestType) : Return = ...

let my_long_function (x : LongType)
                     (y : LongerType)
                     (z : LongestType) : Return = ...

Parentheses are also required around each argument as there is no comma delimiters which adds noise.

On the other hand Haskell declarations solve both these problems, lines don't get long as quickly as arguments and types are on separate lines.

Hope that answers your question!

Jul 17 '16 17:07 Marwes

Hi, I'm not sure what the status of this is, but here are my two cents having written lots of OCaml in the last few years:

Personally, I like the (var : type) pattern because it lets me look at one place for both the argument name and the type. If the signature were separate, I'd have to mentally match up the argument with its type in another line. With type inference, I find it's common to just elide the type, as long as you choose a sufficiently meaningful argument name (which admittedly, a lot of OCaml code is not good at).
I've noticed that a long type is something of a code smell. It's usually a sign that you're expressing some important concept (say List (Float * Float) when you're talking about a 2D curve). In those cases, you really should and a type alias (type curve = List (Float * Float) or some such).
All that being said, I do think it would be nice to have a way to write down signatures on their own. Maybe something like sig my_long_function : LongType -> LongerType -> LongestType -> Return or even sig my_long_function -> (x : LongType) -> (y : LongerType) -> (z : LongestType) -> Return. Then if one day we decide to support separate interface files, it's just a matter of copying/generating these sig lines out. I'm not sure how this would work with the type checking machinery though.

Hope that helps.

Feb 01 '18 19:02 basus

For note, here are multiple ways to 'type' ocaml code:

Inferred, no types stated but strongly statically type checked

(* These two forms are identical *)
let some_function = fn a b c -> a + b + c
(* This is just a helper for the above to shorten code *)
let some_function a b c = a + b + c

Inline arguments, this has the advantage of being able to only type what you need to and let the rest be inferred.

let some_function = fn (a : int) (b:int) (c: int) -> a + b + c
let some_function (a : int) (b:int) (c: int) = a + b + c
(* They can be typed in non-args as well, the whole set of a single binding name gets typed to it regardless *)
let some_function = fn a b c -> (a : int) + (b : int) + (c : int)
let some_function a b c = (a : int) + (b : int) + (c : int)
(* And of course you can leave out any individual one you want as well *)

Fully typing the binding itself:

let some_function : int -> int -> int -> int =
  fn x y z -> x + y + z
(* And of course you can mix in the argument bindings as well *)

Or split them out:

(* In the interface file *)
val some_function : int -> int -> int -> int

(* In the source file *)
let some_function = fn a b c -> a + b + c
let some_function a b c = a + b + c

And in all cases every OCaml IDE I've seen from emacs to atom to vscode and more all show the types of things when selected, show types when typing, etc... etc.... etc......

I significantly prefer the OCaml typing style as you can have a short form, long form, or split form (or even no type forms).

Also if types are long enough to be unreadable you generally break it up into more lines, or name the types and use the names, like:

type t = int -> int -> int -> int
let some_function : t = fn a b c -> a + b + c

(* Or even piecemeal *)
type t = int -> int
let some_function : int -> int -> t = fn a b c -> a + b + c

Feb 01 '18 19:02 OvermindDL1

And in all cases every OCaml IDE I've seen from emacs to atom to vscode and more all show the types of things when selected, show types when typing, etc... etc.... etc......

This makes code review easier because you have to load the code up in your editor to see the types. It also limits a 'type-driven development' style. If you want to go implementation-first, I would kind of prefer using holes:

add1 : ?add1
add1 x y = 1 + y

Then expand the hole in your editor:

add1 : Int -> Int
add1 x y = 1 + y

Then its fixed, and will give you better type errors down the line as well.

Feb 02 '18 04:02 brendanzab

Yeah I don't really opt for leaving the types out unless it's 'obvious' otherwise. I tend to use the header style like:

let some_function : int -> int -> int -> int =
  fn x y z -> x + y + z

Or if the types are too long then like:

let some_function :
  int ->
  int ->
  int ->
  int = fn x y z ->
  x + y + z

Or just pull them out into their own named types (I'm a fan of lots and lots of named types).

But even then I tend to write the code first before the types, OCaml is fantastic about catching wrong code even with no type annotations, the compiler is always unambiguously right in OCaml (unlike GHC/Haskell). :-)

Feb 02 '18 15:02 OvermindDL1

@basus

Personally, I like the (var : type) pattern because it lets me look at one place for both the argument name and the type. If the signature were separate, I'd have to mentally match up the argument with its type in another line. With type inference, I find it's common to just elide the type, as long as you choose a sufficiently meaningful argument name (which admittedly, a lot of OCaml code is not good at).

Good point! I think it is really important that functions are rigorously annotated (at least top level bindings though what that means in gluon is a bit hard to define). It has already been mentioned in another comment but it really helps in code review, when searching through git blame etc (anywhere you don't have a language server to rely on). Related #436

I've noticed that a long type is something of a code smell. It's usually a sign that you're expressing some important concept (say List (Float * Float) when you're talking about a 2D curve). In those cases, you really should and a type alias (type curve = List (Float * Float) or some such).

Yup, I agree. But the problem here is specifically function signatures where you may have several types which individually are small but once paired together into a function type they become to large. Here it isn't really a good idea to make the combined function type into an alias since anyone looking at the function will actually be interested in what each of the arguments are.

All that being said, I do think it would be nice to have a way to write down signatures on their own. Maybe something like sig my_long_function : LongType -> LongerType -> LongestType -> Return or even sig my_long_function -> (x : LongType) -> (y : LongerType) -> (z : LongestType) -> Return. Then if one day we decide to support separate interface files, it's just a matter of copying/generating these sig lines out. I'm not sure how this would work with the type checking machinery though.

That might be nice actually, perhaps it could even tie into how mutually recursive functions are handled (#322) so that functions are only mutually recursive if they have been forward declared. Might be to verbose though it would make mutually recursive lazy values easier to understand.

sig f
sig g : forall a . a -> a

let f x = g x
let g y = f y

Feb 02 '18 19:02 Marwes

@OvermindDL1

Inline arguments, this has the advantage of being able to only type what you need to and let the rest be inferred.

Gluon has an alternative approach here actually which also works well with separate type annotations. By writing _ as the type we mark it as a "hole" which gets inferred. So let f == let f : _

But even then I tend to write the code first before the types, OCaml is fantastic about catching wrong code even with no type annotations, the compiler is always unambiguously right in OCaml (unlike GHC/Haskell). :-)

Does it handle the "spooky type errors at a distance" problems well? Often in heavily type inferred code I find that there is risk that the error happens in the wrong place (gluon has the beginnings of "bi-directional type inference" which should help with this somewhat)

Feb 02 '18 19:02 Marwes

Then if one day we decide to support separate interface files, it's just a matter of copying/generating these sig lines out. I'm not sure how this would work with the type checking machinery though.

I don't think separate interface files is a good idea, partly that is due to PTSD from editing C and C++ headers ad infinitum. But another part (and more important) is that I feel this should be handled by documentation tooling. That way you can keep documentation, types and implementation in one place and if you want a "signature" file that could be generated by extracting the type of the module and annotating with the documentation comments of the module.

/// id is a function
let id x = x 
{ id }

{
    /// id is a function
    id : forall a . a -> a
}

Feb 02 '18 20:02 Marwes

Does it handle the "spooky type errors at a distance" problems well? Often in heavily type inferred code I find that there is risk that the error happens in the wrong place (gluon has the beginnings of "bi-directional type inference" which should help with this somewhat)

OCaml uses HM type inference with few extensions (mostly modules and such), not the explosion of extensions that Haskell adds to it. What that means is that in any single context (a let) that type inference must Complete. Thus type errors are relegated only to where they actually occur. This means there is not such thing as type-classes, as those really require Higher-Kinded-Types, but it can emulate them via the Module system (the Witness pattern) in the same amount of code. OCaml's Modules are Higher Polymorphic Types. HPT's tend to be not just faster to compiler but SIGNIFICANTLY faster to compile. OCaml's optimizing compiler gets to within C speed yet can compile programs with hundreds of source files from a fresh non-incremental compile in just a couple seconds (and rebuilds of a single file changed average a few tens of milliseconds, OCaml even has a 'language server' of a form called 'merlin' long before the language server protocol was ever developed). OCaml's syntax is unambiguous, not whitespace sensitive (ew, I really hate whitespace sensitivity), and designed in such a way to be trivial to read and FAST to compile. It is a good model to follow (unlike Haskell).

and if you want a "signature" file that could be generated by extracting the type of the module and annotating with the documentation comments of the module.

Sounds like ocaml, you can have it generate the *.mli file from a *.ml file automatically. Generally only useful if you want to hide some top-level things from the public interface though. :-)

/me would love an OCaml'y scripting language

Feb 02 '18 21:02 OvermindDL1

(ew, I really hate whitespace sensitivity)

Sorry about that :). Gluon is whitespace sensitive since it makes it possible to omit the in tokens that separate the body of let and type bindings from their bindings. I think OCaml avoids this narrowly by having the concept of a top-level and sometimes (very rarely) needing ;; to be inserted, which is more or less equivalent to in.

let f x = x
in
f 123
// vs
let f x = x
f 123

Sounds like ocaml, you can have it generate the *.mli file from a *.ml file automatically. Generally only useful if you want to hide some top-level things from the public interface though. :-)

I figured that were possible. The problem I have encountered though is that the .mli files have been commited in the repo and the types (and sometimes) documentation has only been in those files forcing me to jump back and forth :/

Feb 02 '18 21:02 Marwes

Sorry about that :). Gluon is whitespace sensitive since it makes it possible to omit the in tokens that separate the body of let and type bindings from their bindings. I think OCaml avoids this narrowly by having the concept of a top-level and sometimes (very rarely) needing ;; to be inserted, which is more or less equivalent to in.

I know. ^.^;

And OCaml never needs ;;. ;; is exclusively needed only for the REPL to run 'everything entered up until this point Now'. In source code it is never ever needed. OCaml does have a different of top-level and non-top-level though. let's at top level are a complete expression (of the form let ... in ...), and it does not have in, but is also does not define a body (top-level let's are bodyless in general, though that is optional so you can chain things). Inner let's always have an in. It is entirely unambiguous because of syntax choices though.

Your example of:

let f x = x

f 123

In ocaml would generally more often be:

let f x = x

let 42 = f 42

A value at the top level is a 0-arg function (OCaml only has 0-arg and 1-arg functions, 0-arg functions are called 'values' and are run at load-time, not on each invocation), and you can pattern match on it, so if you call something at the top level it is generally matched on as well to ensure a successful return (or just name it like let _ = f 42 if you don't care about the return, but then it cannot be used either so it's just run once at startup, _ is the unbound identifier).

I figured that were possible. The problem I have encountered though is that the .mli files have been commited in the repo and the types (and sometimes) documentation has only been in those files forcing me to jump back and forth :/

You really needed to use merlin, OCaml is basically never worked on without merlin. ^.^;

Feb 02 '18 21:02 OvermindDL1

And OCaml never needs ;;. ;; is exclusively needed only for the REPL to run 'everything entered up until this point Now'. ...

Aha, that is how it works! A shame that won't really work in gluon though since it is incompatible with how just about everything is an expression.

You really needed to use merlin, OCaml is basically never worked on without merlin. ^.^;

I usually browsing code directly on Github and I run Windows which Ocaml does not play nice with :(.

Feb 02 '18 21:02 Marwes

I usually browsing code directly on Github and I run Windows which Ocaml does not play nice with :(.

It 'used' to not play nice with. Now there are cross compilers and opam even runs native on windows now and all. :-)

/me is a linux user, but has used ocaml on Windows 10 without issue

I think it runs in a cygwin or something, but it does work fine and makes fully standalone programs that run straight on windows with nothing else. :-)

Feb 02 '18 22:02 OvermindDL1

gluon gluon copied to clipboard

Allow type signatures to be placed above declarations, ala-Haskell

gluon
gluon copied to clipboard