alpaca icon indicating copy to clipboard operation
alpaca copied to clipboard

FFI Bridge Proposal

Open j14159 opened this issue 8 years ago • 19 comments

Bridges are to MLFE as ports are to Elm, without the send/receive and subscription semantics.

This is motivated by questions from @imetallica, discussion and feedback from @omarkj, and naming concerns from @lpil.

Example:

bridge append_ints = :erlang :"++" [list int, list int] list_int

Given the above in a module, the compiler will synthesize the function append_ints, typed to take to integer lists and return one that is a combination of both:

{t_arrow, [{t_list, t_int}, {t_list, t_int}], {t_list, t_int}}

The typer will trust that the author has considered the types involved and will expose this function for type checking. The code generator will create this function in the output Core Erlang AST and programmatically create the necessary checks for the return value. If we follow what Elm has done, this will create some substantial overhead on any recursive type like lists, maps, and recursive ADTs as each element must be checked before returning the result to MLFE code. A more problematic example:

type maybe_io_device = Ok pid unit | Error atom

bridge open_file = :file :open [string, list atom] io_device

If you refer to the erldocs for file:open/2, you'll notice the types I've given above to the bridge are incomplete, for example I'm not accounting for the fd() type which in the given docs doesn't appear to devolve to a pid. A larger issue is that currently the compiler would render the maybe_io_device ADT as either {'Ok', Pid} or {'Error', ErrorAtom} in any pattern match checking the validity of the return. This is relatively trivial to change and may make sense for simpler handling of common Erlang patterns directly as ADTs with no intermediary translation layer at all.

More specifically, given the changes to how ADTs are rendered, the code above would be synthesized to the following in the code generator:

open_file(Filename, Modes) ->
    case file:open(Filename, Modes) of
        {ok, IO}=Ok when is_pid(IO) -> Ok;
        {error, Reason}=Err when is_atom(Reason) -> Err
    end.

This has rather large safety implications:

  1. do we let this explode on the Erlang side unchecked, somewhat similarly to Elm?
  2. do we generate code that is already wrapped in a try/catch to account for errors, utilizing some sort of built-in type like type try 'x = Success 'x | Error erlang_exception? I use this type in Scala but does this remove Erlang-ness from the language?
  3. should we have a default safe mode as in the previous point and a keyword to remove the try/catch, e.g. unsafe bridge open_file = ... that doesn't wrap the result?
  4. should bridges always be unsafe but any that occur without a surrounding try/catch raise a compiler warning?

I'm leaning towards point 3 at the moment but curious about other opinions and would like to know if I've missed anything (beyond the complexity checking recursive structures entails).

j14159 avatar Jul 03 '16 16:07 j14159

Exceptions aren't really idiomatic erlang, so I wouldn't consider that when deciding what to do. I think tagged return types are more idiomatic and lend themselves well to the Either type you describe above. As long as there is some monad or monad like system (Janestreet's monad stuff is a great example in the ocaml world), I think its ok.

Other approaches add complexity to understanding a system that I just don't think is needed.

ericbmerritt avatar Jul 05 '16 22:07 ericbmerritt

@ericbmerritt thanks for the feedback! So something like #2 with a try or either type and some basic BIFs for working with them cleanly perhaps. I'm not sure yet that the monadic operations (bind, etc) should be built into the language but mostly because I simply haven't thought through it properly yet. Any particular suggestions/opinions there?

j14159 avatar Jul 06 '16 14:07 j14159

Yes. Exactly,.

I don't actually think monadic binds should be built into the language. It think that is one of the very few mistakes that the haskell guys made. In ocaml, the monadic operators are just functions and that works super well. You do end up with a lot of

some starting expression
>>= fun x -> y
>>= fun z -> return (a, y)

I think that actually makes what is going on much more understandable and reduces the confusion quite a lot. It makes the monads fit nicely into the language, without them being a weird sub notation that hides whats going on underneath. (I say this while still loving haskell btw).

To summarize, I think having bind and map operators is and a reasonable infix syntax is enough. With that you can always add something like do notation after the fact. I think if you are adding edge cases to deal with this its going to bite you in the butt.

ericbmerritt avatar Jul 06 '16 15:07 ericbmerritt

Makes sense, thanks again! Infix syntax is something I need to think more about as well. I've been entirely avoiding the definition of functions with operator-like names as I've seen some...cryptic extremes...in Scala. That doesn't argue against the utility of something decent for infix, of course.

j14159 avatar Jul 06 '16 15:07 j14159

Scala is, unfortunately, a really poor example of good infix names. Haskell/ocaml are better sources with much better rules. I would take those as examples more then scala. Scala is a great bridge language. That is, it does a good job of dragging Java into a semi functional space, but its not a very good functional language in its own right.

ericbmerritt avatar Jul 06 '16 15:07 ericbmerritt

I came across this. Its interesting and might inform the monad discussion: https://gbracha.blogspot.com/2011/01/maybe-monads-might-not-matter.html

ericbmerritt avatar Jul 12 '16 14:07 ericbmerritt

Interesting read. My take so far is roughly along the lines of "nothing built-in for now and make sure ADTs stay simple" :) I think at a minimum it will likely be beneficial to compile ADT tags to lower-case (e.g. Ok -> ok) although this does have some potential issues with unicode perhaps.

I'm not sure how to tackle the infix stuff yet tbh. I struggle with allowing methods like >:> but maybe it's something I should just be hands-off about. I should really spend a bit of time looking at some other examples. Open to suggestions!

j14159 avatar Jul 13 '16 03:07 j14159

:+1: for a small, extensible core language with add-ons as libraries.

yurrriq avatar Jul 13 '16 05:07 yurrriq

case is problematic in a unicode world and I would strongly encourage you not to rewrite things if you can get away with it. I would just change the syntax. I guess this all boils down to do you want to do ML on top of erlang or do you want to do a strongly typed functional language on top of erlang. If its the former, then yea you may have to do some rewriting if its the latter then I wouldn't do rewriting and would just change the syntax to be more accommodating to beam environment.

ericbmerritt avatar Jul 13 '16 15:07 ericbmerritt

Aiming more for the latter to be honest. I think what exists is already fine, the rewriting speculation was mostly around whether or not to make {ok, SomeSuccess} | {error, SomeError} a bit more seamless for a brevity-focused FFI. Maybe another case where "hands-off for now" makes sense :)

j14159 avatar Jul 13 '16 17:07 j14159

Since Alpaca is in its design phase and we're almost in 2018, may I suggest to investigate Algebraic Effects (found in Koka, Eff, Multicore OCaml) as a contemporary alternative to monads?

Reading list:

ghost avatar Dec 04 '17 20:12 ghost

As a data point Purescript is in the process of switching from Eff to the simpler IO. I think the general opinion is that it added a lot of complexity that made the language harder to learn, without much obvious benefit. Eff is going to be available as a library for those that have a use case for more fine grained effects.

lpil avatar Dec 04 '17 20:12 lpil

Yeah I'd vote on Algebraic Effects as well, it is the generic version of everything from ports to IO to even concurrency (you can represent Beam Actor concurrency as effects even).

OvermindDL1 avatar Dec 04 '17 20:12 OvermindDL1

As a data point Purescript is in the process of switching from Eff to the simpler IO. I think the general opinion is that it added a lot of complexity that made the language harder to learn, without much obvious benefit. Eff is going to be available as a library for those that have a use case for more fine grained effects.

How the heck did they implement it, do you have a doc? Algebraic Effects in MC-OCaml is trivially easy to use with a trivial interface... o.O

OvermindDL1 avatar Dec 04 '17 20:12 OvermindDL1

Ah I see, purescript does not use pure Algebraic Effects and instead wraps them in monads, which makes them substantially more verbose... Ew... Definitely not go that way...

OvermindDL1 avatar Dec 04 '17 20:12 OvermindDL1

trivially easy

I would say this comes down to the user. Many people I have worked with struggle with simple IO.

I guess a good question is what is Alpaca optimising for? If we're aiming to be easy to learn and be productive with (like Elm) then we need to think carefully about complexity here.

lpil avatar Dec 04 '17 20:12 lpil

Well the examples, here is a simple state monad in Purescript via https://github.com/purescript/purescript-st/blob/master/src/Control/Monad/ST.purs and https://github.com/purescript/purescript-st/blob/master/src/Control/Monad/ST.js (because apparently purescript does not have a real effects system, it only has 'interfaces', what the heck...):

module Control.Monad.ST where

import Control.Monad.Eff (Eff, kind Effect, runPure)

foreign import data ST :: Type -> Effect

foreign import data STRef :: Type -> Type -> Type

foreign import newSTRef
  :: forall a h r
   . a
  -> Eff (st :: ST h | r) (STRef h a)

foreign import readSTRef
  :: forall a h r
   . STRef h a
  -> Eff (st :: ST h | r) a

foreign import modifySTRef
  :: forall a h r
   . STRef h a -> (a -> a)
  -> Eff (st :: ST h | r) a

foreign import writeSTRef
  :: forall a h r
   . STRef h a
  -> a
  -> Eff (st :: ST h | r) a

foreign import runST
  :: forall a r
   . (forall h. Eff (st :: ST h | r) a)
  -> Eff r a

pureST :: forall a. (forall h. Eff (st :: ST h) a) -> a
pureST st = runPure (runST st)

And the javascript part because purescript apparently did not have a real effects system:

"use strict";

exports.newSTRef = function (val) {
  return function () {
    return { value: val };
  };
};

exports.readSTRef = function (ref) {
  return function () {
    return ref.value;
  };
};

exports.modifySTRef = function (ref) {
  return function (f) {
    return function () {
      return ref.value = f(ref.value); // eslint-disable-line no-return-assign
    };
  };
};

exports.writeSTRef = function (ref) {
  return function (a) {
    return function () {
      return ref.value = a; // eslint-disable-line no-return-assign
    };
  };
};

exports.runST = function (f) {
  return f;
};

And an equivalent one (with the same horrible names, like yeesh...) in MC-OCaml, but actually a proper effects system:

(* This part is not part of the effect, just defining a common safely-typed universal type storage *)
module Univ = struct
  module type S = sig type t exception E of t end
  type 'a prop = (module S with type t = 'a)

  let create (type s) () =
    let module M = struct type t = s exception E of t end in
    (module M : S with type t = s)

  let inj (type s) p x =
    let module M = (val p : S with type t = s) in
    M.E x

  let proj (type s) p y =
    let module M = (val p : S with type t = s) in
    match y with M.E x -> Some x | _ -> None

  let embed () = let p = create () in inj p, proj p
end

(* Here is where the effects module definition starts, this is the part that is the same as purescript's 'interface' file *)

type 'a t = {inj : 'a -> Univ.t; prj : Univ.t -> 'a option}

effect STRef : 'a -> 'a t
let newSTRef v = perform (STRef v)

effect STRead : 'a t -> 'a
let readSTRef r = perform (STRead r)

effect STWrite : 'a t * 'a -> 'a
let writeSTRef r v = perform (STWrite (r, v))

effect STModify : 'a t * ('a -> 'a) -> 'a
let modifySTRef r f = (* This can be implemented as both a read and write, so let's do so *)
  writeSTRef r (f (readSTRef r))

(* And here's the OCaml part of what purescript has to do in javascript because of its limitations... *)
let runST =
  let comp =
    match f () with
    | v -> (fun s -> v)
    | effect (STRef v) k -> (fun s ->
        let (inj, prj) = Univ.embed () in
        let cont = continue k {inj;prj} in
        cont (inj v::s))
    | effect (STRead {inj; prj}) k -> (fun s ->
        match find prj s with
        | Some v -> continue k v s
        | None -> failwith "Ref.run: Impossible -> ref not found") (* Won't actually happen *)
    | effect (STWrite ({inj; prj}, v)) k -> (fun s ->
        continue k () (inj v::s))
  in comp []

OCaml's is all done 'inside' ocaml, it is all purely functional, no mutation actually happens, etc... etc... In addition, I find it very easy to read, but I'm also used to OCaml so I may be biased... ^.^;

With something like an IO module it is just represented as sending/receiving messages (Remember, the Erlang Actor concurrency style, messages and all, are all easily represented as an Algebraic Effect), you could do the same with a State module too, but that is pretty heavy-weight just to hold some 'state'. :-)

EDIT: But as you see OCaml does it easily, just a set of variant names that are the effects, and a wrapper function that just matches on the effects continues the k continuation. A full State effect that can store any type like one is more complex than most by far.

OvermindDL1 avatar Dec 04 '17 21:12 OvermindDL1

:+1: To algebraic effects (and state [machine]) model(s)!

  • http://docs.idris-lang.org/en/latest/effects/simpleeff.html
  • https://github.com/edwinb/States
  • https://edwinb.wordpress.com/2016/10/21/state-machines-all-the-way-down/

yurrriq avatar Dec 06 '17 01:12 yurrriq

edwinb/States

That's pretty nasty, definitely prefer doing it the OCaml way. ^.^;

OvermindDL1 avatar Dec 08 '17 15:12 OvermindDL1