ghc forkState# and joinState#

Implement forkState# and joinState#. Since these two primops require the linear arrow as part of their type, this commit also introduces improvements to the genprimopcode package that allow the linear arrow to be used in types in primops.txt.pp.

This has not been tested yet, but I'll playing around with these in linear-containers and confirming that they behave as expected.

Nov 05 '18 18:11 andrewthad

I'm rather in favour of this sort of things. But can we have a discussion about the intended semantics of such splits and joins? I'm not sure I've got a good view of what you have in mind.

Nov 05 '18 18:11 aspiwack

But can we have a discussion about the intended semantics of such splits and joins?

Yes. And I should include a description of the intended semantics somewhere in the GHC source tree as well. The forkState# and joinState# functions only start to become useful once the linearization of state tokens discussed in https://github.com/tweag/ghc/issues/179 has happened (well, that and having some mechanism for making the other primitive types be unrestricted, also discussed in the same issue). The need these is hinted at in section 2.2 of the Linear Haskell paper:

The other unsatisfactory thing about the monadic approach to array construction is that it is overly sequential. Suppose you had a pair of mutable arrays, with some updates to perform to each; these updates could be done in parallel, but the ST monad would serialise them.

The way I've implemented them in this PR, the user is afforded opportunities for concurrent execution (not parallel execution as the excerpt suggests). When a user forks, the expectation is that the two computations using the forked state tokens must be entirely independent of each other. The compiler may sequence them such that they are executed one after the it may be interleave them. I'm not sure the code generator will ever actually interleave them though, but the user should be prepared for this to happen. If there is a critical section the computations that the user really needs to ensure doesn't get interleaved, they should write it in a function with a NOINLINE pragma. This will guarantee that the critical section doesn't get other code mixed in with it.

Nov 05 '18 19:11 andrewthad

The build errors with:

libraries/ghc-prim/dist-install/build/GHC/PrimopWrappers.hs:1403:12: error:
    • Couldn't match type ‘'Omega’ with ‘'One’
        arising from an application
    • In an equation for ‘forkState#’:
          forkState# a1 = (GHC.Prim.forkState#) a1
     |
1403 | forkState# a1 = (GHC.Prim.forkState#) a1
     |            ^^

libraries/ghc-prim/dist-install/build/GHC/PrimopWrappers.hs:1406:12: error:
    • Couldn't match type ‘'Omega’ with ‘'One’
        arising from an application
    • In an equation for ‘joinState#’:
          joinState# a1 a2 = (GHC.Prim.joinState#) a1 a2
     |
1406 | joinState# a1 a2 = (GHC.Prim.joinState#) a1 a2
     |            ^^

libraries/ghc-prim/dist-install/build/GHC/PrimopWrappers.hs:1406:15: error:
    • Couldn't match type ‘'Omega’ with ‘'One’
        arising from an application
    • In an equation for ‘joinState#’:
          joinState# a1 a2 = (GHC.Prim.joinState#) a1 a2
     |
1406 | joinState# a1 a2 = (GHC.Prim.joinState#) a1 a2
     |               ^^

Here are the relevant lines from the generated PrimopWrappers.hs file:

{-# NOINLINE forkState# #-}
forkState# :: State# (RealWorld) ->. (# State# (RealWorld),State# (RealWorld) #)
forkState# a1 = (GHC.Prim.forkState#) a1
{-# NOINLINE joinState# #-}
joinState# :: State# (RealWorld) ->. State# (RealWorld) ->. State# (RealWorld)
joinState# a1 a2 = (GHC.Prim.joinState#) a1 a2

This code is correct, but somehow the actual primops in GHC.Prim end up with non-linear types.

Nov 05 '18 19:11 andrewthad

I've figured out why this is happening. Working on a fix.

Nov 05 '18 20:11 andrewthad

This now builds, and I am able to use these primps in a library. I still need to add some tests to GHC.

Nov 06 '18 11:11 andrewthad

concurrent execution (not parallel execution as the excerpt suggests)

I'm not sure what you mean by concurrent and parallel in this sentence. These words seem to take a number of distinct meanings depending on who is using them. So better to always define them precisely before using them. (in particular in your documentation)

When a user forks, the expectation is that the two computations using the forked state tokens must be entirely independent of each other.

Independent in that they never mention the same resource? (like in fork-join parallelism?). If so, it seems to mean that forkState# is an unsafe primitive? What safe abstractions can we build with it?

Do you have an intuitive semantics explaining what we are doing to the RealWorld when we forkState#?

What about joinState#? What is the semantic of it? And when is it supposed to be used?

Nov 07 '18 08:11 aspiwack

My definitions for concurrency and parallelism:

Parallelism: physically run two computations at the same time using multi-core infrastructure of a CPU
Concurrency: running two things simultaneously in a way does doesn't necessarily involve a multi-core CPU. The computations could be scheduled to run one after the other or they could be interleaved by context-switching or some other mechanism.

I tried to describe the semantics in the last section of https://github.com/tweag/ghc/pull/190#issuecomment-435997046, but I'll give something more intuitive here. In the descriptions below, I refer to a state token as well as the computations that depend on them, since forcing a state token causes an impure computation to take place.

forkState#: Whenever either of the two resulting state tokens are forced, the argument state token is forced. This only happens once. Put differently, the two computations built on top of the resulting state tokens have a dependency on the computation built beneath the argument state token.
joinState#: Whenever the resulting state token is forced, both of the argument state tokens are forced. Put differently, the computation built on top of the result state token has a dependency on both of the computations built beneath the argument state tokens. Interestingly, realWorld# is a left and right identity for joinState#.

A different way to describe this would be to say that state tokens currently model computations linearly (or at least they are supposed to if you use them correctly). That is, we always go to exactly one next place. With forkState#, we can now branch. Branching represents concurrent execution. We must have a way to collapse branches, and that's what joinState# provides.

Let me if this is a sufficient explanation of the semantics or if it's still unclear. The part of your question I left unanswered is

What safe abstractions can we build with it?

I'll write up an answer to that in a separate comment.

Nov 07 '18 13:11 andrewthad

Here's the motivating example. The simplest way to use linear types to capture provide a safe interface to an unmanaged heap does not need fork/join primitives. It looks like this:

data Reference a where Reference :: Addr# -> Reference a
data Token where Token :: State# RealWorld ->. Token
class Object a where
    size :: Proxy a -> Int
    poke :: Addr -> a ->. Token ->. Token
    peek :: Addr -> Token ->. (Token, a)
instance Object (Reference a)

allocate :: Object a => a ->. Token ->. (Token,Reference a)
deallocate :: Object a => Reference a ->. Token ->. (Token, a)
modify :: Object a => Reference a ->. (a ->. Token ->. (Token,a,Unrestricted b)) ->. Token ->. (Token, Reference a, Unrestricted b)
modify_ :: Object a => Reference a ->. (a ->. Token ->. (Token,a)) ->. Token ->. (Token, Reference a)
run :: (Token ->. (Token, Unrestricted a)) -> Unrestricted a

The data constructors of Reference and Token are not exported. Object is kind of like Storable. Importantly, the API (and anything else I discuss in this comment) can only express heaps that in which every object is pointed to by exactly one reference. No cycles are allowed, so we cannot do things like doubly-linked lists, and there is no sharing (since that requires multiple pointers to the same object) Despite these shortcomings, there is a lot that this API can express: most maps, singly-linked lists, queues, etc.

Threading the state token through everything makes the API cumbersome. What makes this even worse is that anything built on top of it inherits this same tedium:

-- singly-linked lists
data List :: Type -> Type
cons :: Object a => a ->. List a ->. Token ->. (Token, List a)
map :: Object a => (a ->. Token ->. (Token, a)) ->. List a ->. (Token, List a)

-- maps, just blindly accept the prim constraint for now
data Map :: Type -> Type -> Type
adjust :: (Prim k, Object v) => (v ->. Token ->. (Token, v)) -> k -> Map k v -> (Token, Map k v)

These are not good APIs for end users. The user shouldn't have to explicitly thread the token around like that. The solution I like is to pair the tokens with the data structures. Our foundation becomes:

data Reference a where Reference :: Addr# -> State# RealWorld ->. Reference a
data Token where Token :: State# RealWorld ->. Token
class Object a where
    size :: Proxy a -> Int
    poke :: Addr -> a ->. Token ->. Token
    peek :: Addr -> Token ->. (Token, a)
instance Object (Reference a)
instance Semigroup Token
instance Dupable Token

allocate :: Object a => a ->. Token ->. Reference a
deallocate :: Object a => Reference a ->. (Token, a)
modify :: Object a => Reference a ->. (a ->. (a,Unrestricted b)) ->. (Reference a, Unrestricted b)
modify_ :: Object a => Reference a ->. (a ->. a) ->. (Reference a)
graft :: Token ->. Reference a ->. Reference a
run :: (Token ->. (Token, Unrestricted a)) -> Unrestricted a

Since references references hold state tokens, this having multiple references in scope implies that at some point in the past, a token fork happened (with the Dupable instance). With this API, we can rebuild the list and map APIs from earlier but without any mention of state tokens (since those types can hold state tokens internally). Furthermore, it's safe. Consider this example (which for clarity assumes that linear let bindings work):

instance Object Int

example :: Unrestricted Int
example = run $ \t ->
  let (t1,(t2,t3)) = fmap dup (dup t) 
       x = allocate t1 5 :: Reference Int
       y = allocate t2 7 :: Reference Int
       z = allocate t3 x :: Reference (Reference Int)
       w = modify_ z (\r -> uncurry (\t4 a -> modify_ (graft t4 x) (+a)) (deallocate r))
       (t5,v) = deallocate w
       (t6,k) = deallocate v
    in (t5 <> t6, Unrestricted k)

Notice that the allocation of x and y do not happen in any determined order. We then store x inside of z. The internal implementation of poke (which is used to define allocate) for Reference uses joinState#, so the state token inside of x is joined with t3, giving us a guarantee that both of these happen before we x gets poked into z.

This is getting less clear than I hoped it would. Basically, under this API, whenever there is an operation that deals with multiple references (storing one reference in another, etc.), we are internally using joinState# to ensure that all effects on both of the references have been carried out.

Nov 07 '18 14:11 andrewthad

I'm not sure about the "forcing a State#" business, isn't State# unlifted?

I've got a few comments about your API.

It seems to me that you never actually use the fact that these are State# tokens. You could do the same by replacing them with Unit#, couldn't you?

Also, I don't think that you can do joinState# in allocate t3 x, as allocate is parametric, it will not know when it is passed something that carries aToken. The only place where you are using joinState#, I think, is in the Semigroup instance for Token (which is missing from your API, but I infer from your example you intended).

I'm wondering if your use of token-passing is here only to avoid the continuation style of allocation? If so, I must say that there is a bit of a tension which I had very little time to think about yet and less to document. But truth is, things which are unique in a certain sense (like our references here) seem to have an infinite number of possible introduction/allocation primitive: one in CPS, one in IO, one from another element of their types, one from any other thing which are supposed to be unique. My thoughts on this, which probably will echo yours, is that we need a synchronisation mechanism to turn all of these into one rule. My idea was to create a type Source (because I've been calling it a source of linearity), which would be Dupable (but not Movable!) and which I could create in two ways: from a State# RealWorld token, and with a continuation. All unique things would then be created canonically with a create :: Source ->. X, and all of them could give out a bit of Source like so: source :: X ->. (Source, X). I've been told something of a similar flavour might have been deployed in Alms, but I haven't looked it up yet.

As a comparison, I have a similar API for (pure in that case) malloc'd data, in linear-base. It uses a comonoid rather than a join, and deallocation doesn't return a token. See here. As a relevant note: (e ->.) is an applicative (also a monad, but it is less relevant) if and only if e is a comonoid (I don't exploit this fact yet, but it will simplify some things).

My instinct tells me that the comonoid style is better because it can be entirely handled with applicative primitives, hence expose more opportunity to reorder things to the compiler. On the other hand, you probably have a reason to prefer the more monadic style of state-passing. Which I would like to understand.

Nov 08 '18 08:11 aspiwack

I'm not sure about the "forcing a State#" business, isn't State# unlifted?

You're right. I worded that incorrectly. Here is a revision of those descriptions:

The forkState# primitive means that the two computations built on top of the resulting state tokens have a dependency on the computation built beneath the argument state token. The joinState# primitive means that the computation built on top of the result state token has a dependency on both of the computations built beneath the argument state tokens. Let us say that we have three effectful computations: f,g,h :: State# s ->. State# s. When composed as j s1 = let (# s2, s3 #) = forkState# (f s1) in joinState# (# g s2, h s3 #) (similarly, j :: State# s ->. State# s), the guarantee is that the effectful computation corresponding to f executes before the ones corresponding to g and h. Those other two computations execute in any order (including interleaved), but they will both be finished before whatever effectual computation that consumes the state token that j returns.

It seems to me that you never actually use the fact that these are State# tokens... The only place where you are using joinState#, I think, is in the Semigroup instance for Token.

I omitted some details for brevity in my previous post, but these do actually get used. For example, consider the implementation of poke for Reference (this is taken from linear-containers, so the types are more complicated):

import qualified Data.Primitive as PM

class Object (f :: Mode -> Type) where
  poke :: Addr -> f m ->. Token ->. Token
  ...

instance Object (Reference f) where
  poke = unsafeCoerce referencePoke

referencePoke :: Addr -> Reference f m -> Token -> Token
referencePoke (Addr addr) (Reference a (Token s0)) (Token s1) =
  case PM.writeOffAddr# addr 0# a (E.joinState# s0 s1) of
    s2 -> Token s2

The unsafe linearity coercion is only needed because we don't yet have move primops. We need joinState# here because the reference carries a token with it, but in general, we don't know that our objects carry tokens, so poke must take a token argument to allow data types that do not carry references or tokens.

I defined allocate :: Object a => a ->. Token ->. (Token,Reference a) to give the user a token as well as a reference. As you point out, this is unnecessary. It could instead just return a Reference, making its signature allocate :: Object a => a ->. Token ->. Reference a. The two are equivalent, although I think I prefer now prefer omitting the return token as you do. I had not considered the applicative instance for ->., which is really cool, and I do think that will result in a more pleasant experience for end users. In linear-containers, I have a class which kind of does what you describe as "giving out a bit of Source":

class Object f => Referential (f :: Mode -> Type) where
  -- | Combine the token with a reference inside of the object.
  inhume :: Token ->. f m ->. f m
  -- | Duplicate a token that exists in the object.                                                   
  exhume :: f m ->. (Token, f m)

And Reference has an instance of this typeclass. (Note that implementations of inhume always make use of joinState# via the Semigroup instance for Token). With these two typeclass methods, it becomes clear that the two definitions of allocate are equivalent.

My idea was to create a type Source (because I've been calling it a source of linearity), which would be Dupable (but not Movable!) and which I could create in two ways: from a State# RealWorld token, and with a continuation

I don't understand why there would need to be additional primops for creating a source. Phrased differently, I don't understand what kind of API State# (augmented with forkState# and joinState#) cannot handle. To me, it seems that you can write any introduction functions for many different APIs by just building on top of State# (this should be the case for both CPSed and non-CPSed APIs).

As a comparison, I have a similar API for (pure in that case) malloc'd data, in linear-base. It uses a comonoid rather than a join, and deallocation doesn't return a token.

This is pretty cool. Here is my analysis of the differences between our libraries (along with their significance in terms of the API).

linear-base has machinery for reusing Storable instances. linear-containers introduces it's own Object typeclass. Minor difference.
linear-base doesn't leak memory in the presence of exceptions. linear-containers does. This is an important feature, but as far as the API is concerned, it's not a huge difference. You do have to thread the Pool through every computation, which feels similar to the way I have to explicitly thread Tokens through certain computations in linear-containers.
linear-base allocates and deallocates using unsafeDupablePerformIO (which is just runRW# followed by throwing away the resulting state token). This means that it the actual underlying effects (malloc and free) depend on certain haskell values being evaluated to WHNF. linear-containers uses state tokens. This makes the APIs different in some areas. For example, linear-base can deallocate and only return the pointer's payload. linear-containers must also return a token (unless it starts requiring all data to carrying tokens, which is certainly a possibility). This is also why linear-containers needs the primops in this PR and linear-base does not. Using unsafeDupablePerformIO and with standard GHC thunk evaluation gives a similar semantics of have multiple computations that can happen in any order relative to one another.

My instinct tells me that the comonoid style is better because it can be entirely handled with applicative primitives, hence expose more opportunity to reorder things to the compiler. On the other hand, you probably have a reason to prefer the more monadic style of state-passing.

Nah. As I mentioned when talking about allocate not needing to return a state token, I don't actually like having that in there. To me, the tokens are just a necessary evil for certain functions (like deallocate), but I am contemplating purging them from most of the user-facing part of the API. The most significant difference between our APIs is that I'm scared to death of using unsafeDupablePerformIO for allocating and deallocating. It seems like you're making it work in linear-base, but I'm always fearful that the optimizer is just one step away from a case-of-known-data-constructor optimization that stops the effects from being executed. So I've instead gone the route of preserving tokens everywhere.

Nov 09 '18 21:11 andrewthad

ghc ghc copied to clipboard

forkState# and joinState#

ghc
ghc copied to clipboard