reactive-banana icon indicating copy to clipboard operation
reactive-banana copied to clipboard

Change `Event` and `Behavior` to have a representational role

Open ocharles opened this issue 2 years ago • 11 comments

Fixes #219. The main problem is that in newtype Event a = E (Prim.Event a), we have

type Prim.Event a = Cached Moment (Pulse a)
data Cached m a = Cached (m a)

and that Cached m a definition forces a to be nominal.

This PR works around this nominal role by using a Yoneda-trick, which restores the role to be representational (at the cost of carrying a mapping function around).

ocharles avatar Sep 14 '21 20:09 ocharles

What if you unfold Cached in Event + make Cached a newtype? Then Event a is coercible to Cached Moment (Pulse a) and you should have a representable argument.

newtype Cached m a = Cached (m a)

newtype Event a = E_ (Moment (Pulse a))

Also: Semigroup, Monoid, Num can be derived for Behaviour via Ap Behaviour a and Future/Moment/MomentIO can newtype derive Functor, Applicative, Monad, MonadFix.

Screenshot from 2021-09-15 01-41-14

Icelandjack avatar Sep 15 '21 00:09 Icelandjack

@Icelandjack I need to confirm but I think we might need cached to be data for laziness stuff. But if we don't need that, then your idea is great and gives us one less pointer indirection, too!

ocharles avatar Sep 15 '21 07:09 ocharles

Last I looked - and my understanding could be way off here - the Cached type is actually not necessary, it just serves as sort of a reminder that some Cached m computation (namely the ones built fron cache, not dontCache) only run the m effects at most once. But this of course only requires MonadIO and MonadFix.

So if I had to speculate wildly, I'd guess Cached could be a newtype.

mitchellwrosen avatar Sep 15 '21 19:09 mitchellwrosen

So if I had to speculate wildly, I'd guess Cached could be a newtype.

Let's give it a try then! I'm very keen to chase one fewer pointer if possible. I'll look into this. What @Icelandjack proposes is much nicer than my Yoneda hacking, as cute as it is.

ocharles avatar Sep 15 '21 19:09 ocharles

Alas, Cache needs to use data because we want to memoize the value ("laziness stuff") which is used to implement observable sharing.

Observable allows us to use an Event value multiple times without duplicating an Event internally. For example, the code

let e1 = … some event …; e1 = e2 in unionWith const e1 e2

uses the same event internally for e1 and e2; this is not automatic. It would be semantically correct if we duplicate the underlying stream of events of e1 in e2, but less performant.

Observable sharing must use unsafePerformIO (or similar), which happens here:

https://github.com/HeinrichApfelmus/reactive-banana/blob/4dcf4b01cdb4cbdcd5347b099a17bda1d025d2e5/reactive-banana/src/Reactive/Banana/Prim/Cached.hs#L33-L36

The fact that Cache is data ensures that the unsafePerformIO is not called twice for the same value. Subtle, but very important.

The need for observable sharing likely also rules out other solutions, such as a Yoneda embedding like E a ~ (something a → r) → r. The issue is that function arrows do not play well with sharing (though it is not impossible to make it work).

HeinrichApfelmus avatar Sep 15 '21 22:09 HeinrichApfelmus

How important is this issue of type roles? If it is a minor improvement, I would recommend against complicating the representation of Event or likewise in any way.

If I understand correctly, the issue seems to be that Cached uses m a which is only nominal in a, because m could be a type family. However, we only use the special case m = Moment, which is a type synonym where the parameter a is representational. If we can override the type role, I recommend we do that, if not, then this example is probably something we should show to the GHC developers and leave it alone for now.

HeinrichApfelmus avatar Sep 15 '21 22:09 HeinrichApfelmus

How important is this issue of type roles? If it is a minor improvement, I would recommend against complicating the representation of Event or likewise in any way.

Not hugely, though I refer back to my original example. This is code I wrote and expected to work, and was a bit disappointed to find I was going to have to hand-write the Monoid and Semigroup instances.

If I understand correctly, the issue seems to be that Cached uses m a which is only nominal in a, because m could be a type family. However, we only use the special case m = Moment, which is a type synonym where the parameter a is representational. If we can override the type role, I recommend we do that

Yep, your understanding is correct. I'll see if I can make m representational. I wonder also if data CachedMoment a = CachedMoment (Moment a) is something that could be considered

ocharles avatar Sep 15 '21 22:09 ocharles

Thanks for the clarification, that is subtle indeed... so subtle that I'm not sure I understand it yet!

Simplifying the machinery a tiny bit, let's say I decorate an IO action with an unsafePerformIO preamble, just like cache:

{-# NOINLINE cache #-}
cache :: IO a -> IO a
cache action =
  unsafePerformIO do
    putStrLn "hello, world!"
    pure action

I should still expect this function to return a thunk that, when forced to WHNF, prints "hello, world" and returns the IO action passed in. So regardless of how many pointers to the returned action exist, the thunk will only be updated in place once.

Sorry for dragging this on, feel free to ignore this comment

mitchellwrosen avatar Sep 15 '21 22:09 mitchellwrosen

I should still expect this function to return a thunk that, when forced to WHNF, prints "hello, world" and returns the IO action passed in. So regardless of how many pointers to the returned action exist, the thunk will only be updated in place once.

If values of type IO a were thunks, this would be correct. But IO gets special treatment from the compiler. I just checked that it works as you expect it to work in the interpreter, but I don't think that it will (always) work in compiled code.

I remember recording a remark on this in the HeinrichApfelmus/optimize-monad-trans repository at one point:

Note that the same reasoning could be applied to the state token for the IO monad. Why doesn't GHC build closures for that? The answer is that the compiler treats the State# type constructor as a special case and doesn't refrain from recomputing IO actions. This is GHC's infamous state hack.

HeinrichApfelmus avatar Sep 16 '21 19:09 HeinrichApfelmus

In the documentation on role annotations, a data type that is identical to Cache m a is mentioned specifically. They call it Tricky a b. :joy: I don't suppose that there is a way to tell the compiler that m can never be a type synonym? There could be a StackOverflow question for that. The case where m is a monad parameter actually seems like a fairly common use case to me.

HeinrichApfelmus avatar Sep 16 '21 19:09 HeinrichApfelmus

Does the "state hack" thwart the sharing attempt in this case? I don't see how yet:

As I understand it, the "state hack" is to label all lambdas binding State# as one shot so that a let on the outside will be floated into the lambda, thus eta-expanding the state lambda beyond the let. This comes back to bite us when the we execute that IO action multiple times. A small example would be:

expensive :: Int -> Int
expensive n = n
{-# NOINLINE expensive #-}

stateHack :: Int -> State# RealWorld -> (# State# RealWorld, Int #)
stateHack n =
  let i = expensive n
   in (\s -> (# s, i #))

where stateHack produces the core:

stateHack
  = \ (n_avF :: Int) (s_avH :: State# RealWorld) ->
      (# s_avH, expensive n_avF #)

and we see that our state lambda was eta-expanded outside of our let.

Compare this with a non-state argument:

normal :: Int -> () -> (# (), Int #)
normal n =
  let i = expensive n
   in (\s -> (# s, i #))

and we see that expensive is shared across calls of the returned function, as expected

normal
  = \ (n_avI :: Int) ->
      let {
        i_sFV :: Int
        [LclId]
        i_sFV = expensive n_avI } in
      \ (s_avK :: ()) -> (# s_avK, i_sFV #)

We aren't discussing the usual "state hack" problem of a let expression returning an IO function. We are using unsafePerformIO which itself uses runRW#, and runRW# is not inlined until after all core-to-core passes.

So, let's instead consider a variant of expensive and stateHack where expensive is in IO and we run it with runRW#:

expensiveIO :: Int -> State# RealWorld -> (# State# RealWorld, Int #)
expensiveIO n s = (# s, n #)
{-# NOINLINE expensiveIO #-}

stateHackIO :: Int -> (State# RealWorld -> (# State# RealWorld, Int #))
stateHackIO n =
  -- newAction :: IO (IO Int)
  let newAction :: State# RealWorld -> (# State# RealWorld, State# RealWorld -> (# State# RealWorld, Int #) #)
      newAction st0 = case expensiveIO n st0 of
        (# st1, i #) -> (# st1, \s -> (# s, i #) #)
   in case runRW# newAction of
        (# _, a #) -> a

In this variant of stateHack the returned state lambda is inside a tuple, and expensiveIO n st0 cannot be inlined into the state lambda because the resulting state token, st1, is used in the unboxed tuple that contains the state lambda.

Peeking at the core reveals that we get the desired sharing:

Core
stateHackIO
  = \ (n_axw :: Int) ->
      case runRW#
             @ ('TupleRep '[ 'TupleRep '[], 'LiftedRep])
             @ (# State# RealWorld,
                  State# RealWorld -> (# State# RealWorld, Int #) #)
             (\ (st0_axy [OS=OneShot] :: State# RealWorld) ->
                case expensiveIO n_axw st0_axy of { (# ipv_syX, ipv1_syY #) -> -- execute `expensiveIO` and bind result to `ipv1_syY`
                (# ipv_syX,
                   \ (s_axB :: State# RealWorld) -> (# s_axB, ipv1_syY #) #) -- `ipv1_syY` is captured in the returned IO function
                })
      of
      { (# ipv_sz1, ipv1_sz2 #) ->
      ipv1_sz2
      }

Now consider a cache example using IORef

cache :: IO a -> IO a
cache action =
  unsafePerformIO do
    ref <- newIORef Nothing
    let newAction = do
          readIORef ref >>= \case
            Nothing -> do
              res <- action
              writeIORef ref (Just res)
              pure res
            Just res -> pure res
    pure newAction

We are returning an IO function, that is, a State# bound lambda, and it seems to me that the important question would be: "Does the returned State# lambda eta-expand past the creation of the IORef?" I don't see how it would, and the core I compiled below shows that it did not. I think this is because of the reasons I outlined in the simpler example above, namely, that the returned IO function is inside an unboxed tuple that provides no eta-expansion opportunity, and runRW# is not inlined until CorePrep.

Core
cache
  = \ (@ a_axD) (action_awc :: IO a_axD) ->
      case runRW#
             @ ('TupleRep '[ 'TupleRep '[], 'LiftedRep])
             @ (# State# RealWorld, IO a_axD #)
             (\ (s_aE8 [OS=OneShot] :: State# RealWorld) ->
                case noDuplicate# @ RealWorld s_aE8 of s'_aE9 { __DEFAULT ->
                case newMutVar#
                       @ (Maybe a_axD) @ RealWorld (GHC.Maybe.Nothing @ a_axD) s'_aE9
                of
                { (# ipv_aET, ipv1_aEU #) ->  -- Our IORef is created here
                (# ipv_aET,
                   (\ (s1_aEK :: State# RealWorld) -> -- And here is our State# lambda capturing the IORef
                      case readMutVar# @ RealWorld @ (Maybe a_axD) ipv1_aEU s1_aEK of
                      { (# ipv2_XF8, ipv3_XFa #) ->
                      case ipv3_XFa of {
                        Nothing ->
                          case (action_awc
                                `cast` (GHC.Types.N:IO[0] <a_axD>_R
                                        :: IO a_axD
                                           ~R# (State# RealWorld -> (# State# RealWorld, a_axD #))))
                                 ipv2_XF8
                          of
                          { (# ipv4_XFi, ipv5_XFk #) ->
                          case writeMutVar#
                                 @ RealWorld
                                 @ (Maybe a_axD)
                                 ipv1_aEU
                                 (GHC.Maybe.Just @ a_axD ipv5_XFk)
                                 ipv4_XFi
                          of s2#_aGj
                          { __DEFAULT ->
                          (# s2#_aGj, ipv5_XFk #)
                          }
                          };
                        Just res_axr -> (# ipv2_XF8, res_axr #)
                      }
                      })
                   `cast` (Sym (GHC.Types.N:IO[0] <a_axD>_R)
                           :: (State# RealWorld -> (# State# RealWorld, a_axD #))
                              ~R# IO a_axD) #)
                }
                })
      of
      { (# ipv_aEb, ipv1_aEc #) ->
      ipv1_aEc
      }

This can perhaps be seen even more plainly in the STG:

STG
    \r [action_sIE]
        case
            case noDuplicate# [GHC.Prim.realWorld#] of s'_sIF [Occ=Once] {
            __DEFAULT ->
            case newMutVar# [GHC.Maybe.Nothing s'_sIF] of {
            (#,#) ipv_sIH [Occ=Once] ipv1_sII ->  -- We create our IORef and bind it to ipv1_sII here
            let {
              sat_sIU [Occ=Once] :: GHC.Types.IO a_axD -- We allocate a closure for our returned IO function here, capturing our IORef
              [LclId] =
                  \r [s1_sIJ] -- Here is the state lambda
                      case readMutVar# [ipv1_sII s1_sIJ] of {
                      (#,#) ipv2_sIL [Occ=Once*] ipv3_sIM [Occ=Once!] ->
                      case ipv3_sIM of {
                        GHC.Maybe.Nothing ->
                            case action_sIE ipv2_sIL of {
                            (#,#) ipv4_sIP [Occ=Once] ipv5_sIQ ->
                            let {
                              sat_sIR [Occ=Once] :: GHC.Maybe.Maybe a_axD
                              [LclId] =
                                  CCCS GHC.Maybe.Just! [ipv5_sIQ];
                            } in
                              case
                                  writeMutVar# [ipv1_sII sat_sIR ipv4_sIP]
                              of
                              s2#_sIS [Occ=Once]
                              {
                              __DEFAULT -> (#,#) [s2#_sIS ipv5_sIQ];
                              };
                            };
                        GHC.Maybe.Just res_sIT [Occ=Once] -> (#,#) [ipv2_sIL res_sIT];
                      };
                      };
            } in  (#,#) [ipv_sIH sat_sIU];
            };
            }
        of
        {
        (#,#) _ [Occ=Dead] ipv1_sIX [Occ=Once] -> ipv1_sIX;
        };

tstat avatar Sep 18 '21 01:09 tstat