haskell-capnp

Allow lazy decoding into high-level types.

Open zenhack opened this issue 7 years ago • 0 comments

This has been in the back of my mind for a while, but I apparently never got around to recording it:

One of the benefits of the wire format is that it makes random access to a message feasible. Right now the high-level API does not allow making use of this, since we have to decode the entire message up front. I think this is still the right default -- until perf is manifestly important enough, do the robust thing.

But it should also be possible to take advantage of laziness to only do decoding on-demand, without any substantial change to the API. We could implement a monad that satisfies ReadCtx m ConstMsg, with the following properties:

  1. The MonadThrow instance would have throwM = pure . throw; this would have the effect of retaining laziness even with error checking -- exceptions would occur on evaluation of the problematic part of the message.
  2. >>= would have to be strict in its left argument, in order to satisfy the MonadThrow law that throwM e >>= f = throwM e.
  3. fmap and (<*>) could be trivial wrappers around function application. Importantly, (<*>) would be lazy in both arguments, allowing non-strict traversal of a message. The generated Decerialize instances were written with this in mind: they use <*> instead of >>= wherever they can, so they would not need to change to accommodate lazy decoding.
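The three properties above can be sketched with nothing but base. This is only an illustration under assumptions: the LazyQuery newtype and the throwLazy helper are hypothetical names, and throwLazy stands in for what throwM would be in a real MonadThrow instance (that class lives in the exceptions package, which the sketch avoids depending on).

```haskell
import Control.Exception (ErrorCall (..), Exception, evaluate, throw, try)

-- Hypothetical stand-in for the proposed monad: a plain newtype wrapper.
newtype LazyQuery a = LazyQuery { runLazyQuery :: a }

instance Functor LazyQuery where
  fmap f (LazyQuery x) = LazyQuery (f x)

instance Applicative LazyQuery where
  pure = LazyQuery
  -- Newtype patterns force nothing, so neither argument is evaluated here.
  LazyQuery f <*> LazyQuery x = LazyQuery (f x)

instance Monad LazyQuery where
  -- Force the left argument to WHNF before continuing, so a deferred
  -- exception on the left surfaces even when the continuation ignores it.
  LazyQuery x >>= f = x `seq` f x

-- What throwM would look like: the exception is deferred until the
-- value it stands for is actually evaluated.
throwLazy :: Exception e => e -> LazyQuery a
throwLazy = pure . throw

main :: IO ()
main = do
  let pair :: (Int, Int)
      pair = runLazyQuery ((,) <$> pure 1 <*> throwLazy (ErrorCall "bad field"))
  -- The broken second field is never forced, so this succeeds.
  print (fst pair)
  -- Forcing the broken field raises the deferred exception.
  r <- try (evaluate (snd pair)) :: IO (Either ErrorCall Int)
  putStrLn (either (const "second field threw") show r)
  -- (>>=) is strict on the left: the exception fires although the
  -- continuation ignores its argument.
  s <- try (evaluate (runLazyQuery (throwLazy (ErrorCall "bad") >>= \_ -> pure (0 :: Int))))
         :: IO (Either ErrorCall Int)
  putStrLn (either (const "bind threw") show s)
```

Because LazyQuery is a newtype, `pure . throw` adds no extra layer: the wrapped computation simply is the throwing thunk, which is why the strict `>>=` is enough to recover `throwM e >>= f = throwM e`.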

MonadLimit's invoice is the hard part. There are a couple different strategies I can think of:

  • Have the monad encapsulate an IORef WordCount, and use unsafePerformIO/atomicModifyIORef to manage the traversal limit. We'd need to do the management atomically: the type signature would be pure, so it would be a huge footgun for it not to be thread-safe.
  • Have invoice be a no-op, and mitigate the DoS risk by using timeouts or allocation limits.
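A minimal sketch of the first strategy, under stated assumptions: WordCount is modelled as a plain Int, LimitExceeded is a made-up exception, and chargeThen is a hypothetical helper rather than anything in haskell-capnp. It shows only the atomic bookkeeping behind a pure type signature, not how it would be wired into the monad.

```haskell
import Control.Exception (Exception, evaluate, throwIO, try)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical stand-ins for the library's WordCount and limit error.
type WordCount = Int

data LimitExceeded = LimitExceeded deriving (Show)
instance Exception LimitExceeded

-- Charge `cost` words against the shared budget at the moment the
-- result is forced. The check-and-decrement is a single atomic step,
-- so concurrent evaluation cannot overspend the budget.
chargeThen :: IORef WordCount -> WordCount -> a -> a
chargeThen ref cost x = unsafePerformIO $ do
  ok <- atomicModifyIORef' ref $ \left ->
    if left >= cost then (left - cost, True) else (left, False)
  if ok then pure x else throwIO LimitExceeded
{-# NOINLINE chargeThen #-}

main :: IO ()
main = do
  ref <- newIORef (10 :: WordCount)
  let a = chargeThen ref 6 "first"
      b = chargeThen ref 6 "second"
  -- Forcing `a` spends 6 of the 10 words.
  _ <- evaluate a
  readIORef ref >>= print
  -- Forcing `b` would need 6 more, so the limit error is raised.
  r <- try (evaluate b) :: IO (Either LimitExceeded String)
  print r
```

Since the budget is only charged when a thunk is forced, the order of charges follows evaluation order rather than program order, and an optimizing compiler is free to share or float these thunks. That is part of why this approach needs care.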

The latter strategy could be made less error-prone by having our new monad (let's call it LazyQuery) be run via a function like:

-- | @'queryLazy' timeout raw query@ decodes @raw@ and passes it to @query@, evaluating
-- the result to normal form. @timeout@ is used to set a timeout; if the timeout is exceeded a
-- @TraversalLimitError@ will be thrown. Any other exception thrown by @query@ will be re-thrown
-- in 'IO'.
queryLazy :: (Decerialize a, NFData b) => Int -> Cerial a ConstMsg -> (a -> LazyQuery b) -> IO b

...which would make it hard to forget to set a timeout. Note that it would still be possible to shoot yourself in the foot, as b could contain parts of the message that are still unevaluated, if e.g. they are captured as part of a lambda (or otherwise not fully evaluated, for dubious instances of NFData).
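The core of such a runner might look roughly like the sketch below. It is not queryLazy itself: the decoding step and the Decerialize/Cerial types are left out, forceWithin is a hypothetical helper, and TraversalLimitError here is a local stand-in for the library's error. It only shows the combination of timeout, evaluate and force that the description relies on.

```haskell
import Control.DeepSeq (NFData, force)
import Control.Exception (Exception, evaluate, throwIO, try)
import System.Timeout (timeout)

-- Hypothetical stand-in for the library's traversal limit error.
data TraversalLimitError = TraversalLimitError deriving (Show)
instance Exception TraversalLimitError

-- Evaluate a lazily computed result to normal form within `micros`
-- microseconds; throw TraversalLimitError if that takes too long.
-- Exceptions raised while forcing the result propagate unchanged.
forceWithin :: NFData b => Int -> b -> IO b
forceWithin micros result = do
  done <- timeout micros (evaluate (force result))
  maybe (throwIO TraversalLimitError) pure done

main :: IO ()
main = do
  -- A small result is fully evaluated well within one second.
  ok <- forceWithin 1000000 (map (* 2) [1, 2, 3 :: Int])
  print ok
  -- An effectively endless result cannot reach normal form in 1 ms.
  slow <- try (forceWithin 1000 [(1 :: Int) ..])
            :: IO (Either TraversalLimitError [Int])
  putStrLn (either show (const "finished") slow)
```

One caveat with this approach: GHC delivers the timeout's asynchronous exception only at allocation points, so a tight loop that never allocates may not be interrupted on time.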

zenhack · Nov 08 '18 20:11