haskell-capnp
Allow lazy decoding into high-level types.
This has been in the back of my mind for a while, but I apparently never got around to recording it:
One of the benefits of the wire format is that it makes random access to a message feasible. Right now the high-level API does not allow making use of this, since we have to decode the entire message up front. I think this is still the right default -- until perf is manifestly important enough, do the robust thing.
But it should also be possible to take advantage of laziness to only do decoding on-demand, without any substantial change to the API. We could implement a monad that satisfies ReadCtx m ConstMsg, with the following properties:
- The MonadThrow instance would have throwM = pure . throw; this would have the effect of retaining laziness even with error checking -- exceptions would occur on evaluation of the problematic part of the message.
- >>= would have to be strict in its left argument, in order to satisfy the MonadThrow law that throwM e >>= f = throwM e.
- fmap, (<*>) could just be trivial function wrappers around function application. Importantly, (<*>) would be lazy in both arguments, allowing non-strict traversal of a message. The generated Decerialize instances were written with this in mind, so they use <*> instead of >>= wherever they can, and so don't need to change to accommodate lazy decoding.
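To make the laziness properties above concrete, here is a minimal sketch using only base. It is not the library's implementation: LazyQuery is modelled as a bare newtype, throwLazy is a hypothetical stand-in for the proposed throwM (so the sketch does not need the exceptions package), and firstOnly/bindFails are illustrative names. ReadCtx and MonadLimit are left out entirely.

```haskell
import Control.Exception (ArithException (DivideByZero), Exception, evaluate, throw, try)

-- Hypothetical stand-in for the proposed monad: just a wrapper around a lazy value.
newtype LazyQuery a = LazyQuery { runLazyQuery :: a }

instance Functor LazyQuery where
  fmap f (LazyQuery x) = LazyQuery (f x)

instance Applicative LazyQuery where
  pure = LazyQuery
  -- Matching on a newtype constructor forces nothing, so this is lazy in
  -- both arguments.
  LazyQuery f <*> LazyQuery x = LazyQuery (f x)

instance Monad LazyQuery where
  -- Strict in the left argument, so that throwLazy e >>= f = throwLazy e.
  LazyQuery x >>= f = x `seq` f x

-- Stand-in for throwM = pure . throw: the exception is hidden inside the
-- value and only surfaces when that value is evaluated.
throwLazy :: Exception e => e -> LazyQuery a
throwLazy = pure . throw

-- Only the first component is forced, so the failing second field is never
-- touched.
firstOnly :: Int
firstOnly = fst (runLazyQuery ((,) <$> pure 1 <*> throwLazy DivideByZero))

-- Binding on a failed computation rethrows once the result is evaluated.
bindFails :: IO (Either ArithException Int)
bindFails = try (evaluate (runLazyQuery (throwLazy DivideByZero >>= \x -> pure (x + 1))))
```

In this sketch firstOnly evaluates without raising, while bindFails reports the exception, which is the split between <*> and >>= described above.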
MonadLimit's invoice is the hard part. There are a couple different strategies I can think of:
- Have the monad encapsulate an IORef WordCount, and use unsafePerformIO/atomicModifyIORef to manage the traversal limit. We'd need to do the management atomically, because the type signature would be pure, and thus it would be a huge footgun for it not to be thread-safe.
- Have invoice be a no-op, and mitigate the DoS risk by using timeouts or allocation limits.
The latter strategy could be made less error-prone by having our new monad (let's call it LazyQuery) be run via a function like:
-- | @'queryLazy' timeout raw query@ decodes @raw@ and passes it to @query@, evaluating
-- the result to normal form. @timeout@ is used to set a timeout; if the timeout is exceeded a
-- @TraversalLimitError@ will be thrown. Any other exception thrown by @query@ will be re-thrown
-- in @IO@.
queryLazy :: (Decerialize a, NFData b) => Int -> Cerial a ConstMsg -> (a -> LazyQuery b) -> IO b
...which would make it hard to forget to set a timeout. Note that it would still be possible to shoot yourself in the foot, as b could contain parts of the message that are still unevaluated, if e.g. they are captured as part of a lambda (or otherwise not fully evaluated, for dubious instances of NFData).
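A rough sketch of how such a runner might look, under some loud assumptions: it takes an already-decoded lazy value instead of a Cerial a ConstMsg (so it does not depend on the capnp types), LazyQuery is again a bare newtype, the timeout is in microseconds as with System.Timeout.timeout, and TraversalLimitExceeded is a hypothetical local exception standing in for the library's traversal limit error. It needs the deepseq package, which ships with GHC.

```haskell
import Control.DeepSeq (NFData, force)
import Control.Exception (Exception, evaluate, throwIO, try)
import System.Timeout (timeout)

-- Hypothetical stand-in for the proposed monad.
newtype LazyQuery a = LazyQuery { runLazyQuery :: a }

-- Hypothetical local exception, standing in for the traversal limit error.
data TraversalLimitExceeded = TraversalLimitExceeded
  deriving (Show)

instance Exception TraversalLimitExceeded

-- Run the query, force the result to normal form, and give up after the
-- given number of microseconds. Exceptions hidden inside the lazy result
-- surface from 'evaluate' and propagate in IO unchanged.
queryLazySketch :: NFData b => Int -> a -> (a -> LazyQuery b) -> IO b
queryLazySketch micros decoded query = do
  result <- timeout micros (evaluate (force (runLazyQuery (query decoded))))
  maybe (throwIO TraversalLimitExceeded) pure result

-- A query whose result can never be fully evaluated is cut off by the timeout.
demoTimeout :: IO (Either TraversalLimitExceeded [Integer])
demoTimeout = try (queryLazySketch 10000 () (\_ -> LazyQuery [1 ..]))
```

Forcing with force inside the timeout is what keeps the timeout meaningful: without it, the IO action would return a thunk immediately and the expensive traversal would happen later, outside the limit. The caveat above still applies, since force can only reach what the NFData instance for b actually evaluates.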