purescript-protobuf icon indicating copy to clipboard operation
purescript-protobuf copied to clipboard

encodeVarint64 performance improvement preallocate ArrayBuffer

Open jamesdbrock opened this issue 4 years ago • 3 comments

In encodeVarint64, instead of creating a separate ArrayBuffer for each byte, we could preallocate an ArrayBuffer with the right number of bytes for the entire varint by examining its value. And the same for encodeVarint32.

https://github.com/xc-jp/purescript-protobuf/blob/b4ddb67b72b4a3db8b5e5f6ec046182a802e18aa/src/Protobuf/Encode.purs#L311-L321

jamesdbrock avatar Jun 24 '21 00:06 jamesdbrock

case n of
  _ | n < 128 -> do
    Builder.putUint8 n
  _ | n < 16384 -> do
    buf <- ArrayBuffer.empty 2
    dv <- DataView.whole buf
    DataView.setInt8 dv 0 $ contGroup $ n `and` u0x7F
    DataView.setInt8 dv 1 $ (n `zsh` 7) `and` u0x7F
    Builder.putArrayBuffer buf
  _ | n < 2097152 -> do

jamesdbrock avatar Jun 24 '21 00:06 jamesdbrock

Just for fun

What if we had a set of builder primitives which returned

  • The byte size required to write the primitive
  • A function which takes an offset to an ArrayBuffer and writes the primitive.
type Putter :: Tuple ByteSize (ByteOffset -> ArrayBuffer -> Effect Unit)
putInt8 :: Int -> Putter
putStringUTF8 :: String -> Putter
putFloat64 :: Number -> Putter

Then we could build an ArrayBuffer in two passes, without ever allocating any temporary ArrayBuffers:

  1. Collect the Putters, add their sizes, allocate the final ArrayBuffer.
  2. Call the writer function of each Putter to fill the ArrayBuffer.

Of course we would have to allocate all of these temporary Putters, which is certainly more expensive than allocating the temporary ArrayBuffers.

Maybe there is a way to carry the Putter in a monad and have two different ways to exec the monad, one to calculate ByteSize and one to write to an ArrayBuffer?

Or something like this monoidal append?

<> :: Putter -> Putter -> Putter
Tuple sl wl <> Tuple sr wr = Tuple (sl + sr) (\o ab -> wl o ab *> wr (o + sl) ab)

The thing is that, in cases like encoding a UTF8 string, if you want to know the required size of the encoded string, you have to walk the whole string and do the encoding in order to calculate the size. Won't it always be faster to just write the encoding to a temporary buffer while you're at it, and then copy that temporary buffer later?

I guess in the case of encoding a UTF8 string we could encode the temporary buffer and enclose it with the writer function which, when called, just memcpys the temporary buffer. I'm not saying that's a good idea. But it's a thing we could do.

jamesdbrock avatar Jun 24 '21 01:06 jamesdbrock

Just for fun

What we want to do is read a ByteSize in context r and write c in context w, kind of like this:

https://hackage.haskell.org/package/codec-0.2.1/docs/Control-Monad-Codec.html

jamesdbrock avatar Jun 24 '21 01:06 jamesdbrock