encodeVarint64 performance improvement preallocate ArrayBuffer
In encodeVarint64, instead of creating a separate ArrayBuffer for each byte, we could preallocate an ArrayBuffer with the right number of bytes for the entire varint by examining its value. And the same for encodeVarint32.
https://github.com/xc-jp/purescript-protobuf/blob/b4ddb67b72b4a3db8b5e5f6ec046182a802e18aa/src/Protobuf/Encode.purs#L311-L321
case n of
_ | n < 128 -> do
Builder.putUint8 n
_ | n < 16384 -> do
buf <- ArrayBuffer.empty 2
dv <- DataView.whole buf
DataView.setInt8 dv 0 $ contGroup $ n `and` u0x7F
DataView.setInt8 dv 1 $ (n `zsh` 7) `and` u0x7F
Builder.putArrayBuffer buf
_ | n < 2097152 -> do
Just for fun
What if we had a set of builder primitives which returned
- The byte size required to write the primitive
- A function which takes an offset to an
ArrayBufferand writes the primitive.
type Putter :: Tuple ByteSize (ByteOffset -> ArrayBuffer -> Effect Unit)
putInt8 :: Int -> Putter
putStringUTF8 :: String -> Putter
putFloat64 :: Number -> Putter
Then we could build an ArrayBuffer in two passes, without ever allocating any temporary ArrayBuffers:
- Collect the
Putters, add their sizes, allocate the finalArrayBuffer. - Call the writer function of each
Putterto fill theArrayBuffer.
Of course we would have to allocate all of these temporary Putters, which is certainly more expensive than allocating the temporary ArrayBuffers.
Maybe there is a way to carry the Putter in a monad and have two different ways to exec the monad, one to calculate ByteSize and one to write to an ArrayBuffer?
Or something like this monoidal append?
<> :: Putter -> Putter -> Putter
Tuple sl wl <> Tuple sr wr = Tuple (sl + sr) (\o ab -> wl o ab *> wr (o + sl) ab)
The thing is that, in cases like encoding a UTF8 string, if you want to know the required size of the encoded string, you have to walk the whole string and do the encoding in order to calculate the size. Won't it always be faster to just write the encoding to a temporary buffer while you're at it, and then copy that temporary buffer later?
I guess in the case of encoding a UTF8 string we could encode the temporary buffer and enclose it with the writer function which, when called, just memcpys the temporary buffer. I'm not saying that's a good idea. But it's a thing we could do.
Just for fun
What we want to do is read a ByteSize in context r and write c in context w, kind of like this:
https://hackage.haskell.org/package/codec-0.2.1/docs/Control-Monad-Codec.html