cborg
Decode bytes of known length
I couldn't see an easy way to decode a ByteString whose length is known ahead of time, which could lead to security problems down the road.
The example came up in #haskell, where we were decoding a hash digest whose length is known statically. Currently you have to use decodeBytes, which a malicious actor could use to send us terabytes of data before we get a chance to inspect its length. It would be good to have a decodeBytesLen (either decodeBytesLen :: Int -> Decoder s ByteString, or decodeBytesLen :: Decoder s Int together with decodeBytesWithLen :: Int -> Decoder s ByteString).
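To make the first signature concrete, here is a minimal pure sketch. It does not use cborg: the input is modelled as a list of bytes, and header and decodeBytesOfLen are hypothetical names. It relies only on the CBOR encoding itself (RFC 8949), where a byte string's initial byte (major type 2) plus a few argument bytes declare its length before the payload. A wrong or enormous length can therefore be rejected without reading the payload. A 32-byte digest, for example, starts with the header bytes 0x58 0x20.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- Parse a definite-length CBOR header (RFC 8949, section 3).
-- Returns (major type, declared length, input remaining after the header).
-- Integer is used so a hostile 64-bit length cannot overflow.
header :: [Word8] -> Either String (Word8, Integer, [Word8])
header [] = Left "unexpected end of input"
header (b : rest)
  | ai < 24 = Right (major, fromIntegral ai, rest) -- length stored in the initial byte
  | ai == 24 = extra 1 -- 1 following byte holds the length
  | ai == 25 = extra 2
  | ai == 26 = extra 4
  | ai == 27 = extra 8
  | otherwise = Left "indefinite or reserved length: not handled here"
  where
    major = b `shiftR` 5 -- top 3 bits
    ai = b .&. 0x1f -- low 5 bits: "additional information"
    extra n
      | length argBytes < n = Left "truncated header"
      | otherwise = Right (major, bigEndian argBytes, drop n rest)
      where
        argBytes = take n rest
    bigEndian :: [Word8] -> Integer
    bigEndian = foldl (\acc w -> acc `shiftL` 8 .|. fromIntegral w) 0

-- Hypothetical helper (not part of cborg): accept a byte string only if
-- its header declares exactly n bytes. The length comparison happens
-- before the payload is split off, so a huge declared length fails early.
decodeBytesOfLen :: Int -> [Word8] -> Either String ([Word8], [Word8])
decodeBytesOfLen n input = do
  (major, len, rest) <- header input
  if major /= 2
    then Left "expected a byte string"
    else if len /= toInteger n
      then Left ("expected " ++ show n ++ " bytes, header declares " ++ show len)
      else
        let (payload, rest') = splitAt n rest
        in if length payload < n
             then Left "truncated payload"
             else Right (payload, rest')
```

In GHCi, decodeBytesOfLen 32 ([0x58, 0x20] ++ replicate 32 0xab) returns the 32 payload bytes with no leftover input, while an input starting 0x5b 0x00 0x00 0x01 0x00 0x00 0x00 0x00 0x00 (a header declaring 2^40 bytes) is rejected from the header alone.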
This is mostly for discussion, since it looks like it might require some low-level changes in cborg and I'm not sure of the best way to handle them.
Edit: this likely also applies to text strings, but I'm not sure it's as useful there, given that the encoding only tells you the length in bytes, not in characters.
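The text-string caveat can be illustrated with a small standalone sketch (not cborg code; utf8Width, cborTextLen and shortTextHeader are hypothetical names, and valid Unicode scalar values are assumed). A CBOR text string (major type 3) declares its length as a UTF-8 byte count, so "héllo" has 5 characters but a declared length of 6. A limit of n bytes still bounds the character count to at most n, which may be enough for some checks.

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Number of bytes a code point occupies when encoded as UTF-8.
utf8Width :: Char -> Int
utf8Width c
  | n < 0x80 = 1
  | n < 0x800 = 2
  | n < 0x10000 = 3
  | otherwise = 4
  where
    n = ord c

-- The length a CBOR text string header would declare:
-- the UTF-8 byte count, not the number of characters.
cborTextLen :: String -> Int
cborTextLen = sum . map utf8Width

-- Initial byte of a CBOR text string (major type 3, base 0x60) whose
-- byte length is below 24 and therefore fits in the initial byte itself.
shortTextHeader :: String -> Maybe Word8
shortTextHeader s
  | len < 24 = Just (0x60 + fromIntegral len)
  | otherwise = Nothing
  where
    len = cborTextLen s
```

For example, cborTextLen "h\233llo" is 6 and shortTextHeader "h\233llo" is Just 0x66, even though length "h\233llo" is 5.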
> a malicious actor could use to send us terabytes of data before we get a chance to inspect its length.
It's worth noting that the code that pushes data into the decoder can check the overall length of the input, so if you can place an overall limit on the amount of data you expect, you can prevent the "terabytes of data" problem.
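A sketch of such an overall limit, assuming a simplified pure model of an incremental decoder. cborg's own incremental interface (deserialiseIncremental, returning an IDecode with Partial, Done and Fail) runs in ST, so Step, feedWithLimit and takeBytes below are hypothetical stand-ins rather than cborg functions. The budget check sits in the loop that answers each Partial with the next chunk.

```haskell
import Data.Word (Word8)

-- Hypothetical, pure model of an incremental decoder.
data Step a
  = Partial (Maybe [Word8] -> Step a) -- wants more input; Nothing means end of input
  | Done [Word8] a -- finished, with leftover input
  | Fail String

-- Feed chunks to the decoder, aborting as soon as the total number of
-- bytes supplied would exceed `limit`.
feedWithLimit :: Int -> [[Word8]] -> Step a -> Either String a
feedWithLimit limit = go 0
  where
    go _ _ (Done _ a) = Right a
    go _ _ (Fail msg) = Left msg
    go used chunks (Partial k) = case chunks of
      [] -> case k Nothing of
        Partial _ -> Left "decoder wants more input after end of input"
        s -> go used [] s
      (c : cs)
        | used + length c > limit -> Left "input exceeds size limit"
        | otherwise -> go (used + length c) cs (k (Just c))

-- Toy decoder: collects exactly n bytes, however the input is chunked.
takeBytes :: Int -> Step [Word8]
takeBytes n = loop []
  where
    loop acc
      | length acc >= n = let (xs, rest) = splitAt n acc in Done rest xs
      | otherwise = Partial $ \mc -> case mc of
          Nothing -> Fail "unexpected end of input"
          Just c -> loop (acc ++ c)
```

With a limit of 100, feeding [[1,2],[3,4]] to takeBytes 4 succeeds; with a limit of 3 the second chunk is refused before the decoder ever sees it.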
So this is relevant only for cases where you want to limit something locally, within the overall input size limits.
Perhaps what we want here is a peekBytesLen/peekStringLen primitive.
Specifically, peekBytesLen would not consume the token header. You then get to choose whether to use the normal decodeBytes or to fail.
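The proposed peekBytesLen could behave like this minimal pure model. The input is a list of bytes; readHeader, decodeBytesModel and decodeBytesAtMost are hypothetical, and peekBytesLen is only the name suggested in this thread, not an existing cborg function. Peeking returns the declared length and leaves the input untouched, so the caller can fail cheaply or hand the same input to the ordinary decoder.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- In this model the "decoder state" is just the remaining input.
type Input = [Word8]

-- Read a definite-length CBOR header: (major type, argument, header size).
-- Indefinite (31) and reserved (28-30) lengths give Nothing.
readHeader :: Input -> Maybe (Word8, Integer, Int)
readHeader [] = Nothing
readHeader (b : rest)
  | ai < 24 = Just (major, fromIntegral ai, 1)
  | ai <= 27 =
      let n = 2 ^ (fromIntegral ai - 24 :: Int) -- 1, 2, 4 or 8 argument bytes
          arg = take n rest
       in if length arg < n
            then Nothing
            else Just (major, foldl (\acc w -> acc `shiftL` 8 .|. fromIntegral w) 0 arg, 1 + n)
  | otherwise = Nothing
  where
    major = b `shiftR` 5
    ai = b .&. 0x1f

-- Peek: report the declared length of the next byte string (major type 2)
-- WITHOUT consuming anything; the input is returned unchanged.
peekBytesLen :: Input -> Either String (Integer, Input)
peekBytesLen input = case readHeader input of
  Just (2, len, _) -> Right (len, input)
  _ -> Left "next item is not a definite-length byte string"

-- Stand-in for an ordinary decodeBytes: consume header and payload.
decodeBytesModel :: Input -> Either String ([Word8], Input)
decodeBytesModel input = case readHeader input of
  Just (2, len, hdr)
    | toInteger (length payload) == len -> Right (payload, rest)
    | otherwise -> Left "truncated payload"
    where
      (payload, rest) = splitAt (fromInteger len) (drop hdr input)
  _ -> Left "expected a definite-length byte string"

-- Peek first, then either fail without reading the payload or
-- fall back to the normal decoder on the untouched input.
decodeBytesAtMost :: Integer -> Input -> Either String ([Word8], Input)
decodeBytesAtMost limit input = do
  (len, input') <- peekBytesLen input
  if len > limit
    then Left ("byte string of " ++ show len ++ " bytes exceeds limit")
    else decodeBytesModel input'
```

The peek costs only the header bytes, and the policy (exact length, maximum, or anything else) stays with the caller.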
Combining those into one operation (checking what, the exact length?) might be sensible for common-enough types, if the benchmarks show it's worth it.
peekBytesLen seems like a good way forward; it would certainly solve the problem I was considering. peekStringLen would also be useful (and probably also a peekListLen giving Maybe Int), for those cases where the user knows there are some extra constraints on what is valid.
For list len, you can just consume the list length and then fail, no? It's only for bytes/string len that there's currently no way to get the size at all without getting the whole thing.
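For lists, the header already carries the element count, and the elements follow one at a time, so failing right after the header costs almost nothing. cborg's decodeListLen consumes that header; the sketch below is a pure stand-in for it (decodeListLenMaybe and checkListLen are hypothetical names). It returns Nothing for an indefinite-length array (initial byte 0x9f), in the spirit of the Maybe Int suggested for peekListLen. Rejecting indefinite arrays outright is just one possible policy.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- Decode a CBOR array header (major type 4) and return the rest of the input.
-- Just n for a definite length, Nothing for an indefinite-length array.
decodeListLenMaybe :: [Word8] -> Either String (Maybe Integer, [Word8])
decodeListLenMaybe [] = Left "unexpected end of input"
decodeListLenMaybe (b : rest)
  | major /= 4 = Left "expected an array"
  | ai < 24 = Right (Just (fromIntegral ai), rest)
  | ai <= 27 =
      let n = 2 ^ (fromIntegral ai - 24 :: Int) -- 1, 2, 4 or 8 argument bytes
          arg = take n rest
       in if length arg < n
            then Left "truncated header"
            else Right (Just (foldl (\acc w -> acc `shiftL` 8 .|. fromIntegral w) 0 arg), drop n rest)
  | ai == 31 = Right (Nothing, rest)
  | otherwise = Left "reserved additional information"
  where
    major = b `shiftR` 5
    ai = b .&. 0x1f

-- Consume the header, then fail before touching any element if the
-- declared count exceeds the limit. On success, the remaining input
-- starts at the first element.
checkListLen :: Integer -> [Word8] -> Either String [Word8]
checkListLen limit input = do
  (mlen, rest) <- decodeListLenMaybe input
  case mlen of
    Just n | n <= limit -> Right rest
    Just n -> Left ("array of " ++ show n ++ " elements exceeds limit")
    Nothing -> Left "indefinite-length array not accepted here"
```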