bincode
[Tracking issue] Querying number of bytes needed to encode a struct
I've noticed that v2.0 doesn't have a serialized size function. I'm a rust newbie, so I'm not going to attempt a pull request, but an implementation like this would probably suffice?
use bincode::{Encode, config::Config, enc::EncoderImpl, error::EncodeError, enc::write::Writer};
/** Writer which only counts the bytes "written" to it */
struct SizeOnlyWriter<'a> {
    bytes_written: &'a mut usize,
}
impl<'a> Writer for SizeOnlyWriter<'a> {
    fn write(&mut self, bytes: &[u8]) -> Result<(), EncodeError> {
        *self.bytes_written += bytes.len();
        Ok(())
    }
}
/** Return the serialized size of an `Encode` object. */
pub fn serialized_size<T: Encode, C: Config>(obj: &T, config: C) -> Result<usize, EncodeError> {
    let mut size = 0usize;
    let writer = SizeOnlyWriter { bytes_written: &mut size };
    let mut ei = EncoderImpl::new(writer, config);
    obj.encode(&mut ei)?;
    Ok(size)
}
We haven't implemented serialized_size because we're not sure it has a valid use case. Most of the time when people use serialized_size it is because they want to use it like this:
let size = bincode::serialized_size(&value, config).unwrap();
let mut vec = vec![0u8; size];
bincode::encode_into_slice(&value, vec.as_mut_slice(), config).unwrap();
We have found that the above is a lot slower than simply encoding to a vec, because you have to process the entire structure twice.
let mut vec = bincode::encode_to_vec(&value, config).unwrap();
But maybe there's a use case we missed?
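To make the "process the entire structure twice" point concrete, here is a minimal std-only sketch. It does not use bincode: `encode_names` is a hypothetical stand-in for an encoder, and `CountingSink`, `two_pass`, and `one_pass` are names made up for illustration. Both paths produce the same bytes, but the two-pass path walks the data once to count and once more to write.

```rust
use std::io::{self, Write};

/// Sink that discards bytes and only counts them (hypothetical helper).
#[derive(Default)]
pub struct CountingSink {
    pub bytes_written: usize,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.bytes_written += buf.len();
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Hypothetical stand-in for an encoder: each string is written as a
/// little-endian u64 length followed by its UTF-8 bytes.
pub fn encode_names<W: Write>(names: &[&str], out: &mut W) -> io::Result<()> {
    for name in names {
        out.write_all(&(name.len() as u64).to_le_bytes())?;
        out.write_all(name.as_bytes())?;
    }
    Ok(())
}

/// Count first, then encode into an exactly sized buffer (two traversals).
pub fn two_pass(names: &[&str]) -> io::Result<Vec<u8>> {
    let mut counter = CountingSink::default();
    encode_names(names, &mut counter)?; // pass 1: count only
    let mut buf = vec![0u8; counter.bytes_written];
    let mut slot: &mut [u8] = buf.as_mut_slice();
    encode_names(names, &mut slot)?; // pass 2: actually write
    Ok(buf)
}

/// Encode straight into a growable Vec (one traversal).
pub fn one_pass(names: &[&str]) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    encode_names(names, &mut buf)?;
    Ok(buf)
}
```

Whether the extra traversal matters depends on the data; the sketch only shows where the second pass comes from.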
I was thinking of using it in an FFI call, so that I could serialize into memory owned by C.
Would it be possible to:
- let bincode allocate a Vec<u8>
- call into_raw_parts
- return that to C
- ffi back into Rust to deallocate this vec when you're done
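The steps above could look roughly like the following sketch. It assumes the bytes have already been produced (for example by `encode_to_vec`), and it decomposes the `Vec` with `ManuallyDrop` rather than `into_raw_parts`, so that it does not depend on that method being available on the toolchain in use. `RawBuffer`, `vec_into_raw`, and `free_buffer` are hypothetical names; a real FFI export would also need a stable symbol name (e.g. via `no_mangle`).

```rust
use std::mem::ManuallyDrop;

/// Hypothetical C-compatible description of a buffer allocated by Rust.
#[repr(C)]
pub struct RawBuffer {
    pub ptr: *mut u8,
    pub len: usize,
    pub cap: usize,
}

/// Hand ownership of the Vec's allocation to the caller.
/// The Vec's destructor is suppressed, so the memory stays alive
/// until `free_buffer` is called with the same three values.
pub fn vec_into_raw(bytes: Vec<u8>) -> RawBuffer {
    let mut bytes = ManuallyDrop::new(bytes);
    RawBuffer {
        ptr: bytes.as_mut_ptr(),
        len: bytes.len(),
        cap: bytes.capacity(),
    }
}

/// Called from C when it is done with the buffer.
///
/// # Safety
/// `buf` must come from `vec_into_raw` and must not be freed twice.
pub unsafe extern "C" fn free_buffer(buf: RawBuffer) {
    if buf.ptr.is_null() {
        return;
    }
    // Rebuild the Vec so that Rust's allocator releases the memory.
    unsafe { drop(Vec::from_raw_parts(buf.ptr, buf.len, buf.cap)) };
}
```

The C side must treat the buffer as opaque apart from reading `len` bytes at `ptr`; freeing it with C's `free` would be undefined behavior because the allocation belongs to Rust's allocator.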
Otherwise it is possible to create your own LenWriter, implement Writer and then call encode_into_writer that simply counts the # of bytes that are written.
I'll leave this open as a tracking issue to see if other people are interested in having a method to get the encoded len in bincode 2.
Yeah, I made my own SizeOnlyWriter as shown above, to get the serialized size and tell it to C. I preferred that just in case C has somewhere very particular that it wants the data written, since either option was about equal effort.
Thanks for considering, and I'll be interested to see if anyone else wants this.
@VictorKoenders I was also looking for serialized_size in bincode. My use case: I handle deserialization in Rust and open the file in Python (the interface is done with pyo3), and I don't want to read the entire file content into a byte array.
The usage I have in mind would be something along the lines of:
with open("binary.file", "rb") as f:
    my_rustlib.deserialize(f.read(my_rustlib.serialized_size()))
    # logic that decides if more should be read, …
    ...
and on the backend I have (in bincode 1) something like
#[pyfunction]
fn serialized_size() -> anyhow::Result<u64> {
    let x = MyStruct::new();
    Ok(bincode::serialized_size(&x)?)
}
Sure, there are a number of things I could change about the approach (hard-coding the size and adding a test to catch when I update the definition of MyStruct, determining the size in a build.rs to generate a constant, …), but one way or another I end up in the situation where I need to know how many bytes to pass to the deserializer.
(As an aside, bincode 1 isn't a perfect fit for me either: my object is not dynamically sized, so I would want to call bincode::serialized_size::<MyStruct>() without having to instantiate the struct.)
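For comparison, the same "read exactly one record's worth of bytes at a time" loop can be sketched on the Rust side with only the standard library. `RECORD_SIZE` is a hypothetical constant standing in for whatever reports the serialized size of `MyStruct`, and `read_record` is a made-up helper name.

```rust
use std::io::{self, Read};

/// Hypothetical fixed record size in bytes; in practice this would come
/// from whatever reports the serialized size of the record type.
pub const RECORD_SIZE: usize = 8;

/// Read the next record's bytes, or `None` once fewer than RECORD_SIZE
/// bytes remain (clean end of input or a truncated trailing record).
pub fn read_record<R: Read>(reader: &mut R) -> io::Result<Option<[u8; RECORD_SIZE]>> {
    let mut buf = [0u8; RECORD_SIZE];
    match reader.read_exact(&mut buf) {
        Ok(()) => Ok(Some(buf)),
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(None),
        Err(e) => Err(e),
    }
}
```

Each returned array can then be handed to the decoder, so only one record is held in memory at a time.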
Serializing the length of a section so that it can be skipped over in decoding.
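A minimal sketch of that pattern, independent of bincode: each section is written as a 4-byte little-endian length followed by its payload, so a reader can jump past a section without decoding it. The framing, `write_section`, `skip_section`, and `SectionTooLarge` are all assumptions made for illustration.

```rust
/// Error returned when a payload does not fit the 4-byte length prefix.
#[derive(Debug, PartialEq)]
pub struct SectionTooLarge;

/// Append one section: a little-endian u32 length, then the payload bytes.
pub fn write_section(out: &mut Vec<u8>, payload: &[u8]) -> Result<(), SectionTooLarge> {
    if payload.len() > u32::MAX as usize {
        return Err(SectionTooLarge);
    }
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(payload);
    Ok(())
}

/// Skip the section at the start of `input` without decoding it.
/// Returns the bytes after the section, or `None` if the input is truncated.
pub fn skip_section(input: &[u8]) -> Option<&[u8]> {
    let head = input.get(..4)?;
    let len = u32::from_le_bytes([head[0], head[1], head[2], head[3]]) as usize;
    input.get(4usize.checked_add(len)?..)
}
```

Here the payload is already encoded when its length is written; knowing the encoded size up front would let the length be written before the payload is produced.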
I think it should be documented that serialized_size isn't for optimisation purposes (and typically is slower) for the pattern described here https://github.com/bincode-org/bincode/issues/539#issuecomment-1094033736
This problem has been causing me headaches for years. So this is my use case:
I am sending data over the network. I am sending a Header+Request. The header has to contain the serialized size of the Request+Header.
My current solution is to make all structs "#[repr(packed)]" and use std::mem::size_of::<ExampleRequestWithHeader>() to get the serialized size. However, there are some restrictions on what you can do with packed structs (https://github.com/rust-lang/rust/issues/82523).
So I would like to remove "#[repr(packed)]" and use bincode::serialized_size; however, it requires a concrete object and I would like to avoid that. Basically, what I would like to have is a method that calculates the serialized size of my struct at compile time, or at least without creating the object.
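When a concrete request is available at send time, one way to fill in the total length without a separate size query is to reserve the header bytes, encode the body behind them, and patch the length afterwards. This is a different technique from the compile-time size asked for above, and the 4-byte little-endian header as well as the names `HEADER_LEN` and `frame_message` are assumptions for illustration.

```rust
/// Hypothetical header: a single little-endian u32 holding the total
/// message length (header + body).
pub const HEADER_LEN: usize = 4;

/// Build a framed message. `encode_body` appends the encoded body to the
/// buffer; the header is back-patched once the final length is known.
/// Returns `None` if the message does not fit in a u32 length.
pub fn frame_message(encode_body: impl FnOnce(&mut Vec<u8>)) -> Option<Vec<u8>> {
    // Reserve space for the header with placeholder zeros.
    let mut buf = vec![0u8; HEADER_LEN];
    encode_body(&mut buf);
    if buf.len() > u32::MAX as usize {
        return None;
    }
    let total = buf.len() as u32;
    // Overwrite the placeholder with the real total length.
    buf[..HEADER_LEN].copy_from_slice(&total.to_le_bytes());
    Some(buf)
}
```

The body is traversed only once, at the cost of buffering the whole message before it is sent.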
This sounds somewhat useful to me as well. However, I think it should be a separate trait, because for many types the serialized size is dynamic. You could have a (preferably derivable) StaticSerializedSize trait, which might depend on the encoding but not on any object, just on the type.
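A rough sketch of what such a trait could look like, written by hand rather than derived. Nothing here is bincode API: the trait, its `SIZE` constant, and the `Header` struct are hypothetical, and the numbers assume a fixed-width integer encoding. With a variable-length integer encoding the size of an integer depends on its value, so no per-type constant would exist.

```rust
/// Hypothetical trait for types whose encoded size depends only on the
/// type, never on the value. Sizes below assume fixed-width integers.
pub trait StaticSerializedSize {
    const SIZE: usize;
}

impl StaticSerializedSize for u8 {
    const SIZE: usize = 1;
}
impl StaticSerializedSize for u16 {
    const SIZE: usize = 2;
}
impl StaticSerializedSize for u32 {
    const SIZE: usize = 4;
}
impl StaticSerializedSize for u64 {
    const SIZE: usize = 8;
}

/// Fixed-length arrays are the element size times the length
/// (assuming no length prefix is written for arrays).
impl<T: StaticSerializedSize, const N: usize> StaticSerializedSize for [T; N] {
    const SIZE: usize = T::SIZE * N;
}

/// Hypothetical message header; a derive macro would generate this impl
/// by summing the sizes of the fields.
pub struct Header {
    pub kind: u16,
    pub total_len: u32,
}

impl StaticSerializedSize for Header {
    const SIZE: usize =
        <u16 as StaticSerializedSize>::SIZE + <u32 as StaticSerializedSize>::SIZE;
}
```

Because `SIZE` is an associated constant, it is available at compile time and needs no instance. Types such as `String` or `Vec<T>` would simply not implement the trait.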
(Sent as an email reply to Marco Boneberger's comment above, Dec 2, 2022.)
My use case is that I am using a file as a database and some parts of the buffer are not being used. I want to know whether the thing I need to insert into the file should go at the end or whether I can write it over one of the unused chunks in the file.
For my particular case I'd actually prefer an API that just gave me an upper bound given the type (this sounds like what @marcbone is asking for too):
match bincode::max_serialized_size::<T>(bincode_config) {
    Some(max_size) => { /* serializing T will always take <= max_size */ },
    None => { /* there is no upper bound */ }
}
This would be practically zero performance cost and good enough for my particular use but I think both max_serialized_size and serialized_size would be good additions with the appropriate documentation.
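One way the proposed upper bound could be modelled is as an associated constant of type `Option<usize>`, where `None` means "unbounded". The sketch below is not bincode API: the trait `MaxSerializedSize`, the helper `add_bounds`, and `fits_in_slot` are hypothetical, and the integer sizes assume a fixed-width encoding.

```rust
/// Hypothetical trait: an upper bound on the encoded size of a type,
/// or `None` if no bound exists (e.g. strings and other collections).
pub trait MaxSerializedSize {
    const MAX_SIZE: Option<usize>;
}

/// Sum two bounds; the result is unbounded if either side is unbounded
/// or if the sum overflows.
pub const fn add_bounds(a: Option<usize>, b: Option<usize>) -> Option<usize> {
    match (a, b) {
        (Some(x), Some(y)) => x.checked_add(y),
        _ => None,
    }
}

// Sizes below assume fixed-width integers.
impl MaxSerializedSize for u32 {
    const MAX_SIZE: Option<usize> = Some(4);
}
impl MaxSerializedSize for u64 {
    const MAX_SIZE: Option<usize> = Some(8);
}
impl MaxSerializedSize for String {
    const MAX_SIZE: Option<usize> = None;
}
impl<A: MaxSerializedSize, B: MaxSerializedSize> MaxSerializedSize for (A, B) {
    const MAX_SIZE: Option<usize> = add_bounds(A::MAX_SIZE, B::MAX_SIZE);
}

/// True if any value of type `T` is guaranteed to fit into a free chunk
/// of `slot_len` bytes; false if it might not, or if `T` is unbounded.
pub fn fits_in_slot<T: MaxSerializedSize>(slot_len: usize) -> bool {
    match T::MAX_SIZE {
        Some(max_size) => max_size <= slot_len,
        None => false,
    }
}
```

For the file-as-database case, `fits_in_slot::<T>(chunk_len)` answers "can this unused chunk be reused?" without encoding anything; when it returns false the value may still fit, so an exact count is needed before falling back to appending at the end.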
I'll go ahead and register my interest here, as I was looking through the bincode-2 tagged issues. I have this
struct ByteCounter {
    count: usize,
}
impl Writer for ByteCounter {
    fn write(&mut self, bytes: &[u8]) -> Result<(), EncodeError> {
        self.count += bytes.len();
        Ok(())
    }
}
/// Count the bytes a value will occupy when encoded.
pub fn count_bytes<T: Encode>(x: &T) -> usize {
    let mut counter = ByteCounter { count: 0 };
    bincode::encode_into_writer(x, &mut counter, bincode_config()).unwrap();
    counter.count
}
in a codebase of mine because I'm talking to SQLite and don't want to create a serialized copy of an object I'm working with outside of what I write into a SQLite BLOB, and SQLite requires you to set the length of a BLOB before you start writing to it. In this case, it's about memory usage, not serialization performance.
That said, it's not a big deal to me whether this specific API ends up in bincode, as I've already implemented it in user code.
(Edit: That bincode_config() function just returns the bincode configuration I use everywhere in this codebase. Nothing special there.)
I think this is already resolved with SizeWriter and encode_into_writer:
let mut size_writer = SizeWriter::default();
bincode::encode_into_writer(&t, &mut size_writer, config).unwrap();
println!("{:?}", size_writer.bytes_written);
Does this work for your use case? If so, I think this issue can be closed.
Yup. I'm not actually sure how I missed that that exists 👍
Thanks for testing 👍 closing this