Improve support for serializing data containing offsets (FilePtr, etc.)
Here's what roblabla had to say (on discord):
actually, with seek+write, `FilePtr` seems doable with some global context. Here's the idea: when you get a `FilePtr`, record the current location, write some zeroes, and schedule a sort of "late serialization" pass. When the structure is fully serialized, all scheduled late serialization passes will run sequentially. Each pass will just serialize the underlying structure, seek to the previously saved location, and write the proper offset.
that does mean you'd need some sort of heap allocation support to keep track of the serialization passes
but I can't think of a better way.
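A minimal sketch of the late-serialization idea, assuming little-endian `u32` offsets, an in-memory `Cursor` as the output, and pointees appended at the end of the file. `LateWriter`, `LatePass`, and `write_ptr32` are hypothetical names, not binrw API; the `Vec` of boxed closures is the heap allocation mentioned above.

```rust
use std::io::{self, Cursor, Seek, SeekFrom, Write};

/// Hypothetical late pass: serializes a pointee after the top-level struct is done.
type LatePass = Box<dyn FnOnce(&mut Cursor<Vec<u8>>) -> io::Result<()>>;

/// Hypothetical writer context holding the scheduled late passes.
struct LateWriter {
    out: Cursor<Vec<u8>>,
    // (placeholder position, pass) pairs; this Vec is the required heap allocation.
    pending: Vec<(u64, LatePass)>,
}

impl LateWriter {
    fn new() -> Self {
        LateWriter { out: Cursor::new(Vec::new()), pending: Vec::new() }
    }

    /// Where a FilePtr32 would be written: remember the spot, write a zero
    /// placeholder, and schedule the pointee for later.
    fn write_ptr32(&mut self, data: Vec<u8>) -> io::Result<()> {
        let placeholder = self.out.stream_position()?;
        self.out.write_all(&0u32.to_le_bytes())?;
        let pass: LatePass = Box::new(move |w: &mut Cursor<Vec<u8>>| w.write_all(&data));
        self.pending.push((placeholder, pass));
        Ok(())
    }

    /// Runs every scheduled pass in order: append the pointee at the end of the
    /// file, then seek back and patch the placeholder with its offset.
    fn finish(mut self) -> io::Result<Vec<u8>> {
        for (placeholder, pass) in std::mem::take(&mut self.pending) {
            let target = self.out.seek(SeekFrom::End(0))?;
            pass(&mut self.out)?;
            self.out.seek(SeekFrom::Start(placeholder))?;
            // Assumes the file stays below 4 GiB so the offset fits in a u32.
            self.out.write_all(&(target as u32).to_le_bytes())?;
        }
        Ok(self.out.into_inner())
    }
}

fn main() -> io::Result<()> {
    let mut w = LateWriter::new();
    w.write_ptr32(vec![0xAA, 0xBB])?; // first header field
    w.write_ptr32(vec![0xCC])?; // second header field
    println!("{:02X?}", w.finish()?);
    Ok(())
}
```

Note that the output has to be seekable here: each placeholder is patched after its pointee has been written.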
I had a similar idea a while back (though I wasn't looking to implement it at the library level), but I don't think this'll quite work for some types of files, specifically those that are broken into different sections, with different things going into different sections. i.e.:
```rust
use binrw::FilePtr32;

struct File {
    info: InfoSection,
    data: DataSection,
}

struct DataSection {
    // header...
    // data blocks go here
}

struct InfoSection {
    // header...
    list: Vec<Info>,
    // string blocks go here
}

struct Info {
    data_name: FilePtr32<String>, // points into InfoSection
    data_ptr: FilePtr32<Vec<u8>>, // points into DataSection
}
```
Under the current idea, I don't think we have a good way to represent this: we'd write out `info section, info, info, info, data section, string, data, string, data, string, data` instead of `info section, info, info, info, string, string, string, data section, data, data, data`. I think this is fixable by adding some sort of marker type `Pool` with a way of specifying an identifier, so that you can do the same thing as the original idea, but for each pool in turn as you come across them while writing the file.
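Roughly how the `Pool` idea could work, as a sketch that assumes two hypothetical pools (`PoolId::Strings`, `PoolId::Data`) whose declaration order doubles as flush order, and that leaves out section headers such as `DataSection`'s. Each pointer still writes a zeroed placeholder, but pending pointees are grouped per pool, so all strings land together before all data blocks. `PooledWriter` and `write_ptr32` are hypothetical names.

```rust
use std::collections::BTreeMap;
use std::io::{self, Cursor, Seek, SeekFrom, Write};

/// Hypothetical pool identifiers; the derived Ord decides flush order
/// (strings before data, like InfoSection before DataSection).
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum PoolId {
    Strings,
    Data,
}

/// Hypothetical writer that defers pointees into per-pool queues.
struct PooledWriter {
    out: Cursor<Vec<u8>>,
    // For each pool: (placeholder position, bytes to write later).
    pools: BTreeMap<PoolId, Vec<(u64, Vec<u8>)>>,
}

impl PooledWriter {
    fn new() -> Self {
        PooledWriter { out: Cursor::new(Vec::new()), pools: BTreeMap::new() }
    }

    /// Writes a zeroed u32 placeholder and schedules `data` in the given pool.
    fn write_ptr32(&mut self, pool: PoolId, data: &[u8]) -> io::Result<()> {
        let pos = self.out.stream_position()?;
        self.out.write_all(&0u32.to_le_bytes())?;
        self.pools.entry(pool).or_default().push((pos, data.to_vec()));
        Ok(())
    }

    /// Flushes each pool in turn, so entries of one pool end up contiguous.
    fn finish(mut self) -> io::Result<Vec<u8>> {
        for (_pool, entries) in std::mem::take(&mut self.pools) {
            for (placeholder, data) in entries {
                let target = self.out.seek(SeekFrom::End(0))?;
                self.out.write_all(&data)?;
                self.out.seek(SeekFrom::Start(placeholder))?;
                self.out.write_all(&(target as u32).to_le_bytes())?;
            }
        }
        Ok(self.out.into_inner())
    }
}

fn main() -> io::Result<()> {
    let mut w = PooledWriter::new();
    // Two Info records, each with a name pointer and a data pointer.
    w.write_ptr32(PoolId::Strings, b"a")?;
    w.write_ptr32(PoolId::Data, &[1, 2])?;
    w.write_ptr32(PoolId::Strings, b"bc")?;
    w.write_ptr32(PoolId::Data, &[3])?;
    println!("{:02X?}", w.finish()?);
    Ok(())
}
```

With the four pointers above, the 16-byte header is followed by `a`, `bc` (the strings pool) and then `01 02`, `03` (the data pool), instead of strings and data interleaving.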
Are there any other methods we might want to consider?
I'd say we might want to go the route of file offset calculations, leading to a BinWrite trait that looks something like...
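```rust
trait BinWrite {
    type Args;
    // Writes the inline part; pointers write `*file_heap_pos` and then advance it.
    fn write_options<W: binrw::io::Write>(&self, writer: &mut W, options: &WriterOption, args: Self::Args, file_heap_pos: &mut u64) -> binrw::Result<()>;
    // Size of the inline part, used to find where the file heap begins.
    fn get_write_size(&self, options: &WriterOption, args: Self::Args) -> binrw::Result<u64>;
    // Writes the pointed-to contents, recursing into children.
    fn write_file_heap_contents<W: binrw::io::Write>(&self, writer: &mut W, options: &WriterOption, args: Self::Args, file_heap_pos: &mut u64) -> binrw::Result<()>;
}
```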
The general concept being:
- The `file_heap_pos` starts at a value of `relative_to + first_value.get_write_size()` (that is typically just... immediately after the top-level `BinWrite` struct). So if I have a header of pointers of size 0x10, `file_heap_pos` is initialized to `0x10`.
- `file_heap_pos` is updated any time a pointer is written. So in `FilePtr<T>`'s `write_options` implementation, you would write the current value of `file_heap_pos` and then increment it by `inner.get_write_size()`.
- After the `write_options` pass on the top-level struct, `write_file_heap_contents` is called on it (and it then recursively calls it on everything else), writing the pointed-to contents in the same order in which their offsets were handed out, so each one lands at the offset its pointer recorded.
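A stripped-down sketch of the offset-calculation approach, under a few assumptions: `HeapWrite`, `Ptr32`, `Header`, and `write_file` are hypothetical stand-ins for the proposed trait (no options, args, or endianness handling), offsets are little-endian `u32`, and pointees contain no further pointers. Because every offset is known before it is written, the output only needs `Write`, not `Seek`.

```rust
use std::io::{self, Write};

/// Hypothetical, simplified stand-in for the proposed BinWrite trait.
trait HeapWrite {
    /// Size of the inline part (what `write_options` emits).
    fn write_size(&self) -> u64;
    /// Writes the inline part; pointers claim space from `heap_pos`.
    fn write_options<W: Write>(&self, w: &mut W, heap_pos: &mut u64) -> io::Result<()>;
    /// Writes pointed-to contents, in the order their offsets were claimed.
    fn write_heap<W: Write>(&self, _w: &mut W) -> io::Result<()> {
        Ok(())
    }
}

impl HeapWrite for Vec<u8> {
    fn write_size(&self) -> u64 {
        self.len() as u64
    }
    fn write_options<W: Write>(&self, w: &mut W, _heap_pos: &mut u64) -> io::Result<()> {
        w.write_all(self)
    }
}

/// Hypothetical 32-bit file pointer newtype (not binrw's FilePtr32).
struct Ptr32<T>(T);

impl<T: HeapWrite> HeapWrite for Ptr32<T> {
    fn write_size(&self) -> u64 {
        4
    }
    fn write_options<W: Write>(&self, w: &mut W, heap_pos: &mut u64) -> io::Result<()> {
        // Write the offset the pointee will live at, then reserve its space.
        w.write_all(&(*heap_pos as u32).to_le_bytes())?;
        *heap_pos += self.0.write_size();
        Ok(())
    }
    fn write_heap<W: Write>(&self, w: &mut W) -> io::Result<()> {
        // Assumes the pointee holds no pointers, so its heap position is unused.
        let mut unused = 0;
        self.0.write_options(w, &mut unused)
    }
}

/// Hypothetical top-level struct: a header made of two pointers.
struct Header {
    first: Ptr32<Vec<u8>>,
    second: Ptr32<Vec<u8>>,
}

impl HeapWrite for Header {
    fn write_size(&self) -> u64 {
        self.first.write_size() + self.second.write_size()
    }
    fn write_options<W: Write>(&self, w: &mut W, heap_pos: &mut u64) -> io::Result<()> {
        self.first.write_options(w, heap_pos)?;
        self.second.write_options(w, heap_pos)
    }
    fn write_heap<W: Write>(&self, w: &mut W) -> io::Result<()> {
        self.first.write_heap(w)?;
        self.second.write_heap(w)
    }
}

/// Driver: the heap starts right after the top-level struct.
fn write_file<T: HeapWrite, W: Write>(value: &T, w: &mut W, relative_to: u64) -> io::Result<()> {
    let mut heap_pos = relative_to + value.write_size();
    value.write_options(w, &mut heap_pos)?;
    value.write_heap(w)
}

fn main() -> io::Result<()> {
    let header = Header { first: Ptr32(vec![0xAA, 0xBB]), second: Ptr32(vec![0xCC]) };
    let mut out: Vec<u8> = Vec::new();
    write_file(&header, &mut out, 0)?;
    println!("{:02X?}", out);
    Ok(())
}
```

For this header the heap starts at 8, so the offsets written are 8 and 10, followed by the three pointee bytes.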
Thoughts?
Some known drawbacks:
- this doesn't allow much room for deciding how things are laid out in the file without a manual `BinWrite` implementation. Not sure if there's really a great way to handle that, though?
- this should allow for alignment of file pointers, but we need to figure out the interface for that
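One possible shape for pointer alignment, assuming power-of-two alignments: round `file_heap_pos` up before handing out an offset, and record the padding so the heap-contents pass can emit the same number of filler bytes before the pointee. `align_up` and `claim_aligned` are hypothetical helpers, not a proposed interface.

```rust
/// Rounds `pos` up to the next multiple of `align` (a power of two).
fn align_up(pos: u64, align: u64) -> u64 {
    assert!(align.is_power_of_two(), "alignment must be a power of two");
    (pos + align - 1) & !(align - 1)
}

/// Hypothetical aligned pointer bookkeeping: returns (offset to write, padding)
/// and advances `heap_pos` past the pointee. The heap pass must write the same
/// `padding` bytes before the pointee so offsets stay consistent.
fn claim_aligned(heap_pos: &mut u64, size: u64, align: u64) -> (u64, u64) {
    let offset = align_up(*heap_pos, align);
    let padding = offset - *heap_pos;
    *heap_pos = offset + size;
    (offset, padding)
}

fn main() {
    let mut heap_pos = 0x11;
    let (offset, padding) = claim_aligned(&mut heap_pos, 3, 8);
    println!("offset={offset:#x} padding={padding} next={heap_pos:#x}");
}
```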
fwiw my current take on this is that we should probably not provide an implementation of BinWrite for FilePtr for now and experiment out-of-tree for a bit. File formats are going to have different requirements for how things are positioned, and I don't think we have enough information at the moment to properly cover every file format.
What I think we should do instead is focus on having tools available so that people can implement a serialization scheme for their file pointers on top, using a newtype (or, heck, I guess `serialize_with` could be a thing), in a somewhat ergonomic fashion. To me, this means exposing some sort of user-extensible `ReadOptions`-type thing that the user can stick some mutable state in (through `Rc`?), and then perhaps implement the scheme that jam suggested in the previous message, or some other scheme. Or maybe we do something else, idk.
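A sketch of user-supplied mutable state, assuming options are passed by shared reference (as `&WriterOption` is in the trait above), so mutation has to go through interior mutability. `UserOptions`, `HeapState`, `MyPtr32`, and `write_with_shared_state` are hypothetical; the scheme shown is the offset-calculation one, but the state could just as well hold pools or pending passes.

```rust
use std::cell::RefCell;
use std::io::{self, Write};
use std::rc::Rc;

/// Hypothetical user-defined state; a real scheme might hold pools or pending passes.
struct HeapState {
    heap_pos: u64,
    // Pointee bytes, in the order their offsets were handed out.
    queued: Vec<Vec<u8>>,
}

/// Hypothetical user-extensible options; clones share one HeapState through Rc.
#[derive(Clone)]
struct UserOptions {
    heap: Rc<RefCell<HeapState>>,
}

/// Hypothetical user-side newtype standing in for a file pointer.
struct MyPtr32(Vec<u8>);

impl MyPtr32 {
    /// Receives options by shared reference and mutates the state via RefCell.
    fn write<W: Write>(&self, w: &mut W, opts: &UserOptions) -> io::Result<()> {
        let mut state = opts.heap.borrow_mut();
        w.write_all(&(state.heap_pos as u32).to_le_bytes())?;
        state.heap_pos += self.0.len() as u64;
        state.queued.push(self.0.clone());
        Ok(())
    }
}

/// Hypothetical driver: writes a header of pointers, then the queued pointees.
fn write_with_shared_state(ptrs: &[MyPtr32]) -> io::Result<Vec<u8>> {
    let header_size = 4 * ptrs.len() as u64;
    let opts = UserOptions {
        heap: Rc::new(RefCell::new(HeapState { heap_pos: header_size, queued: Vec::new() })),
    };
    let mut out: Vec<u8> = Vec::new();
    for p in ptrs {
        // A clone (e.g. one handed down to a nested field) still sees the same state.
        let child_opts = opts.clone();
        p.write(&mut out, &child_opts)?;
    }
    // After the top-level pass, append the pointees in offset order.
    for bytes in opts.heap.borrow_mut().queued.drain(..) {
        out.extend_from_slice(&bytes);
    }
    Ok(out)
}

fn main() -> io::Result<()> {
    let ptrs = [MyPtr32(vec![0xAA, 0xBB]), MyPtr32(vec![0xCC])];
    println!("{:02X?}", write_with_shared_state(&ptrs)?);
    Ok(())
}
```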
Once we've experimented enough out of tree, we can revisit actually including something in binrw proper. Does that make sense?