switch endianness based on field for the remaining parsing

Open goller opened this issue 4 years ago • 2 comments

Hey @jam1garner, binread is really awesome! I'm having a great time with it :))

I have a file format in which a particular field's bytes are used to determine the endianness of the entire file. I'm using the is_big attribute, but I have to repeat it on every field. Is there a way to switch the endianness for the rest of the parse in one go?

#[derive(BinRead, Debug, PartialEq)]
#[br(big)]
pub struct Header {
    endian: u16,
    #[br(is_big = (endian == 0x1234))]
    i: u32,
    #[br(is_big = (endian == 0x1234))]
    j: u64,
    #[br(is_big = (endian == 0x1234))]
    f1: f32,
    #[br(is_big = (endian == 0x1234))]
    f2: f64,
}

goller, May 30 '20 01:05

I'm not sure if this is a good solution for you, but an embedded struct would work for this, although it would make member access a little harder.

#[derive(BinRead, Debug, PartialEq)]
#[br(big)]
pub struct Header {
    endian: u16,
    #[br(is_big = (endian == 0x1234))]
    inner: HeaderInner
}

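// The inner struct needs the same derives so it can be parsed and compared.
#[derive(BinRead, Debug, PartialEq)]
pub struct HeaderInner {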
    i: u32,
    j: u64,
    f1: f32,
    f2: f64,
}

If not, could you give me a bit of info about your use case so I can better understand how this should be handled?

jam1garner, May 30 '20 01:05

I also need this quite often.

If a field is used to determine endianness, it is generally called a byte order mark, or BOM for short. In UTF-16, 0xFEFF is used for big-endian and 0xFFFE for little-endian. I'm a little surprised that you haven't come across it yet, because Nintendo actually uses a BOM quite often as well. For example, SARC and BFRES have it.
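To make the idea concrete, here is a minimal std-only sketch of BOM detection. The names `Endian` and `detect_bom` are made up for illustration and are not part of binread. It assumes the two BOM bytes are always decoded as big-endian, so a little-endian file shows up as the byte-swapped value 0xFFFE.

```rust
/// Byte order selected by a byte order mark (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Endian {
    Big,
    Little,
}

/// Classify the first two bytes of a stream as a UTF-16-style BOM.
/// The bytes are decoded as big-endian, so the mark reads as 0xFEFF in a
/// big-endian file and as the byte-swapped 0xFFFE in a little-endian one.
pub fn detect_bom(bytes: [u8; 2]) -> Option<Endian> {
    match u16::from_be_bytes(bytes) {
        0xFEFF => Some(Endian::Big),
        0xFFFE => Some(Endian::Little),
        _ => None, // not a recognised BOM
    }
}
```

Calling `detect_bom([0xFF, 0xFE])` yields `Some(Endian::Little)`, and anything other than the two known marks yields `None`.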

I would like to see a new attribute called bom that can be placed on a field so that the field is interpreted as a BOM and sets the endianness for the rest of the struct. Additionally, it should be possible to give it a number or hex string that specifies what the field should look like for BE and LE respectively. If the field is a u16, it should default to 0xFEFF and 0xFFFE.

Example:

#[derive(BinRead, Debug, PartialEq)]
pub struct Header {
    #[bom] // with default interpretation
    bom: u16,
    i: u32,
    j: u64,
    f1: f32,
    f2: f64,
}

// or with custom bom value
#[derive(BinRead, Debug, PartialEq)]
pub struct Header {
    #[bom(big = 0x1234, little = 0x4321)]
    bom: u16,
    i: u32,
    j: u64,
    f1: f32,
    f2: f64,
}
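For comparison, this is roughly what such an attribute would have to do, written by hand with std only. It is a sketch, not binread code: `take` and `parse_header` are hypothetical names, and it assumes the BOM field itself is always decoded as big-endian and that 0x1234 / 0x4321 are the two accepted values from the custom example above.

```rust
use std::convert::TryInto;

#[derive(Debug, PartialEq)]
pub struct Header {
    pub bom: u16,
    pub i: u32,
    pub j: u64,
    pub f1: f32,
    pub f2: f64,
}

/// Copy `N` bytes starting at `*pos` and advance the cursor.
/// Returns `None` if the input is too short.
fn take<const N: usize>(data: &[u8], pos: &mut usize) -> Option<[u8; N]> {
    let end = pos.checked_add(N)?;
    let chunk: [u8; N] = data.get(*pos..end)?.try_into().ok()?;
    *pos = end;
    Some(chunk)
}

/// Parse a `Header`, switching byte order for every field after the BOM.
pub fn parse_header(data: &[u8]) -> Option<Header> {
    let mut pos = 0;
    // Assumption: the BOM is read as big-endian regardless of the file's order.
    let bom = u16::from_be_bytes(take(data, &mut pos)?);
    let big = match bom {
        0x1234 => true,
        0x4321 => false,
        _ => return None, // unknown BOM value
    };
    let i: [u8; 4] = take(data, &mut pos)?;
    let j: [u8; 8] = take(data, &mut pos)?;
    let f1: [u8; 4] = take(data, &mut pos)?;
    let f2: [u8; 8] = take(data, &mut pos)?;
    Some(Header {
        bom,
        i: if big { u32::from_be_bytes(i) } else { u32::from_le_bytes(i) },
        j: if big { u64::from_be_bytes(j) } else { u64::from_le_bytes(j) },
        f1: if big { f32::from_be_bytes(f1) } else { f32::from_le_bytes(f1) },
        f2: if big { f64::from_be_bytes(f2) } else { f64::from_le_bytes(f2) },
    })
}
```

The point of the proposal is that the single `big` decision made right after the BOM applies to every later field, instead of being repeated per field as in the original is_big version.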

Tarnadas, Oct 21 '20 05:10