bulk_extractor-rec icon indicating copy to clipboard operation
bulk_extractor-rec copied to clipboard

utmp comment is wrong

Open MagicalTux opened this issue 3 years ago • 7 comments

The line below is wrong:

https://github.com/4n6ist/bulk_extractor-rec/blob/1964395fef0efc2d183beffd4a4d4642ef5eb91e/src/scan_utmp.cpp#L31

ut_type is indeed 16bits, however the next value pid_t ut_pid needs to be aligned, so gcc will insert 2 empty bytes because the value isn't packed.

This is not an issue as is on little endian systems (02 00 00 00 stays the same whether the value is 32bits or 16bits followed by padding), but this will fail on big endian systems as value will appear at 00 02 00 00 in the file and read as 0x20000 and considered invalid (I don't know if you support big endian at all, I do and while looking for anyone who may have done a carver for utmp I found your code, but unfortunately I won't be able to use it, but I thought I'd still leave a comment on that so you know why it's defined as a short int but appears to take 4 bytes).

MagicalTux avatar Jun 20 '22 05:06 MagicalTux

Hi @MagicalTux . I moved all of the updates from bulk_extractor-rec into the original [bulk_extractor](https://github.com/simsong/bulk_extractor/) release several years ago.

The revised code now used the byte-order independent sbuf.get32i() method, rather than relying on gcc alignment.. Can you look at the code? If the value is indeed 16 bits and the next value is in fact ut_pid, we can recover that as well:

https://github.com/simsong/bulk_extractor/blob/d0505fb0fbc4cee08cbab1ed0d8f2802ead4b0f1/src/scan_utmp.cpp#L30

bulk_extractor is supposed to run on both big-endian and little-endian systems. Can you try it out?

Also, if you have some test vectors, I'm happy to fix them.

Meanwhile, I've updated my code to reflect your comment. Can you look over the PR?

Also, if you have a big-endian system to test on, please let me know!

Thanks,

Simson

simsong avatar Jun 20 '22 10:06 simsong

Oh, @MagicalTux - I also need some test vectors. If you can get me some, I can include them in the unit-tests. Thanks again.

simsong avatar Jun 20 '22 10:06 simsong

Test file attached, it's a wtmp file generated on Debian running on a SPARC CPU.

You'll see it starts with 00 07 00 00

wtmp.zip

MagicalTux avatar Jun 20 '22 15:06 MagicalTux

I haven't found the source for get16i or get32i but I'm guessing it's using the endianness based on a setting or something. I'm guessing it should be fixed with https://github.com/simsong/bulk_extractor/pull/362 however.

MagicalTux avatar Jun 20 '22 15:06 MagicalTux

Here is the source: https://github.com/simsong/be20_api/blob/main/sbuf.h

condensed:

    uint16_t get16u_unsafe(size_t i) const {
        return (uint16_t)(this->buf[i + 0] <<  0) | (uint16_t)(this->buf[i + 1] << 8);
    }


    uint16_t get16u(size_t i) const {
        if (i + 2 > bufsize) throw sbuf_t::range_exception_t(i, 2);
        return get16u_unsafe(i);
    }


    int16_t get16i(size_t i)   const { return get16u(i);}

Endianness is most definitely not "based on some setting or something." I've had problems with those things. It's based on the underlying definition of what it means to be a 2-byte representation of a 16-bit number.

simsong avatar Jun 20 '22 17:06 simsong

Thanks for the text file.

simsong avatar Jun 20 '22 17:06 simsong

@MagicalTux - I'm moving this discussion to here: https://github.com/simsong/bulk_extractor/issues/363

simsong avatar Jun 20 '22 17:06 simsong