
Options for reading between two offsets

Open ErichDonGubler opened this issue 5 years ago • 19 comments

The -n/--length flag is a great feature -- good especially for quickly checking if file headers match something. Another use case I can think of for limited output is inspecting, say, an entire block of some data from a file that's been dumped from a disk. Let's say I'm using Linux and reading the first block of a disk somewhere to determine its contents manually:

$ hexyl -n 512 "$disk_dump"
// Some output here...

I read the output and discover that there's an MBR at the beginning, with the first partition starting at logical block address 1. Sweet, let's mosey on over to 0x200 and read another block. I could implement this by using dd:

$ input_file="_viminfo" block_size=512 block_num=1
$ dd bs="$block_size" status=none skip="$block_num" count=1 if="$input_file" \
    | hexyl

...but there are two issues with this:

  • dd isn't usually available on Windows machines!
  • Because we're using stdin, we can't get the correct set of offsets -- the "file" starts at 0x0 regardless of what parameters I gave dd.
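
The second point can be reproduced with a minimal sketch. Here od is only a stand-in for a hex viewer (its -A, -j and -N options are standard), and the three-block sample file is made up for the illustration:

```sh
# Build a sample file: three 512-byte blocks filled with 'A', 'B' and 'C'.
sample=$(mktemp)
for c in A B C; do head -c 512 /dev/zero | tr '\0' "$c"; done > "$sample"

# Piped through dd: the viewer only sees a stream, so its address column restarts at 0.
piped_addr=$(dd if="$sample" bs=512 skip=1 count=1 2>/dev/null \
    | od -A x -t x1 -N 16 | head -n 1 | cut -d ' ' -f 1)

# Reading the file directly with a skip: the address column keeps the real offset.
direct_addr=$(od -A x -t x1 -j 512 -N 16 "$sample" | head -n 1 | cut -d ' ' -f 1)

echo "piped: 0x$piped_addr  direct: 0x$direct_addr"
rm -f "$sample"
```

The piped address corresponds to offset 0, while the direct read reports the block's true position in the file (0x200).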

Perhaps something like this spitballed set of options might help:

$ hexyl \
    --start 512      \ # Could also be written as 
                     \ #
                     \ # Could be bikeshed to `--begin`?
                     \
    --length 512     \
                     \
                     \ # One could use an end offset instead of a length:
    # --end 1024     \ # Could also be written as `-e 0x400`

Having something similar to bat's --range could also be really handy, especially when combined with relative offsets (positive and negative):

$ hexyl --range 512:+512 # same as using `--skip 512 --length 512`
$ hexyl --range=-512: # read the last block
$ hexyl --block-size 4096 --range=-1block: # like above, but use the block unit
$ hexyl --range 32:-32 # cut out a common header and footer for the input stream we don't care about

I would be more than happy to push implementation of this, since I've great personal interest in allowing more of my reverse engineering flow in the terminal. Let me know if you want me to hack on it!

Unresolved questions

  • ~~Adding several more arguments that accept offsets/sizes might add pressure to create a system of units/radices a la xxd for the appropriate arguments. Where should the line be drawn in terms of what this project is willing to support?~~ We've added support for xb and xib with #44. That's as far as we've decided to go right now.

ErichDonGubler avatar Jan 09 '19 18:01 ErichDonGubler

Another idea is to add something to change the initial "offset" of printing -- to reuse the dd example:

$ input_file="_viminfo" block_size=512 block_num=1
$ dd bs="$block_size" status=none skip="$block_num" count=1 if="$input_file" \
    | hexyl --initial-offset "$(($block_size * block_num))"

ErichDonGubler avatar Jan 09 '19 21:01 ErichDonGubler

> I would be more than happy to push implementation of this, since I've great personal interest in allowing more of my reverse engineering flow in the terminal. Let me know if you want me to hack on it!

Sounds great!

> • Adding several more arguments that accept offsets/sizes might add pressure to create a system of units/radices a la xxd for the appropriate arguments. Where should the line be drawn in terms of what this project is willing to support?

Excellent question. I'd say we could afford to support the basic units and hex/decimal, if this is helpful for users.

sharkdp avatar Jan 13 '19 14:01 sharkdp

> I'd say we could afford to support the basic units and hex/decimal, if this is helpful for users.

Perhaps this would be useful as a separate issue, then. I'll file it if nobody does next week. :)

ErichDonGubler avatar Jan 18 '19 22:01 ErichDonGubler

Just, for the love of Yog-Sothoth, don't pretend that 1KB means 1000 rather than 1024 bytes :p. That's an issue I feel personally about :/

kilobyte avatar Jan 18 '19 23:01 kilobyte
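
A minimal sketch of what "basic units and hex/decimal" could look like while keeping binary and decimal prefixes apart. The helper name to_bytes and the two suffixes it accepts are assumptions made for illustration; this is not hexyl's parser:

```sh
# Hypothetical helper: convert "<number>[unit]" to a byte count.
# Only the KiB (1024) and kB (1000) suffixes are handled in this sketch.
to_bytes() {
    case $1 in
        *KiB) echo $(( ${1%KiB} * 1024 )) ;;  # binary prefix
        *kB)  echo $(( ${1%kB} * 1000 )) ;;   # decimal prefix
        *)    echo $(( $1 )) ;;               # plain decimal or 0x-prefixed hex
    esac
}

to_bytes 0x200   # prints 512
to_bytes 1KiB    # prints 1024
to_bytes 1kB     # prints 1000
```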

Placeholder filed here: https://github.com/sharkdp/hexyl/issues/44

ErichDonGubler avatar Jan 19 '19 00:01 ErichDonGubler

Not sure if anyone else would find this useful, but expanding on the idea presented above, I'd love to see something like:

$ hexyl --range 78..1024 somefile.bin

gavynriebau avatar Feb 14 '19 08:02 gavynriebau

@gavynriebau: As a fellow user, I wouldn't want multiple ways to do the same thing without some justification. I see some significant overlap between that range notation, what's been proposed here, and what already exists. I like the syntax, though.

@sharkdp, what do you think?

ErichDonGubler avatar Feb 14 '19 15:02 ErichDonGubler

I like that idea. In bat, we use a similar parameter (--line-range <N:M>) to select a range of lines. It uses N:M instead of N..M, which seems a little bit clearer to me if we allow missing boundaries:

--range 1024:1224  # print all bytes from 1024 to 1224
--range :128  # only print the first 128 bytes
--range 1024:  # skip the first 1024 bytes

If we decide to implement this, I think we could still keep -n/--length <N> as a fallback. It would be equivalent to --range :N.

sharkdp avatar Feb 14 '19 22:02 sharkdp
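
The N:M notation with optional bounds can be sketched with plain parameter expansion. The helper name parse_range and the explicit file-size argument are assumptions for illustration, not part of hexyl:

```sh
# Hypothetical helper: split "N:M" into a start and an end offset.
# A missing start defaults to 0, a missing end defaults to the file size.
parse_range() {
    range=$1
    file_size=$2
    start=${range%%:*}   # text before the first ':'
    end=${range#*:}      # text after the first ':'
    echo "${start:-0} ${end:-$file_size}"
}

parse_range 1024:1224 4096   # prints "1024 1224"
parse_range :128 4096        # prints "0 128"
parse_range 1024: 4096       # prints "1024 4096"
```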

@sharkdp @gavynriebau: My only concern with the (very nice!) syntax is how to handle cases where the user doesn't want to do the math and just wants to plug in the length they know something should be: in essence, a length instead of an absolute end position. Thinking about it, this could be satisfied with a small addition to @gavynriebau's syntax (which I actually like more than my original suggestion): using - and + notation. This would have the really nice benefit of being even closer to Python's range syntax, which I assume @sharkdp's notation is inspired by:

$ hexyl --range 512:1024 # Means "start at 512th byte, read until byte 1024 (512 bytes)"
$ hexyl --range 512:+512 # Means "start at 512th byte, read 512 bytes"

$ hexyl --range +512:    # Means "start at 512th byte".
                         # The `+` has no special meaning for the start bound. It's redundant. Don't do it. Also this is a really long comment line. 

$ hexyl --range -512:    # Means "start at 512th byte from the end, read until the end".

$ hexyl -r 2:-2          # Means "start from the 2nd byte, read until the 2nd byte from the end".

...where, as @sharkdp previously suggested, --length N would be equivalent to --range :N.
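
The rules in the examples above can be sketched as a small resolver that turns a range into absolute offsets. The helper name resolve_range and the explicit file-size argument are assumptions for illustration; this is not hexyl's implementation:

```sh
# Hypothetical helper: print "<start> <end>" as absolute offsets
# for a file of the given size.
resolve_range() {
    range=$1
    size=$2
    start=${range%%:*}
    end=${range#*:}

    case $start in
        '') start=0 ;;
        -*) start=$(( size - ${start#-} )) ;;  # counted back from the end
        +*) start=${start#+} ;;                # '+' is redundant on the start bound
    esac

    case $end in
        '') end=$size ;;
        -*) end=$(( size - ${end#-} )) ;;      # counted back from the end
        +*) end=$(( start + ${end#+} )) ;;     # a length relative to the start
    esac

    echo "$start $end"
}

resolve_range 512:1024 4096   # prints "512 1024"
resolve_range 512:+512 4096   # prints "512 1024"
resolve_range +512: 4096      # prints "512 4096"
resolve_range -512: 4096      # prints "3584 4096"
resolve_range 2:-2 4096       # prints "2 4094"
```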

I actually have another question now, which applies to all of the designs suggested until this point: What should happen when the requested range has bounds that go past each other or extend past the end of the file? Should it just print out 0 bytes or the largest available slice in the bounds, respectively, like Rust's range syntax would?

ErichDonGubler avatar Feb 14 '19 22:02 ErichDonGubler
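
One possible answer to the out-of-bounds question is to clamp rather than fail. The sketch below shows that choice only; the helper name clamp_range is hypothetical and the thread does not settle on this behaviour:

```sh
# Hypothetical helper: clamp absolute bounds to the file and
# print "<start> <end> <byte count>".
clamp_range() {
    start=$1
    end=$2
    size=$3
    if [ "$end" -gt "$size" ]; then end=$size; fi     # end past EOF: stop at EOF
    if [ "$start" -gt "$end" ]; then start=$end; fi   # crossed bounds: empty range
    echo "$start $end $(( end - start ))"
}

clamp_range 1024 512 4096    # prints "512 512 0"
clamp_range 4000 5000 4096   # prints "4000 4096 96"
clamp_range 5000 6000 4096   # prints "4096 4096 0"
```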

Released in v0.8.0.

sharkdp avatar Jun 07 '20 13:06 sharkdp

Is --range notation still interesting? How about negative offsets? I don't see those discussed in any other issues; should we open more?

ErichDonGubler avatar Jun 10 '20 18:06 ErichDonGubler

Ok, let's just reopen this.

sharkdp avatar Jun 11 '20 13:06 sharkdp

What would negative offsets be useful for?

sharkdp avatar Jun 11 '20 13:06 sharkdp

> What would negative offsets be useful for?

I left a comment here about a year ago discussing it (https://github.com/sharkdp/hexyl/issues/16#issuecomment-463831710). I don't see any validation around negative offsets specifically, but I do know they would have been useful in the reverse engineering flows I've used for proprietary data formats before.

ErichDonGubler avatar Jun 11 '20 14:06 ErichDonGubler

Ah, sorry. The negative argument for --range seems useful. I was only thinking about --display-offset.

sharkdp avatar Jun 11 '20 14:06 sharkdp

> I was only thinking about --display-offset

--display-offset almost certainly doesn't make sense as a "relative" offset, positive or negative. I can't see it being anything but a count. Right now we treat '+' the same as omitting it, but '+' and '-' will take on new meaning according to the --range design suggestion above. For cases where the semantics are still to parse a count, I don't see a reason for existing expectations to change. I've actually got a branch going already that implements a representation of byte offsets, including relative positive and negative ones, and maintains count semantics for --display-offset and --length. This will probably help the discussion; if you want, I can make a PR. Here's the link to my branch: https://github.com/ErichDonGubler/hexyl/tree/byte-offset-alt-repr.

Eventually a negative offset has to be resolved to an absolute positive offset from the start, so --display-offset can still theoretically be used with relative negative offsets. I don't think that is particularly useful (feedback welcome, everybody!). Because of this resolution step, however, negative offsets in general will probably only work with streams that can actually have seek called on them (not emulated like we do with relative positive offsets). That's when the new error path added recently would apply. :)

EDIT: I've updated OP with more details about relative offsets and a range notation.

ErichDonGubler avatar Jun 11 '20 15:06 ErichDonGubler
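
A rough illustration of why an offset from the end needs a seekable input: the size has to be known before the negative offset can be resolved to an absolute one. Again od is only a stand-in for the hex viewer, and the sample file is made up for the sketch:

```sh
# Build a sample file: three 512-byte blocks filled with 'A', 'B' and 'C'.
sample=$(mktemp)
for c in A B C; do head -c 512 /dev/zero | tr '\0' "$c"; done > "$sample"

# A regular file has a known size, so "512 bytes from the end"
# resolves to an absolute offset. A pipe has no size until EOF is reached.
size=$(( $(wc -c < "$sample") ))
start=$(( size - 512 ))

# -j skips to the resolved offset; the address column shows where we landed.
last_block_addr=$(od -A x -t x1 -j "$start" -N 16 "$sample" | head -n 1 | cut -d ' ' -f 1)

echo "size=$size start=$start address=0x$last_block_addr"
rm -f "$sample"
```

For this 1536-byte sample, start resolves to 1024, which is the beginning of the last block.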

Interesting feature. I don't know if it would be related to this, but I was thinking about having the opposite of -n, as in, the last n bytes from the file, instead of the first n bytes. Is this related to this or does this need a separate issue? Or is this feature already implemented and I just didn't get it?

theAkito avatar Aug 01 '20 00:08 theAkito

It is going to be implemented in #99.

sharkdp avatar Aug 01 '20 21:08 sharkdp

@theAkito: That is exactly what is described in the last paragraphs of the OP, starting with "Having something similar to...". 🙂 I'm glad you're interested in that too! As @sharkdp said, I'm implementing this in #99.

ErichDonGubler avatar Aug 02 '20 17:08 ErichDonGubler

Given that we have --skip, --length, negative offsets and units of measurement... most of this ticket seems solved to me (otherwise, please correct me). Except for the convenient --range syntax. Is this still something people would like to see? If so, please open a new ticket with an up-to-date description based on a recent version of hexyl. And probably based on this comment (https://github.com/sharkdp/hexyl/issues/16#issuecomment-463831710) by @ErichDonGubler.

sharkdp avatar Dec 05 '22 20:12 sharkdp