Feature request: statistics about the values of fields in a pattern struct

Open HoldYourWaffle opened this issue 3 years ago • 0 comments

When reverse engineering formats, I often look at the distribution of values that the fields of a certain struct take across all of its instances. An example from the world of romhacking:

  • I know that this file contains the data on all players in the game.
  • There's a clear structure in this file, some values seem to repeat every 64 bytes.
  • I'll assume that the first 4 bytes in each block represent the "player ID".
  • I check the number of distinct values this group has across all blocks, and it happens to be exactly the number of players in the game. Sounds like a match to me.
  • Next I'll look for gender information. Using this "statistics tool" and some trial and error, I notice that the fourth byte has only 2 distinct values, which is very suspicious to say the least.
  • Repeat this process until all groups of bytes have been identified.
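
The workflow above can be sketched in a few lines of Python. This is only a rough illustration of the kind of home-made script I mean, not anything ImHex provides: the function name `distinct_values_per_field`, the 8-byte block layout and the sample data are all made up for the example.

```python
import struct


def distinct_values_per_field(data: bytes, block_size: int, field_width: int = 1) -> dict[int, int]:
    """Map each field offset inside a block to its number of distinct values.

    The data is treated as consecutive fixed-size blocks; a trailing
    partial block is ignored. Fields are compared as raw bytes.
    """
    if block_size <= 0 or field_width <= 0:
        raise ValueError("block_size and field_width must be positive")
    n_blocks = len(data) // block_size
    result = {}
    for offset in range(0, block_size - field_width + 1, field_width):
        seen = {
            data[b * block_size + offset : b * block_size + offset + field_width]
            for b in range(n_blocks)
        }
        result[offset] = len(seen)
    return result


# Hypothetical sample: three 8-byte blocks, each a little-endian 4-byte ID,
# a 1-byte flag with two possible values, and 3 padding bytes.
sample = b"".join(struct.pack("<IB3x", pid, pid % 2) for pid in (10, 11, 12))

per_byte = distinct_values_per_field(sample, block_size=8)
per_dword = distinct_values_per_field(sample, block_size=8, field_width=4)
```

Here `per_dword[0]` equals the number of blocks, which hints at an ID, while `per_byte[4]` is 2, which hints at a flag such as gender.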

I have successfully applied this strategy to a lot of formats using home-made scripts, but I think it'd be amazing if this could be built into ImHex (it would save me the effort of writing a parser for my structs every time...). These are the statistics I've mainly used in my research:

  • Number of distinct values.
  • Distribution of values (how often each value appeared).
  • The distribution of [the distribution of values] (how often each count of values appeared). This one is probably a lot more niche; I used it to figure out things such as "group ID". I knew that there were (for example) 2 groups with 10 members, 5 groups with 8 members, 1 group with 3 members, etc. Then I found out that in a certain block of bytes, 2 values occur 10 times, 5 values occur 8 times, 1 value occurs 3 times, etc. Using this "distribution of distributions" I could quickly conclude that this block must represent the "group ID", which would've been much more painful without it.
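
For reference, all three statistics fit in one small Python function. Again this is just a sketch under my own assumptions: `field_statistics` is a hypothetical helper, and the input is assumed to be the already-extracted values of one field across all blocks. The sample data reproduces the group example from the last bullet.

```python
from collections import Counter


def field_statistics(values):
    """Compute the three statistics for one field's values across all blocks."""
    value_counts = Counter(values)                    # how often each value appeared
    count_of_counts = Counter(value_counts.values())  # how often each count appeared
    return {
        "distinct": len(value_counts),
        "distribution": dict(value_counts),
        "count_distribution": dict(count_of_counts),
    }


# Made-up "group ID" field: 2 groups with 10 members,
# 5 groups with 8 members and 1 group with 3 members.
group_ids = [0] * 10 + [1] * 10
for gid in range(2, 7):
    group_ids += [gid] * 8
group_ids += [7] * 3

stats = field_statistics(group_ids)
```

`stats["count_distribution"]` comes out as `{10: 2, 8: 5, 3: 1}`, which can be compared directly against the known group sizes.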

These are just the stats I used in my research; I can imagine that other statistics would be useful as well. Perhaps it'd be smart to expose some kind of "easy" API for adding custom statistics calculations.

HoldYourWaffle, Mar 18 '21 13:03