[Feature] Chunk based entropy analysis (also for the pattern matcher)

Open • ghost opened this issue 2 years ago • 1 comment

What feature would you like to see?

I think this would be tremendously useful for some reverse engineering tasks: chunk based entropy analysis.

For example:

  • Designate a buffer area in the pattern editor. Let's call it buf.
  • Process all buf entries. Verify that each one can be aligned to a specific word size (e.g. N=32 bits).
  • Run entropy analysis per chunk of N bits, finding how many bits change in each chunk. This will reveal things like counters, length fields, etc. The MSBs will be the ones changing, while the rest stays consistent.
  • Repeat this process for every structure found with 'buf'.
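The steps above could be sketched roughly as follows. This is only an illustration in Python, not ImHex code or pattern language: the function names (`split_chunks`, `changed_bit_masks`, `entropy_per_chunk`) are made up for this example, each `buf` instance is assumed to be available as a `bytes` object of equal length, and chunks are assumed to be big-endian words.

```python
from collections import Counter
from math import log2


def split_chunks(buf: bytes, chunk_bytes: int = 4) -> list[int]:
    """Split buf into big-endian integers of chunk_bytes bytes each."""
    if len(buf) % chunk_bytes:
        raise ValueError("buffer is not aligned to the chunk size")
    return [int.from_bytes(buf[i:i + chunk_bytes], "big")
            for i in range(0, len(buf), chunk_bytes)]


def _columns(records: list[bytes], chunk_bytes: int):
    """Yield, for each chunk position, the values seen across all records."""
    if len({len(r) for r in records}) > 1:
        raise ValueError("all records must have the same length")
    rows = [split_chunks(r, chunk_bytes) for r in records]
    return zip(*rows)


def changed_bit_masks(records: list[bytes], chunk_bytes: int = 4) -> list[int]:
    """Per chunk position, a mask with 1s where any record differs from the first."""
    masks = []
    for column in _columns(records, chunk_bytes):
        mask = 0
        for value in column[1:]:
            mask |= value ^ column[0]
        masks.append(mask)
    return masks


def entropy_per_chunk(records: list[bytes], chunk_bytes: int = 4) -> list[float]:
    """Per chunk position, the Shannon entropy (in bits) of the observed values."""
    result = []
    for column in _columns(records, chunk_bytes):
        total = len(column)
        counts = Counter(column)
        result.append(sum((c / total) * log2(total / c) for c in counts.values()))
    return result


if __name__ == "__main__":
    # Hypothetical records: a constant 4-byte header followed by a 32-bit counter.
    records = [b"\xAA\xBB\xCC\xDD" + i.to_bytes(4, "big") for i in range(4)]
    for offset, (mask, h) in enumerate(zip(changed_bit_masks(records),
                                           entropy_per_chunk(records))):
        print(f"chunk {offset}: changed bits = {mask:032b}, entropy = {h:.2f}")
```

A constant chunk yields a zero mask and zero entropy, while a field that varies between records stands out in both measures.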

Of course this could also be done in the data processor with some dedicated node, but I think the most useful data would come from plugging it into the existing pattern editor and matcher.

I'm more than happy to help test and provide some guidance on how to implement it.

Very general information (not my own): https://fsec404.github.io/blog/Shanon-entropy/

I would hazard something like this for the algorithm:

  • Grab sample X.
  • Process the designated buffer Y in chunks of N bits.
  • Store X's entropy analysis per chunk (chunks 0 ... bitlength(buffer)/N - 1) as the "previous analysis".
  • Process X+1: repeat, and this time verify chunk by chunk what changed. This will diff X+1 against X.
  • Mark the corresponding bits/bytes in X+1 and X.
  • Present a field that shows the total bits changed per chunk (e.g. chunk/word at offset I, entropy=...).
  • Optimize it for MSBs: real-world scenarios, barring encryption with initialization vectors and proper schemes that alter the entropy of the stream, will always be biased towards MSB changes (depending on endianness/inversion, but one side of the word will exhibit the most changes).
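To make the sample-to-sample comparison concrete, here is a minimal Python sketch of the loop described above. It is an illustration under assumptions, not a proposed implementation: `chunk_diff` and `bit_flip_counts` are hypothetical names, samples are assumed to be equal-length `bytes` objects, N is assumed to be a multiple of 8, and chunks are read as big-endian words with bit 0 as the least significant bit.

```python
def chunk_diff(prev: bytes, curr: bytes, chunk_bits: int = 32) -> list[int]:
    """XOR two samples chunk by chunk; each result is a mask of changed bits."""
    if chunk_bits % 8:
        raise ValueError("chunk_bits must be a multiple of 8 in this sketch")
    chunk_bytes = chunk_bits // 8
    if len(prev) != len(curr) or len(curr) % chunk_bytes:
        raise ValueError("samples must be equal-length and chunk-aligned")
    return [
        int.from_bytes(prev[i:i + chunk_bytes], "big")
        ^ int.from_bytes(curr[i:i + chunk_bytes], "big")
        for i in range(0, len(curr), chunk_bytes)
    ]


def bit_flip_counts(samples: list[bytes], chunk_bits: int = 32) -> list[list[int]]:
    """For each chunk, count how often each bit flipped between consecutive samples.

    flips[chunk][bit] uses bit 0 for the least significant bit of the chunk.
    """
    n_chunks = len(samples[0]) * 8 // chunk_bits
    flips = [[0] * chunk_bits for _ in range(n_chunks)]
    # Compare every sample against the one before it (the "previous analysis").
    for prev, curr in zip(samples, samples[1:]):
        for idx, mask in enumerate(chunk_diff(prev, curr, chunk_bits)):
            for bit in range(chunk_bits):
                if (mask >> bit) & 1:
                    flips[idx][bit] += 1
    return flips


if __name__ == "__main__":
    # Hypothetical samples: a constant tag followed by a 32-bit big-endian counter.
    samples = [b"HDR!" + i.to_bytes(4, "big") for i in range(5)]
    for idx, counts in enumerate(bit_flip_counts(samples)):
        busiest = max(range(len(counts)), key=counts.__getitem__)
        print(f"chunk {idx}: total bits changed = {sum(counts)}, "
              f"busiest bit = {busiest if sum(counts) else 'none'}")
```

The per-bit counts are what would let a UI highlight which side of each word is changing; which side that turns out to be depends on the byte order chosen when reading the chunks and on how the field is encoded.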

Hopefully this makes sense; if it does not, I will try to rephrase the FR :-)

How will this feature be useful to you and others?

It will give a fairly obvious perspective of all data field changes in chunks of unknown data formats. Sequential fields will be the most obvious.

Request Type

  • [ ] I can provide a PoC for this feature or am willing to work on it myself and submit a PR

ghost, Jun 03 '22 13:06