Feature Request: Word List Substitution

Open Ne0nd0g opened this issue 2 years ago • 9 comments

To fight against entropy, it would be useful to have Garble combine N words from a provided word list and use the result for string replacement instead of random characters. It could also accept a maximum string length and trim the result to that length.

Ne0nd0g avatar Apr 25 '23 18:04 Ne0nd0g
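
For illustration, the request above could be sketched roughly as follows. This is not garble code: wordName, its parameters, the salt, and the sample word list are all hypothetical, and the sketch assumes a non-empty list of non-empty ASCII words. It picks N words using bytes of a SHA-256 hash of the original name, joins them in camelCase, and trims the result to a maximum length.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"strings"
)

// wordName is a hypothetical helper, not part of garble. It derives n words
// from a hash of salt+name and trims the joined result to maxLen bytes.
// Assumes words is non-empty and contains only non-empty ASCII words.
func wordName(salt, name string, words []string, n, maxLen int) string {
	sum := sha256.Sum256([]byte(salt + name))
	var b strings.Builder
	for i := 0; i < n; i++ {
		// Use two bytes of the hash per word, wrapping around for large n.
		off := (2 * i) % (len(sum) - 1)
		idx := int(binary.BigEndian.Uint16(sum[off:])) % len(words)
		w := words[idx]
		if i > 0 {
			// Capitalize every word after the first (camelCase).
			w = strings.ToUpper(w[:1]) + w[1:]
		}
		b.WriteString(w)
	}
	out := b.String()
	if maxLen > 0 && len(out) > maxLen {
		out = out[:maxLen] // trim at the requested maximum length
	}
	return out
}

func main() {
	words := []string{"buffer", "reader", "config", "handle"}
	fmt.Println(wordName("seed", "main.secretFunc", words, 3, 12))
}
```

Because the words are chosen from a hash, the same salt and name always give the same result. Trimming shortens names but also throws away bits, so collisions become more likely as the maximum length shrinks.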

This could surely be done, however I don't quite understand why we would do this. Can you elaborate?

lu4p avatar Apr 26 '23 00:04 lu4p

This reminds me of https://github.com/burrowers/garble/pull/593, which was designed to make it a little less trivial to detect that a binary was built with garble. I'm fine with those kinds of changes in general, as long as they don't have downsides like noticeably bigger binaries.

Right now, the names get replaced by hashes, and we have enough bits that collisions are extremely unlikely, and this allows us to not need to have book-keeping in terms of how we obfuscated each name. We simply hash again as needed.

My only worry with this approach is that, with a word list, we would need to pick many words to have enough bits using the same mechanism. And since some words can be long, this could make names very long, and binaries noticeably bigger as well.

Maybe this is OK if the word list is long enough and we aggressively abbreviate some of the longer words (without causing duplicates). We'd have to experiment a bit.

mvdan avatar Apr 27 '23 06:04 mvdan
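
To put rough numbers on the concern about needing many words: a word drawn uniformly from a list of W entries carries log2(W) bits, so reaching B bits takes ceil(B / log2(W)) words. The sketch below does that arithmetic. wordsNeeded is a hypothetical helper, and the 64-bit target and the list sizes are assumptions chosen for illustration, not figures from garble.

```go
package main

import (
	"fmt"
	"math"
)

// wordsNeeded returns how many words must be drawn uniformly from a list of
// listSize entries so that the combined name carries at least bits bits.
// Assumes listSize >= 2. Hypothetical helper, not part of garble.
func wordsNeeded(bits, listSize int) int {
	perWord := math.Log2(float64(listSize)) // bits contributed by one word
	return int(math.Ceil(float64(bits) / perWord))
}

func main() {
	// 64 bits is an assumed target, used only for illustration.
	for _, size := range []int{256, 1024, 65536} {
		fmt.Printf("list size %d: %d words\n", size, wordsNeeded(64, size))
	}
}
```

Under these assumptions a 1024-word list needs 7 words per name, which is where the length concern comes from: even at five or six letters per word, names would run to several dozen characters unless the list is much larger or the words are abbreviated.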

We could always add the obfuscated name book-keeping as well, and to some degree we already record what names we did not obfuscate, which is the opposite. This would allow for shorter obfuscated names, but we would need to be very careful to assign names in a deterministic order.

mvdan avatar Apr 27 '23 06:04 mvdan
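
A minimal sketch of the book-keeping idea, assuming all names are known up front: sort the original names, then hand out words from the list in that order and record the mapping. assignNames and the sample data are hypothetical, and the sketch ignores how such a table would be shared between separately compiled packages.

```go
package main

import (
	"fmt"
	"sort"
)

// assignNames is a hypothetical sketch, not garble's implementation. It
// records one word per distinct original name, handing words out in sorted
// order so the mapping does not depend on the order names were discovered.
func assignNames(originals, words []string) (map[string]string, error) {
	sorted := append([]string(nil), originals...) // copy; leave input untouched
	sort.Strings(sorted)
	table := make(map[string]string, len(sorted))
	next := 0
	for _, name := range sorted {
		if _, seen := table[name]; seen {
			continue // duplicate original name, already assigned
		}
		if next >= len(words) {
			return nil, fmt.Errorf("word list exhausted after %d names", next)
		}
		table[name] = words[next]
		next++
	}
	return table, nil
}

func main() {
	words := []string{"buffer", "reader", "config", "handle"}
	table, err := assignNames([]string{"secretFunc", "apiKey", "secretFunc"}, words)
	if err != nil {
		panic(err)
	}
	fmt.Println(table["apiKey"], table["secretFunc"])
}
```

With a recorded table, a single short word per name is enough and collisions cannot happen, but every step that needs an obfuscated name has to consult the same table instead of simply hashing again.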

"very careful to assign names in a deterministic order."

My attempt to implement this is stuck on the //go:linkname obfuscation.

pagran avatar May 05 '23 08:05 pagran

I seem to have got something usable. Here's an example of how "realistic" naming works before and after

Names are generated from scraped identifiers: https://github.com/pagran/go-identifiers-database

pagran avatar May 12 '23 14:05 pagran

You might find https://github.com/mvdan/corpus/blob/master/top-1000.tsv useful for collecting more "top" modules, although it only scrapes GitHub right now.

mvdan avatar May 12 '23 14:05 mvdan

After x2

1,327,195 identifiers!

pagran avatar May 12 '23 18:05 pagran

We already have two large PRs in flight. If you want us to work faster, sponsor us, particularly @pagran in this case :)

mvdan avatar Jun 13 '23 16:06 mvdan