Feature Request: Word List Substitution
To reduce the telltale entropy of random-character names, it would be useful to have Garble combine N words from a provided word list and use the result for string replacement instead of random characters. It could also accept a maximum string length and trim the result to that length.
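A minimal sketch of what this could look like, assuming the word list, the `obfuscateName` helper, and its parameters are all hypothetical (they are not part of garble today): hash the original name, use chunks of the hash to index into the word list, and trim to the maximum length.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"strings"
)

// wordList is a tiny stand-in; a real list would have thousands of entries.
var wordList = []string{"alpha", "handler", "buffer", "index", "token", "cache", "queue", "node"}

// obfuscateName derives n words (n <= 4 here, since we consume 8 hash bytes
// per word) deterministically from a hash of the original name, then trims
// the result to maxLen.
func obfuscateName(name string, n, maxLen int) string {
	sum := sha256.Sum256([]byte(name))
	var parts []string
	for i := 0; i < n; i++ {
		// Take 8 bytes of the hash per word to index into the list.
		idx := binary.LittleEndian.Uint64(sum[i*8:]) % uint64(len(wordList))
		parts = append(parts, wordList[idx])
	}
	out := strings.Join(parts, "")
	if len(out) > maxLen {
		out = out[:maxLen]
	}
	return out
}

func main() {
	fmt.Println(obfuscateName("mySecretFunc", 3, 20))
}
```

Because the words are derived from a hash of the name, the same input always yields the same output, so this slots into a stateless scheme.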
This could surely be done; however, I don't quite understand why we would do this. Can you elaborate?
This reminds me of https://github.com/burrowers/garble/pull/593, which was designed to make it a little less trivial to detect that a binary was built with garble. I'm fine with those kinds of changes in general, as long as they don't have downsides like noticeably bigger binaries.
Right now, the names get replaced by hashes, and we have enough bits that collisions are extremely unlikely, which means we don't need any book-keeping for how we obfuscated each name. We simply hash again as needed.
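For illustration, the stateless approach works roughly like this; the `hashName` helper and the seed handling are simplified assumptions, not garble's actual scheme:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hashName sketches stateless naming: the obfuscated name is a pure function
// of the build seed and the original name, so no mapping needs to be stored.
// Hex keeps the result a valid Go identifier. (Illustrative only; garble's
// real scheme differs in detail.)
func hashName(seed []byte, name string) string {
	sum := sha256.Sum256(append(append([]byte{}, seed...), name...))
	// 8 bytes gives 64 bits, enough to make collisions extremely unlikely.
	return fmt.Sprintf("g%x", sum[:8])
}

func main() {
	seed := []byte("build-seed")
	fmt.Println(hashName(seed, "mySecretFunc"))
	// Hashing again reproduces the same name; no book-keeping required.
	fmt.Println(hashName(seed, "mySecretFunc"))
}
```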
My only worry with this approach is that, with a word list, we would need to pick many words to have enough bits using the same mechanism. And since some words can be long, this could make names very long, and binaries noticeably bigger as well.
Maybe this is OK if the word list is long enough and we aggressively abbreviate some of the longer words (without causing duplicates). We'd have to experiment a bit.
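To put rough numbers on the worry above: each word drawn from a list of size W contributes log2(W) bits, so matching a hash-based name's entropy takes several words. A small sketch of that arithmetic (the function name and the 64-bit target are assumptions for illustration):

```go
package main

import (
	"fmt"
	"math"
)

// wordsNeeded returns how many words from a list of the given size must be
// combined to reach at least `bits` bits of entropy.
func wordsNeeded(listSize int, bits float64) int {
	perWord := math.Log2(float64(listSize))
	return int(math.Ceil(bits / perWord))
}

func main() {
	// To reach ~64 bits with a 10,000-word list (log2(10000) ≈ 13.3 bits/word):
	fmt.Println(wordsNeeded(10000, 64)) // 5 words
	// With only 1,000 words (≈ 10 bits each), it takes 7.
	fmt.Println(wordsNeeded(1000, 64)) // 7 words
}
```

So with average word lengths of 5 to 8 characters, names could easily reach 25 to 40 characters, which is where the binary-size concern comes from, and why abbreviating long words helps.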
We could always add book-keeping for obfuscated names as well; to some degree we already record which names we did not obfuscate, which is the opposite. This would allow for shorter obfuscated names, but we would need to be very careful to assign names in a deterministic order.
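A sketch of why deterministic ordering matters with book-keeping; the `assignShortNames` helper is hypothetical, and real code would need collision handling:

```go
package main

import (
	"fmt"
	"sort"
)

// assignShortNames maps each identifier to a short word, iterating in sorted
// order so the mapping is reproducible across builds. Iterating a Go map
// directly would give a random order and break reproducibility.
func assignShortNames(idents []string, words []string) map[string]string {
	sorted := append([]string(nil), idents...)
	sort.Strings(sorted)
	out := make(map[string]string, len(sorted))
	for i, id := range sorted {
		out[id] = words[i%len(words)] // collisions would need handling in practice
	}
	return out
}

func main() {
	m := assignShortNames([]string{"zebra", "apple"}, []string{"w0", "w1", "w2"})
	fmt.Println(m["apple"], m["zebra"]) // apple sorts first, so it gets w0
}
```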
My attempt to implement this is stuck on the `//go:linkname` obfuscation.
I seem to have got something usable. Here's an example of how "realistic" naming works, before and after:
Names generated based on scraped identifiers: https://github.com/pagran/go-identifiers-database
You might find https://github.com/mvdan/corpus/blob/master/top-1000.tsv useful for collecting more "top" modules, although it only scrapes GitHub right now.
We already have two large PRs in flight. If you want us to work faster, sponsor us, particularly @pagran in this case :)