mimic
Added basic steganography support
This implements the steganography feature described in #28. It works as follows:
- The file to encode (specified by `--encode`) is read and converted to a bit stream.
- During the mimicking process, each mimicked character represents one or more bits, depending on the number of replacement options.
  - If there are 2 replacement options, one bit of information can be encoded in the character.
    - `0` is represented by the first option
    - `1` is represented by the second
  - If there are 3, still only one bit can be FULLY encoded.
  - With 4, 2 bits of data can be encoded, and so on.
    - `00` is represented by the first
    - `10` is represented by the third
    - ...
  - The number of bits that can be represented is `int(log(len(options), 2))`.
  - There must be at least two options, otherwise no bits can be encoded.
    - In this case the original character is passed through.
- Each bit from the encode file is put into the output using this method.
- The end of the data is marked by a character that is outside the normal encoding range.
  - If there are 3 replacements, then the 3rd would be used. The first two options are used to represent a `0` and a `1`, but the third option cannot otherwise be used to encode data.
  - For 6 replacements, either the 5th or the 6th could be used, since both would otherwise be unused.
  - If the number of replacements is exactly a power of two (2, 4, 8, ...), no option is left unused, so the original character is passed through and the next mimic attempt will include the stop character.
- After all the input data has been encoded and a stop character has been inserted, the replacements go back to a random chance.
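
The scheme above can be sketched roughly as follows. This is an illustration only, not the code in this pull request: `bits_per_char` and `plan_replacements` are hypothetical names, the payload is given as a string of `0`/`1` characters, and the sketch assumes that a short final chunk is zero-padded, that the first unused option serves as the stop marker, and that characters after the stop marker are simply passed through (the PR goes back to random replacement instead).

```python
def bits_per_char(n_options):
    """Whole bits one character can carry: floor(log2(n)), or 0 below two options."""
    return n_options.bit_length() - 1 if n_options >= 2 else 0


def plan_replacements(bits, option_counts):
    """Return one entry per character: a 0-based option index, or None for pass-through.

    bits          -- payload as a string of '0'/'1' characters
    option_counts -- number of replacement options available for each character
    """
    plan = []
    pos = 0          # index of the next payload bit to encode
    stopped = False  # True once the stop marker has been placed
    for n in option_counts:
        k = bits_per_char(n)
        if stopped:
            # Simplification: the PR returns to random replacement here.
            plan.append(None)
        elif pos < len(bits):
            if k == 0:
                # Fewer than two options: nothing can be encoded.
                plan.append(None)
            else:
                # Assumption: zero-pad the last chunk if the payload runs short.
                chunk = bits[pos:pos + k].ljust(k, "0")
                # '00' -> first option (index 0), '10' -> third option (index 2).
                plan.append(int(chunk, 2))
                pos += k
        elif n > (1 << k):
            # Payload finished: the first option outside the data range marks the end.
            plan.append(1 << k)
            stopped = True
        else:
            # 0 or 1 options, or an exact power of two: no spare option, try again later.
            plan.append(None)
    return plan


# Payload 0110 spread over characters with 2, 3, 4, 1, 4, 3 and 2 options.
print(plan_replacements("0110", [2, 3, 4, 1, 4, 3, 2]))
```

In this run the first three characters carry `0`, `1` and `10`. The characters with 1 and 4 options have no spare option for a stop marker and are passed through, and the character with 3 options receives the stop marker (its third option).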
This method is compatible with the `--me-harder` option, which is in fact likely necessary in order to hide information of any substantial size.
In addition, this change supports mimicking files passed in with the `--source` option rather than on `stdin`, and the tests have been updated to use nose, so they can be run using `python setup.py test`.
I have some concerns about this pull request, but I have to read more into it. The proper way to translate the input bit stream to the mimic 'options' is using a range encoder, which is not what's being done here. I suspect that the bit method here will not be as efficient as a range encoder.
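
To put a rough number on the efficiency concern: the bit method stores `int(log(len(options), 2))` whole bits per character, while an ideal coder choosing uniformly among `n` options could approach `log2(n)` bits. The sketch below compares the two; `capacity` is a hypothetical helper, and it ignores the stop marker and any overhead a real range encoder would add.

```python
from math import log2


def capacity(n_options):
    """Return (whole bits used by the bit method, ideal bits) for one character."""
    whole = n_options.bit_length() - 1 if n_options >= 2 else 0
    ideal = log2(n_options) if n_options >= 1 else 0.0
    return whole, ideal


for n in (2, 3, 4, 6):
    whole, ideal = capacity(n)
    print(f"{n} options: bit method {whole} bits, ideal {ideal:.3f} bits")
```

The two agree whenever the number of options is a power of two. The gap appears at counts such as 3 or 6, where the leftover options carry no data.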
.gitignore should probably be included in another PR