boofuzz
String (and Bytes) primitive API is counter-intuitive
The current API of the `string` primitive (and of `bytes`, which had its API copied from it) has grown historically. Currently, a string expects the following variables: `value, size=-1, padding="\x00", encoding="ascii", fuzzable=True, max_len=-1, name=None`. From discussion with @jtpereyda and some other contributors, this naming scheme seems counter-intuitive, and I would suggest rewriting the API with a more intuitive naming scheme. Therefore I'd like to suggest the following; please use this issue to discuss it:
- `fuzzable`: doesn't change.
- `value`, `encoding=ascii`: Provide the original value either as bytes or as string. If it is provided as string, it must be encodable with the provided encoding. If it is provided as bytes, it must be decodable with the provided encoding.
- `strategy_maxlen=NoMaxlen|TruncateRaw|SkipRaw|TruncateEncoded|SkipEncoded`, `maxlen=None`: If the strategy is changed from the default enum value `NoMaxlen`, then long test cases are either truncated or completely skipped. The cutoff is defined in `maxlen` and can apply either to the raw encoded byte length or to the string character count.
- `strategy_padding=NoPadding|PadRaw|PadEncoded`, `minlen=None`, `padchar=b'\x00'`: Similar to above, it will pad the primitive to a minimum length if the strategy is changed. Padding can be done with respect to the length as an encoded string or the length as raw bytes.
- `thorough=True`: If set to false, the primitive will limit the number of test cases to a small amount, let's say at most 50. For instance, it won't add long strings of `B`s or other characters if you already included a few strings of `A`s, and it will only test a small list of static strings.
- `strategy_extra_testcases=None`: List of byte strings or generator for extra test cases.
- `strategy_1byte_replacements`: This can be given a list of single-byte characters. Every byte in the (encoded) original value is replaced with every byte of this list, which is what the `bytes` primitive currently does. For instance, if your original value is `b'BF'` and you provide `['\x00', '\xFF']` as the value for this parameter, then this will yield the test cases `b'\x00F'`, `b'\xFFF'`, `b'B\x00'`, `b'B\xFF'`. We would also provide a sane default for this, and it would accept 2-byte or 4-byte replacements, too.
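To make two of the proposed options concrete, here is a rough sketch in plain Python 3. It is not boofuzz code: the helper names `apply_maxlen` and `one_byte_replacements`, the enum class, and the choice to pass test cases as `str` are all assumptions made for illustration. It also assumes that the "Raw" variants measure the encoded byte length and the "Encoded" variants measure the character count, which is one possible reading of the proposal.

```python
from enum import Enum
from typing import Iterable, Iterator, Optional


class MaxlenStrategy(Enum):
    # Enum member values mirror the names proposed in this issue.
    NO_MAXLEN = "NoMaxlen"
    TRUNCATE_RAW = "TruncateRaw"
    SKIP_RAW = "SkipRaw"
    TRUNCATE_ENCODED = "TruncateEncoded"
    SKIP_ENCODED = "SkipEncoded"


def apply_maxlen(case: str, encoding: str, strategy: MaxlenStrategy,
                 maxlen: Optional[int]) -> Optional[bytes]:
    """Hypothetical helper: return the rendered test case, or None if skipped."""
    raw = case.encode(encoding)
    if strategy is MaxlenStrategy.NO_MAXLEN or maxlen is None:
        return raw
    if strategy is MaxlenStrategy.TRUNCATE_RAW:
        # Cut after maxlen bytes; this may split a multi-byte character.
        return raw[:maxlen]
    if strategy is MaxlenStrategy.SKIP_RAW:
        return raw if len(raw) <= maxlen else None
    if strategy is MaxlenStrategy.TRUNCATE_ENCODED:
        # Cut after maxlen characters, then encode.
        return case[:maxlen].encode(encoding)
    # SKIP_ENCODED: compare the character count against maxlen.
    return raw if len(case) <= maxlen else None


def one_byte_replacements(value: bytes,
                          replacements: Iterable[bytes]) -> Iterator[bytes]:
    """Hypothetical helper: replace each byte of value with each replacement."""
    replacements = list(replacements)
    for i in range(len(value)):
        for r in replacements:
            yield value[:i] + r + value[i + 1:]


if __name__ == "__main__":
    print(list(one_byte_replacements(b"BF", [b"\x00", b"\xff"])))
    # -> [b'\x00F', b'\xffF', b'B\x00', b'B\xff']
    print(apply_maxlen("\u00e4\u00e4", "utf-8", MaxlenStrategy.TRUNCATE_RAW, 2))
    # -> b'\xc3\xa4'
```

With UTF-8, the two-character string `"\u00e4\u00e4"` is four bytes long, so `maxlen=2` truncates or skips it under the raw strategies but leaves it untouched under the encoded ones. Because the replacement helper only slices and concatenates, passing 2-byte or 4-byte replacement values would also work, although the result then grows longer than the original value.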
I see a few discussion points or implementation hurdles with this:
- If we had this, we don't need `strings` and `bytes` as separate primitives anymore, do we?
- Obviously, this would be much easier to code if we drop Python 2 support, as already suggested.
- @jtpereyda is currently refactoring the mutation interface, therefore this needs to wait until he is done.
- With this many options, the number of mutations is really hard to calculate in a closed formula without creating all test cases in memory first. However, this is only a challenge for the implementation and should be transparent to the user.
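On the last point, one possible way around the missing closed formula is to write the mutations as generators and count by consuming them once, which keeps only a single test case in memory at a time. The sketch below is plain Python with hypothetical names (`replacement_cases`, `long_string_cases`, `all_cases`, `count_mutations`); it does not reflect the existing or the refactored mutation interface.

```python
from itertools import chain
from typing import Iterable, Iterator


def replacement_cases(value: bytes) -> Iterator[bytes]:
    # Hypothetical strategy: replace every byte with 0x00 and 0xFF.
    for i in range(len(value)):
        for r in (b"\x00", b"\xff"):
            yield value[:i] + r + value[i + 1:]


def long_string_cases() -> Iterator[bytes]:
    # Hypothetical strategy: a few long runs of a single character.
    for length in (10, 100, 1000):
        yield b"A" * length


def all_cases(value: bytes) -> Iterator[bytes]:
    # Chain the strategies lazily; nothing is generated until iteration.
    return chain(replacement_cases(value), long_string_cases())


def count_mutations(cases: Iterable[bytes]) -> int:
    # Consume the iterable one item at a time without storing the items.
    return sum(1 for _ in cases)


if __name__ == "__main__":
    print(count_mutations(all_cases(b"BF")))  # -> 7
```

The trade-off is that counting costs one full generation pass, so the result would probably need to be cached if it is requested more than once.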
Please use this issue to discuss what the API should look like. I would volunteer to implement it afterwards, as soon as the new mutation interface is stabilized.
Thanks for the input @keisentraut!
You do have a point with the API being a little confusing, so these changes would be quite an improvement. One problem is that, as far as I can tell, we won't be able to keep backwards compatibility. The string primitive is likely to be the most used one, so that might be a serious breaking change.
However, we definitely need to get #422 merged first, then we can look into this.