boofuzz String (and Bytes) primitive API is counter-intuitive

Open keisentraut opened this issue 4 years ago • 1 comments

The current API of the string primitive (and of bytes, whose API was copied from it) has grown historically. Currently, a string expects the following variables: value, size=-1, padding="\x00", encoding="ascii", fuzzable=True, max_len=-1, name=None. From discussion with @jtpereyda and some other contributors, this naming scheme seems counter-intuitive, and I would suggest rewriting the API with a more intuitive naming scheme. Therefore I'd like to suggest the following; please use this issue to discuss it:

fuzzable: doesn't change
value, encoding=ascii: Provide the original value either as bytes or as string. If it is provided as string, it must be encodable with the provided encoding. If it is provided as bytes, it must be decodable with the provided encoding.
strategy_maxlen=NoMaxlen|TruncateRaw|SkipRaw|TruncateEncoded|SkipEncoded, maxlen=None: If the strategy is changed from the default enum value NoMaxlen, then long testcases are either truncated or completely skipped. The cutoff is defined in maxlen and can apply either to the raw encoded byte length or to the string character count.
strategy_padding=NoPadding|PadRaw|PadEncoded, minlen=None, padchar=b'\x00': Similar to the above, it will pad the primitive to a minimum length if the strategy is changed. Padding can be done with respect to the length as an encoded string or the length as raw bytes.
thorough=True: If set to False, the primitive will limit the number of test cases to a small amount, let's say at most 50. For instance, it won't add long strings of Bs or other characters if you already included a few strings of As, and it will only test a small list of static strings.
strategy_extra_testcases=None: List of byte strings or generator for extra testcases.
strategy_1byte_replacements: This can be given a list of single-byte characters. Every byte in the (encoded) original value is replaced with every byte of this list, which is what the bytes primitive currently does. For instance, if your original value is b'BF' and you provide [b'\x00', b'\xFF'] for this parameter, then this will yield the testcases b'\x00F', b'\xFFF', b'B\x00', b'B\xFF'. We would also provide a sane default for this, and it would accept 2-byte or 4-byte replacements, too.
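
To make the strategy_1byte_replacements example concrete, here is a minimal sketch of how such a mutation generator could behave. The function name one_byte_replacements is hypothetical and not part of boofuzz; it only illustrates the proposed semantics for single-byte replacements.

```python
from typing import Iterable, Iterator


def one_byte_replacements(value: bytes, replacements: Iterable[bytes]) -> Iterator[bytes]:
    """Yield copies of `value` with each byte swapped for each replacement.

    Hypothetical helper illustrating the proposal; not a boofuzz API.
    """
    replacements = list(replacements)  # allow generators to be reused per position
    for i in range(len(value)):
        for r in replacements:
            # Keep everything before and after position i, swap the byte at i.
            yield value[:i] + r + value[i + 1:]


cases = list(one_byte_replacements(b"BF", [b"\x00", b"\xFF"]))
print(cases)
```

For the original value b'BF', this produces the four testcases listed above, in the same order.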

I see a few discussion points or implementation hurdles with this:

If we had this, we wouldn't need strings and bytes as separate primitives anymore, would we?
Obviously, this would be much easier to code if we drop Python2 support as already suggested.
@jtpereyda is currently refactoring the mutation interface, therefore this needs to wait until he is done.
With this many options, the number of mutations is really hard to calculate in a closed formula without creating all testcases in memory first. However, this is only a challenge for the implementation and should be transparent to the user.
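
As a rough illustration of the counting problem, the sketch below counts testcases by consuming a generator instead of building a list in memory. All names here (replacement_cases, count_cases, the extra list) are hypothetical and only stand in for the proposed options. Note that counting this way consumes the generator, so a real implementation would need to be able to recreate it.

```python
from itertools import chain


def replacement_cases(value, replacements):
    # Hypothetical stand-in for the proposed 1-byte replacement strategy.
    for i in range(len(value)):
        for r in replacements:
            yield value[:i] + r + value[i + 1:]


def count_cases(case_iter):
    # Counts lazily: only one testcase is held in memory at a time.
    return sum(1 for _ in case_iter)


value = b"BF"
replacements = [b"\x00", b"\xFF"]
extra = [b"A" * 10, b"%n"]  # assumed user-supplied extra testcases

total = count_cases(chain(replacement_cases(value, replacements), extra))
print(total)
```

For this simple combination a closed formula still exists (len(value) * len(replacements) + len(extra)), but once truncation or skipping strategies filter testcases by length, counting by iteration may be the simpler option.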

Please use this issue to discuss what the API should look like. I would volunteer to implement it as soon as the new mutation interface has stabilized.

Apr 28 '20 08:04 keisentraut

Thanks for the input @keisentraut!

You do have a point about the API being a little confusing, so these changes would be quite an improvement. One problem is that, as far as I can tell, we won't be able to keep backwards compatibility. The string primitive is likely the most used one, so that might be a serious breaking change.

However, we definitely need to get #422 merged first, then we can look into this.

May 13 '20 20:05 SR4ven
