unicode-security-guide
Best-Fit Mappings: Test core Python string encoding APIs
See: http://websec.github.io/unicode-security-guide/character-transformations/#best-fit
Identify Python core string encoding APIs, and test major Python versions to document:
- best-fit mapping behavior - does the API best-fit characters by default?
- override options - can the default be overridden?
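As a starting point for the override question, here is a minimal Python 3 sketch that encodes the same text under each of the standard `errors` handlers accepted by `str.encode`. The helper name `encode_with_handlers` is hypothetical, and treating `errors` as the relevant override knob is an assumption to be confirmed by the testing this issue asks for.

```python
# Hypothetical helper (not part of any library): try one string against
# several of the standard error handlers that str.encode accepts.
STANDARD_HANDLERS = (
    "strict",
    "ignore",
    "replace",
    "xmlcharrefreplace",
    "backslashreplace",
)


def encode_with_handlers(text, encoding, handlers=STANDARD_HANDLERS):
    """Return {handler: encoded bytes, or the exception that was raised}."""
    results = {}
    for handler in handlers:
        try:
            results[handler] = text.encode(encoding, errors=handler)
        except UnicodeEncodeError as exc:
            # Record the failure instead of aborting, so handlers can be compared.
            results[handler] = exc
    return results
```

For example, `encode_with_handlers("\u00e9", "ascii")` records a `UnicodeEncodeError` under `"strict"` and substituted bytes under the other handlers, which makes it easy to see whether any handler silently yields a look-alike ASCII character.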
One way to test this might be to brute-force a large set of Unicode characters: convert each one to a target encoding and check whether the result falls in the 7-bit ASCII range (0x00 to 0x7F).
// Loop through all available encodings
for each available encoding {
    // Loop through the code points up to 0xFFFF, starting at 0x80 to avoid
    // using 7-bit ASCII as the source, because we want to test
    // if ASCII is the outcome!
    for each Unicode character 0x0080 to 0xFFFF {
        convert the Unicode character from UTF-8 or UTF-16 to the target encoding (e.g. shift_jis, ISO-8859-1, etc.)
        test if the target character is ASCII (0x00 to 0x7F) after the conversion
    }
}
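The loop above could be sketched in Python 3 roughly as follows. The function names are hypothetical, and two choices here are assumptions rather than part of the original proposal: the list of encodings is taken from the standard library's `encodings.aliases` table, and a hit is detected by decoding the result back and checking that it is pure ASCII. The round-trip check is used instead of a raw byte comparison so that EBCDIC or UTF-16 style encodings, which legitimately produce bytes below 0x80, are not reported as best-fit candidates.

```python
import encodings.aliases


def available_encodings():
    """Sorted codec names from the stdlib alias table that can encode text."""
    names = []
    for name in sorted(set(encodings.aliases.aliases.values())):
        try:
            "a".encode(name)  # probe: skips missing and non-text codecs
        except (LookupError, UnicodeError):
            continue
        names.append(name)
    return names


def find_ascii_results(encoding, first=0x80, last=0xFFFF, errors="strict"):
    """Map each non-ASCII code point that round-trips to ASCII to its bytes."""
    hits = {}
    for cp in range(first, last + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates are not characters
        try:
            encoded = chr(cp).encode(encoding, errors)
            decoded = encoded.decode(encoding)
        except UnicodeError:
            continue  # not representable in this encoding
        # The source was >= 0x80, so an all-ASCII result means a lossy mapping.
        if decoded and all(ord(ch) < 0x80 for ch in decoded):
            hits[cp] = encoded
    return hits


def scan_all(errors="strict"):
    """Run the scan for every available encoding; slow, so call it explicitly."""
    return {name: find_ascii_results(name, errors=errors)
            for name in available_encodings()}
```

Calling `scan_all()` gives the default behavior per encoding, and `scan_all(errors="replace")` shows what changes when the error handler is overridden. Any hits still need manual review, since a mapping to `?` from a replacement handler is not the same thing as a best-fit look-alike.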