
Best-Fit Mappings: Test core ICU string encoding APIs

Open cweb opened this issue 11 years ago • 0 comments

See: http://websec.github.io/unicode-security-guide/character-transformations/#best-fit

Identify ICU's core string encoding APIs, and test major ICU versions to document:

  • best-fit mapping behavior - does the API best-fit characters by default?
  • override options - can the default be overridden?

One way to test this is to brute-force a large set of Unicode characters: convert each one to a target encoding and check whether the result falls in the 7-bit ASCII range (0x00 to 0x7F).

// Loop through all available encodings
for each available encoding {
  // Loop through the BMP code points from 0x80 to 0xFFFF. Start at 0x80
  // to avoid using 7-bit ASCII as the source, because we want to test
  // whether ASCII is the outcome!
  for each Unicode character 0x0080 to 0xFFFF {
    convert the Unicode character from UTF-8 or UTF-16 to the target encoding (e.g. shift_jis, ISO-8859-1, etc.)
    test if the target character is ASCII (0x00 to 0x7F) after the conversion
  }
}
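The loop above could be sketched in Python roughly as follows. This is only a harness under stated assumptions, not ICU itself: the names `scan_code_points`, `find_ascii_outcomes`, and `stdlib_encoder` are hypothetical, and the encoder callback is a placeholder into which a wrapper around an ICU converter would be plugged. Python's built-in codecs are used purely as a stand-in so the sketch runs without ICU; their results say nothing about ICU's behavior.

```python
from typing import Callable, Dict, Iterable, Optional

# An encoder takes one character and returns its encoded bytes,
# or None when the character is unmappable in the target encoding.
Encoder = Callable[[str], Optional[bytes]]


def scan_code_points() -> Iterable[int]:
    """Yield BMP code points U+0080..U+FFFF, skipping surrogates."""
    for cp in range(0x80, 0x10000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not characters and cannot be encoded
        yield cp


def find_ascii_outcomes(encode: Encoder,
                        code_points: Iterable[int]) -> Dict[int, bytes]:
    """Return {code point: output} for non-ASCII input that encodes to ASCII-only bytes."""
    hits: Dict[int, bytes] = {}
    for cp in code_points:
        out = encode(chr(cp))
        # A hit is a non-empty result whose bytes are all in 0x00..0x7F.
        if out and all(b <= 0x7F for b in out):
            hits[cp] = out
    return hits


def stdlib_encoder(name: str) -> Encoder:
    """Stand-in encoder built on Python's codecs (ASSUMPTION: not ICU)."""
    def encode(ch: str) -> Optional[bytes]:
        try:
            return ch.encode(name, errors="strict")
        except UnicodeError:
            return None  # unmappable: report it instead of substituting
    return encode


if __name__ == "__main__":
    hits = find_ascii_outcomes(stdlib_encoder("cp1252"), scan_code_points())
    for cp, out in sorted(hits.items()):
        print(f"U+{cp:04X} -> {out!r}")
```

Two caveats apply when swapping in a real converter. The encoder should report unmappable characters as `None`; if it silently substitutes a replacement byte such as `?`, every unmappable character will look like a best-fit hit. Encodings that emit only ASCII bytes by design (for example UTF-7 or the ISO-2022 family) should probably be excluded or reviewed separately, since every character would match.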

cweb · Aug 02 '13 05:08