
Test Unicode and bytes handling (Python 2 and 3) in all string arguments

Open mgeier opened this issue 10 years ago • 1 comments

After merging #119, the file argument should support str and unicode in Python 2 and str and bytes in Python 3. The arguments mode/format/subtype/endian should support str and unicode in Python 2 and only str in Python 3 (bytes should be disallowed there).
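A minimal sketch of the type rules described above, not the library's actual code. The helper names `check_text_arg` and `is_filename` are hypothetical and only illustrate which types are meant to be accepted on each Python version:

```python
import sys

if sys.version_info[0] == 2:
    # Python 2: str (bytes) and unicode are both accepted everywhere.
    _TEXT_TYPES = (str, unicode)  # noqa: F821 ("unicode" exists only on Python 2)
    _FILENAME_TYPES = (str, unicode)  # noqa: F821
else:
    # Python 3: only str for mode/format/subtype/endian,
    # but str and bytes for file names.
    _TEXT_TYPES = (str,)
    _FILENAME_TYPES = (str, bytes)


def check_text_arg(name, value):
    """Hypothetical helper: return value if it is an allowed text type."""
    if not isinstance(value, _TEXT_TYPES):
        raise TypeError("{0} must be a string, not {1}".format(
            name, type(value).__name__))
    return value


def is_filename(value):
    """Hypothetical helper: True if value has a string type allowed for 'file'."""
    return isinstance(value, _FILENAME_TYPES)
```

With these assumptions, `check_text_arg('format', b'WAV')` raises `TypeError` on Python 3 but passes on Python 2, which is exactly the kind of difference the tests need to pin down.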

There are some facts that are especially annoying when testing this:

  • in Python 2, unicode can be implicitly converted/compared to str (as long as the string consists of only ASCII characters); this is not possible for Python 3's str and bytes. That means that test cases that pass in Python 2 may fail in Python 3.
  • file names should be tested with both Unicode and byte strings. A bytes object may also contain non-ASCII characters. All combinations of Unicode/bytes and ASCII/non-ASCII should be tested.
  • not only the success cases but also the expected failures should be tested.
  • an (invalid) file extension may contain non-ASCII characters (but should still lead to a reasonable error message).
  • if local files are used, the actual file system encoding is unknown, so it may be hard to test anything that depends on sys.getfilesystemencoding().
  • as always, 'RAW' files are special, so separate test cases have to be constructed for them.
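One possible way to enumerate the Unicode/bytes and ASCII/non-ASCII combinations from the list above is sketched here. The function `filename_variants` and the two sample names are assumptions for illustration; the encoding is a parameter so that the result does not depend on the local file system encoding:

```python
import itertools
import sys

ASCII_NAME = u'test.wav'
NON_ASCII_NAME = u't\xe4st.wav'  # contains U+00E4 (a with diaeresis)


def filename_variants(encoding=None):
    """Return (label, filename) pairs for text/bytes x ASCII/non-ASCII.

    Hypothetical helper. If no encoding is given, the file system
    encoding reported by Python is used.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    names = [('ascii', ASCII_NAME), ('non-ascii', NON_ASCII_NAME)]
    kinds = ['text', 'bytes']
    variants = []
    for (charset, name), kind in itertools.product(names, kinds):
        if kind == 'bytes':
            try:
                value = name.encode(encoding)
            except UnicodeEncodeError:
                # The name cannot be represented in this encoding; skip it.
                continue
        else:
            value = name
        variants.append((charset + '-' + kind, value))
    return variants
```

Each pair could then be fed to a parametrized test (for example with `pytest.mark.parametrize`), once for the cases expected to succeed and once for the expected failures. 'RAW' files would still need their own cases.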

mgeier avatar Mar 29 '15 19:03 mgeier

I repeat my recommendation here: Anyone who wants to know about the pitfalls of handling Unicode should watch this: http://nedbatchelder.com/text/unipain.html

mgeier avatar Mar 31 '15 09:03 mgeier