Be aware of encodings different to ASCII

Open · leandro-lucarella-sociomantic opened this issue 12 years ago · 2 comments

#45 exposed one problem when handling text in an encoding other than ASCII. The problem goes much deeper than that: to properly support any encoding across the whole program, every use of a str/unicode object must be revised.
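One common way to approach such a revision is the "Unicode sandwich": decode bytes to text as soon as they enter the program, work only with text internally, and encode back to bytes only at output. This is a minimal sketch of that general technique, not something the issue prescribes. UTF-8 as the input and output encoding is an assumption of the example, and it runs unchanged on Python 2 and 3.

```python
from __future__ import unicode_literals

# Bytes as they might arrive from a file or socket: the UTF-8 encoding of 'café'.
incoming = b'caf\xc3\xa9'            # 5 bytes

# Decode once, at the input boundary (UTF-8 is an assumed encoding here).
text = incoming.decode('utf-8')      # 4 characters

# Inside the program, work only with text, so lengths and case changes
# operate on characters rather than on individual bytes.
shouted = text.upper()

# Encode once, at the output boundary.
outgoing = shouted.encode('utf-8')
```

Keeping the decode and encode steps at the edges means the rest of the code never has to guess which kind of string it is holding.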

Oct 17 '13 16:10 leandro-lucarella-sociomantic

Having done this a bit before, you likely want to start by adding:

from __future__ import unicode_literals

which makes all string literals unicode without having to put a u in front of them. Then you only have to fix the places where you actually want to work with bytes directly.
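A minimal sketch of what the import changes, written so it runs on both Python 2 and 3 (on Python 3 the import is accepted but has no effect, since unprefixed literals are already text). The names `text_type`, `greeting` and `payload` are illustrative only.

```python
from __future__ import unicode_literals

# The text type is `unicode` on Python 2 and `str` on Python 3;
# type(u'') picks the right one on either version.
text_type = type(u'')

# With unicode_literals, an unprefixed literal is text even on Python 2,
# so \u escapes work inside it.
greeting = 'caf\u00e9'
assert isinstance(greeting, text_type)

# Places that really need raw bytes now have to say so with a b prefix.
payload = b'\x00\x01'
assert not isinstance(payload, text_type)
```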

For better forward compatibility with Python 3, you probably also want to add:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

And then fix all the bugs :)
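For reference, a short sketch of the behavior each of those three imports switches on. The variable names are illustrative only, and on Python 3 all three imports are accepted but change nothing.

```python
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

# absolute_import: `import json` is resolved as an absolute import (normally
# the standard-library module), not implicitly as a sibling json.py in the
# current package.
import json

# division: / is true division on both versions; // is explicit floor division.
half = 1 / 2        # 0.5, not 0 as in plain Python 2
floored = 7 // 2    # 3

# print_function: print is a function, so keyword arguments such as sep work.
print('a', 'b', sep='-')   # prints "a-b"

encoded = json.dumps({'half': half})
```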

http://stackoverflow.com/questions/5937251/writing-python-2-7-code-that-is-as-close-to-python-3-x-syntax-as-possible has other tips if you run into particular issues.

Jul 23 '14 16:07 pjz

Thanks for the tips. I know the current unicode situation is horrible, and eventually we will need to take care of it. In Python 2.x it is very hard to achieve "unicode-correctness"; at least in my experience it was always a mess. Luckily Python 3.x took care of that :)

Jul 23 '14 17:07 leandro-lucarella-sociomantic