python-musicbrainzngs

Always return Unicode strings

Open sampsyo opened this issue 12 years ago • 5 comments

I noticed recently that the strings returned from our library are sometimes bytes and sometimes Unicode. Due to ElementTree's default behavior, only those strings that are non-ASCII are returned as Unicode objects. For example:

>>> rec = musicbrainzngs.search_recordings(artist='alt-j', recording='piano', limit=1)['recording-list'][0]
>>> rec['title']
u'\u2766 (Piano)'
>>> rec['release-list'][0]['title']
'An Awesome Wave'

The recording title, which has a "special" character in it, is a unicode object. The release title, which is all ASCII, is a str object. For consistency's sake (and for an eventual Python 3 port), the library should always return unicode objects.
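One way to read "always return unicode objects" is as a normalizing pass over the parsed result. Below is a minimal sketch. It assumes the result is built from nested dicts, lists, and strings, as in the example above, and that any byte strings are UTF-8. `normalize_strings` is a hypothetical helper, not part of musicbrainzngs. The sketch is written for Python 3, where `str` plays the role of Python 2's `unicode`.

```python
def normalize_strings(value, encoding="utf-8"):
    """Recursively convert byte strings in a parsed result to text.

    Hypothetical helper: assumes the result consists of dicts, lists,
    and strings, and that any bytes are encoded as `encoding`.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, dict):
        return {normalize_strings(k, encoding): normalize_strings(v, encoding)
                for k, v in value.items()}
    if isinstance(value, list):
        return [normalize_strings(v, encoding) for v in value]
    # Text and non-string values (ints, None, ...) pass through unchanged.
    return value


# Mixed input shaped like the search result above: one title is text,
# the other is a byte string.
rec = {
    "title": "\u2766 (Piano)",
    "release-list": [{"title": b"An Awesome Wave"}],
}
clean = normalize_strings(rec)
print(type(clean["release-list"][0]["title"]))  # <class 'str'>
```

Applying the pass once at the top of the result means callers never see a mix of types, regardless of which fields happened to contain non-ASCII characters.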

Anyone have any bright ideas about the best way to go about addressing this? (I have a nagging sensation that we might have discussed this in the past, but I can't remember if we came to a conclusion about what to do.)

sampsyo, Jan 29 '13 19:01

This is a good idea; I can't remember whether we talked about it. The fact that ElementTree returns both types is annoying. My preference would be a helper to use in the parse_element methods, either called explicitly or applied as a decorator.
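The decorator variant of that helper could look roughly like the sketch below. It assumes that each parser returns nested dicts, lists, and strings, and that byte strings are UTF-8. `returns_text` and `parse_release` are hypothetical names, not the library's actual functions. The stand-in parser takes a plain dict instead of an ElementTree element so that the sketch runs on its own under Python 3.

```python
import functools


def returns_text(func):
    """Decorator: convert any bytes in func's return value to str.

    Hypothetical sketch; assumes the wrapped parser returns nested
    dicts/lists/strings and that byte strings are UTF-8.
    """
    def convert(value):
        if isinstance(value, bytes):
            return value.decode("utf-8")
        if isinstance(value, dict):
            return {convert(k): convert(v) for k, v in value.items()}
        if isinstance(value, list):
            return [convert(v) for v in value]
        return value

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return convert(func(*args, **kwargs))

    return wrapper


@returns_text
def parse_release(raw):
    # Hypothetical stand-in for a parse_element-style function: a real
    # parser would read an ElementTree element, here `raw` is a dict.
    return dict(raw)


result = parse_release({"title": b"An Awesome Wave"})
print(result)  # {'title': 'An Awesome Wave'}
```

Compared with calling a helper explicitly in every parser, the decorator keeps the conversion in one place. However, it converts the whole return value, which costs an extra traversal per call.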

alastair, Mar 11 '13 21:03

Moving this to apichange. Do we think that it's an incompatible change, or can we just do it?

alastair, Feb 06 '14 15:02

I consider this an apichange.

People who expect a bytestring might try to decode it and fail, since you can't decode a (non-ASCII) unicode string.

Example in mind:

$ python2
>>> "blå".decode('utf8')
u'bl\xe5'
>>> u"blå".decode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 2: ordinal not in range(128)

Something like that. Basically, the change would hurt users who work around the shortcomings by converting to unicode everywhere in Python 2. I haven't checked yet how isrcsubmit would handle it, but it would probably be fine, since I check whether I have unicode or bytes everywhere.
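The defensive style described here checks whether a value is text or bytes before decoding it. That style keeps working whichever type the library returns, so such callers are unaffected by the change. Below is a minimal sketch in Python 3 terms, where `str` stands in for `unicode`. `as_text` is a hypothetical caller-side helper, and UTF-8 is assumed.

```python
def as_text(value, encoding="utf-8"):
    """Return value as text, decoding only if it is a byte string.

    Hypothetical caller-side helper. Text passes through untouched,
    because decoding it again is the mistake discussed above (in
    Python 3, str has no .decode method at all).
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, str):
        return value
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


# Both the current mixed behaviour and the proposed all-text behaviour work:
print(as_text(b"bl\xc3\xa5"))  # blå
print(as_text("blå"))          # blå
```

Callers that instead decode unconditionally are the ones the API change would break.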

JonnyJD, Feb 06 '14 15:02

OK, good reason to hold off for apichange. Thanks.

alastair, Feb 06 '14 15:02