Always return Unicode strings
I noticed recently that the strings returned from our library are sometimes bytes and sometimes Unicode. Due to ElementTree's default behavior, only those strings that are non-ASCII are returned as Unicode objects. For example:
>>> rec = musicbrainzngs.search_recordings(artist='alt-j', recording='piano', limit=1)['recording-list'][0]
>>> rec['title']
u'\u2766 (Piano)'
>>> rec['release-list'][0]['title']
'An Awesome Wave'
The recording title, which has a "special" character in it, is a unicode object. The release title, which is all ASCII, is a str object. For consistency's sake (and for an eventual Python 3 port), the library should always return unicode objects.
Anyone have any bright ideas about the best way to go about addressing this? (I have a nagging sensation that we might have discussed this in the past, but I can't remember if we came to a conclusion about what to do.)
This is a good idea; I can't remember whether we talked about it. It's annoying that ElementTree returns both. My preference would be a helper to use in the parse_element methods, either called explicitly or applied as a decorator.
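A rough sketch of what such a helper might look like, covering both the explicit and the decorator form. The names `to_unicode`, `returns_unicode` and `parse_release` are made up for illustration and are not part of the library; the sketch also assumes that any byte string coming out of the parser can safely be decoded as UTF-8 and that only values (not dictionary keys) need converting.

```python
import functools


def to_unicode(value, encoding="utf-8"):
    """Recursively turn byte strings into unicode text (hypothetical helper).

    On Python 2, `bytes` is an alias for `str`, so plain ASCII strings are
    decoded to `unicode`. Values that are already text are left untouched.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, dict):
        # Only the values are converted; keys are kept as they are.
        return {key: to_unicode(item, encoding) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(to_unicode(item, encoding) for item in value)
    return value


def returns_unicode(func):
    """Decorator form: normalise whatever the wrapped parser returns."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return to_unicode(func(*args, **kwargs))
    return wrapper


@returns_unicode
def parse_release(element=None):
    # Stand-in for a real parse_element-style function that returns
    # a mix of byte strings and unicode strings.
    return {"title": b"An Awesome Wave", "media": [b"CD", u"\u2766"]}
```

The explicit form would be `to_unicode(result)` at the end of each parse function; the decorator keeps the parse functions themselves unchanged, at the cost of one extra pass over each result.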
Moving this to apichange. Do we think that it's an incompatible change, or can we just do it?
I consider this an apichange.
People might run into problems if they expect a bytestring at some point and try to decode it: that fails, since you can't decode a (non-ASCII) unicode string.
Example in mind:
$ python2
>>> "blå".decode('utf8')
u'bl\xe5'
>>> unicode("blå").decode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)
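Something like that (strictly speaking, the traceback above is raised by unicode("blå") itself; calling .decode('utf8') on u"blå" fails in a similar way, with a UnicodeEncodeError). Basically, this would hurt users who try to work around the shortcomings by using unicode everywhere in Python 2. I haven't checked yet how isrcsubmit would handle that, but it would probably do fine, since I check everywhere whether I have unicode or bytes.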
OK, good reason to hold off for apichange. Thanks.