Unicode Key Causes Encoding Error with Log Statement

Open jacinda opened this issue 11 years ago • 1 comments

I noticed this while using qr (which is great, btw) with Django, which uses unicode for everything. I ended up with something like q = Queue(u'my_key') without realizing it at first, because my_key was a variable rather than a string I had hard-coded. It also only broke when the value being popped had been pickled into a string containing non-ASCII bytes.

This error occurs because cPickle protocol 1 is combined with a unicode key. There are a couple of ways to fix the bug. Let me know which you prefer and I'll submit a patch.

Here is a detailed description.

Because of the way _pack is defined using protocol 1, cPickle uses a binary format for serialization:

def _pack(self, val):
    """Prepares a message to go into Redis"""
    return self.serializer.dumps(val, 1)

When the log statement below is executed on popping and the key is a unicode string, a UnicodeDecodeError is raised if the pickled value of popped contains any byte greater than 127 (0x7f).

log.debug('Popped ** %s ** from key ** %s **' % (popped, self.key))

Here is an example:

>>> import cPickle
>>> x = cPickle.dumps(128, 1)
>>> x
'K\x80.'
>>> u = u'unicode string'
>>> 'Popped ** %s ** from key ** %s **' % (x, u)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 11: ordinal not in range(128)
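
The coercion Python 2 performs here is implicit, so it can help to spell it out. The sketch below is only an illustration under that assumption: decode_as_ascii is a hypothetical helper (not part of qr) that does by hand what Python 2 does automatically when a byte string and a unicode string meet in one format operation. It uses the standard pickle module so that it also runs on Python 3.

```python
import pickle


def decode_as_ascii(payload):
    """Decode a pickled payload as ASCII.

    Hypothetical helper: mirrors the implicit ASCII decoding that Python 2
    applies to a byte string once a unicode string joins the same expression.
    """
    return payload.decode('ascii')


# Protocol 1 stores 128 as a raw byte, which is outside the ASCII range.
payload = pickle.dumps(128, 1)

try:
    decode_as_ascii(payload)
except UnicodeDecodeError as exc:
    print('ASCII decode failed: %s' % exc)
```

Values whose protocol 1 pickle happens to contain only bytes below 128 decode cleanly, which is why the failure appears only for some popped values.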

This does not fail if protocol 0 is used:

>>> x = cPickle.dumps(128)
>>> 'Popped ** %s ** from key ** %s **' % (x, u)
u'Popped ** I128\n. ** from key ** unicode string **'
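
To compare the two protocols directly, a small check along these lines could be used. It is a sketch only: is_ascii_safe is a hypothetical name, and the check covers just the integer from the example above, so it says nothing about other value types.

```python
import pickle


def is_ascii_safe(payload):
    """Return True if every byte of the pickled payload is below 128."""
    return all(byte < 128 for byte in bytearray(payload))


binary_payload = pickle.dumps(128, 1)  # protocol 1: binary opcodes
text_payload = pickle.dumps(128, 0)    # protocol 0: text opcodes

print(is_ascii_safe(binary_payload))
print(is_ascii_safe(text_payload))
```

For this value, the protocol 1 payload fails the check and the protocol 0 payload passes it, matching the two transcripts above.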

It also does not fail if the unicode string is explicitly encoded as ascii:

>>> x = cPickle.dumps(128, 1)
>>> 'Popped ** %s ** from key ** %s **' % (x, u.encode('ascii'))
'Popped ** K\x80. ** from key ** unicode string **'
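
A minimal sketch of how the explicit-encoding option might look, assuming the log message is built from byte strings only. The names to_bytes and format_pop_message are hypothetical and not part of qr, and the utf-8 default is an assumption chosen so that a non-ASCII key does not raise during encoding (the transcript above uses 'ascii').

```python
def to_bytes(value, encoding='utf-8'):
    """Return value as a byte string, encoding text explicitly.

    Hypothetical helper, not part of qr. The utf-8 default is an assumption.
    """
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


def format_pop_message(popped, key):
    """Build the log message from byte strings only.

    Because no unicode object takes part in the formatting, no implicit
    ASCII decoding of the pickled payload is triggered.
    (Formatting bytes with % needs Python 2 or Python 3.5+.)
    """
    return b'Popped ** %s ** from key ** %s **' % (popped, to_bytes(key))


message = format_pop_message(b'K\x80.', u'my_key')
print(repr(message))
```

The result is a byte string, so whatever consumes the log output still has to cope with non-ASCII bytes in the pickled payload.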

Either changing the pickling protocol or encoding the key explicitly would work, and I can submit either as a patch (or do something else you suggest, if neither of these is ideal). Let me know which solution you prefer.

jacinda, Nov 17 '13 06:11

I'd be cool with the explicit encoding.

tnm, Jan 20 '14 01:01