redis-py
redis-py copied to clipboard
Differences in parsing with decode_responses=True, with and without hiredis
I ran into a similar problem as #809: I have a redis connection using decode_repsonses=True, but I am also using a 3rd-party package (RQ) that pickles some data and deep in that library I get UnicodeDecodeErrors. I'm kind of stuck because of that library's use of pickled data.
However, I thought to install hiredis-py to see if it handled parsing differently and it does.
So, with this:
import redis
import pickle
rdb = redis.StrictRedis(decode_responses=True)
data = pickle.dumps(dict(hello='world'))
rdb.set('foo', data)
rdb.get('foo')
With hiredis installed I see:
b'\x80\x03}q\x00X\x05\x00\x00\x00helloq\x01X\x05\x00\x00\x00worldq\x02s.'
Without hiredis installed, I get:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
Looking at hiredis-py code, I see the equivalent of this:
if encoding:
try:
data = data.decode(encoding)
except ValueError:
pass
I'm wishing redis-py did the same. I'd love to see the following change to connection.py:
def decode(self, value, force=False):
"Return a unicode string from the byte representation"
if (self.decode_responses or force) and isinstance(value, bytes):
try:
value = value.decode(self.encoding, self.encoding_errors)
except ValueError:
# Ignore encoding and return bytes
pass
return value
Considering using hiredis comes highly recommended, I'd love to see this base difference in parsing resolved.
Thanks!
same problem here
Related: #772
This could be a good PR will look at it
Ya, this is definitely a problem. In my opinion, it's more "correct" to raise an error in this case. We're asking for a decoded response in a specific encoding. Getting back something different is unexpected and should never happen.
I believe hiredis should be changed so that a decoding error raises a ValueError instead of returning the raw bytes. I've added an issue to hiredis-py here: redis/hiredis-py#81.
@dpopowich In your specified case, I believe you should have a separate Redis connection that sets decode_responses=False and use that connection when interacting with pickled values (which are bytes, not strings).
Additionally, I'm currently working on a new feature that will allow a single connection to return decoded strings or bytes based on using a context manager. It should be included in the next version which I hope to release in early January.
@andymccurdy , I have the same problem here, in my opinion, using decode_responses=False in one connection may be too coarse-grained, why can't i using decode_responses=False in one method just like redis.get(key, decode_responses=False), as for a specific key, i want the value is bytes, other key are strs, but i only want to have a single redis client instance globally.
That's pretty much what you'll be able to do with the context manager I'm working on. It's fairly complicated because of how pipelines work and the hiredis-py integration. Something like:
r = redis.Redis(..., decode_responses=True)
with r.responses_as_bytes:
r.get('my-key') # will return the value as bytes
I haven't settled on the context manager name yet.
That would be cool!
That's pretty much what you'll be able to do with the context manager I'm working on. It's fairly complicated because of how pipelines work and the hiredis-py integration. Something like:
r = redis.Redis(..., decode_responses=True) with r.responses_as_bytes: r.get('my-key') # will return the value as bytes
Yes, this is cool, but I don't know how it's going to solve my problem where I'm using a 3rd party tool that stores both my data (str) and it's own meta data (pickled data, bytes). I can't have two connections for this, nor can I use this context manager without forking my own copy of the 3rd party project.
Will I be able to work around it? Yes, at some cost, but meanwhile, I still think "please decode what you can, but don't blow up my whole application with an exception if you can't" is a legitimate, robust solution:
try:
value = value.decode(self.encoding, self.encoding_errors)
except ValueError:
# Ignore encoding and return bytes
pass
@dpopowich Your suggestion might work out some of the time on Python 2 but it will be break many things on Python 3. Many stdlib functions specifically require bytes arguments. Others specifically require string objects. They aren't interchangeable. Not being able to reason about whether a return type is bytes or a string is a big problem.
If you can only have one connection, consider turning off decode_response. You can manually .decode() responses you need as strings within your part of the code base.
Once the context manager is complete, libs that use redis-py such as RQ can add a one-line context manager in places where they require binary data.
The proposed change (redis/hiredis-py#82) to hiredis-py has been accepted and merged. Future versions of hiredis-py will behave sanely and no longer silently discard decoding errors by default.
@andymccurdy Stumbled upon this thread when investigating my own bug with decoding responses with binary data. Is the addition of a r.responses_as_bytes context manager still on the roadmap or have you switched approaches?
I tried to encode and save a string with Spanish letters, Redis would save it and I am able to view it in redis-cli but when using the same logic in my python program where I do a 'hscan', this returns null. Is this issue related to this or am I in the wrong thread?
@piercefreeman That's still what I would like to do. I just haven't had time to implement it.
@harishkumarr95 I think you're in the wrong thread. Can you open up a new issue and provide a code sample that fails?
Related to https://github.com/Grokzen/redis-py-cluster/discussions/447 @andymccurdy comment as you see fit.
If future versions of redis-py will not support decode_responses=False please advise, thanks.
This issue is marked stale. It will be closed in 30 days if it is not updated.
bad bot
@andymccurdy Any update on this? In the meantime I figured out a workaround:
import redis
r = redis.Redis(decode_responses=True)
try:
data = r.get("some_binary_data")
except UnicodeDecodeError as e:
data = e.object
Python documentation: https://docs.python.org/3/library/exceptions.html#UnicodeError.object
This issue is marked stale. It will be closed in 30 days if it is not updated.
In case you want to get more than one key at a time, using pipelines:
pipe.execute_command("GET",key, NEVER_DECODE=True)