redis-py icon indicating copy to clipboard operation
redis-py copied to clipboard

Differences in parsing with decode_responses=True, with and without hiredis

Open dpopowich opened this issue 7 years ago • 18 comments

I ran into a similar problem as #809: I have a redis connection using decode_repsonses=True, but I am also using a 3rd-party package (RQ) that pickles some data and deep in that library I get UnicodeDecodeErrors. I'm kind of stuck because of that library's use of pickled data.

However, I thought to install hiredis-py to see if it handled parsing differently and it does.

So, with this:

import redis
import pickle
rdb = redis.StrictRedis(decode_responses=True)
data = pickle.dumps(dict(hello='world'))
rdb.set('foo', data)
rdb.get('foo')

With hiredis installed I see:

b'\x80\x03}q\x00X\x05\x00\x00\x00helloq\x01X\x05\x00\x00\x00worldq\x02s.'

Without hiredis installed, I get:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Looking at hiredis-py code, I see the equivalent of this:

if encoding:
   try:
      data = data.decode(encoding)
   except ValueError:
      pass

I'm wishing redis-py did the same. I'd love to see the following change to connection.py:

def decode(self, value, force=False):
   "Return a unicode string from the byte representation"
   if (self.decode_responses or force) and isinstance(value, bytes):
      try:
         value = value.decode(self.encoding, self.encoding_errors)
      except ValueError:
         # Ignore encoding and return bytes
         pass
   return value

Considering using hiredis comes highly recommended, I'd love to see this base difference in parsing resolved.

Thanks!

dpopowich avatar Jul 10 '18 14:07 dpopowich

same problem here

zanpen2000 avatar Jul 20 '18 11:07 zanpen2000

Related: #772

jeffwidman avatar Oct 12 '18 08:10 jeffwidman

This could be a good PR will look at it

theodesp avatar Dec 03 '18 23:12 theodesp

Ya, this is definitely a problem. In my opinion, it's more "correct" to raise an error in this case. We're asking for a decoded response in a specific encoding. Getting back something different is unexpected and should never happen.

I believe hiredis should be changed so that a decoding error raises a ValueError instead of returning the raw bytes. I've added an issue to hiredis-py here: redis/hiredis-py#81.

@dpopowich In your specified case, I believe you should have a separate Redis connection that sets decode_responses=False and use that connection when interacting with pickled values (which are bytes, not strings).

Additionally, I'm currently working on a new feature that will allow a single connection to return decoded strings or bytes based on using a context manager. It should be included in the next version which I hope to release in early January.

andymccurdy avatar Dec 26 '18 21:12 andymccurdy

@andymccurdy , I have the same problem here, in my opinion, using decode_responses=False in one connection may be too coarse-grained, why can't i using decode_responses=False in one method just like redis.get(key, decode_responses=False), as for a specific key, i want the value is bytes, other key are strs, but i only want to have a single redis client instance globally.

milabobo avatar Dec 28 '18 07:12 milabobo

That's pretty much what you'll be able to do with the context manager I'm working on. It's fairly complicated because of how pipelines work and the hiredis-py integration. Something like:

r = redis.Redis(..., decode_responses=True)
with r.responses_as_bytes:
    r.get('my-key')  # will return the value as bytes

I haven't settled on the context manager name yet.

andymccurdy avatar Dec 28 '18 07:12 andymccurdy

That would be cool!

milabobo avatar Dec 28 '18 12:12 milabobo

That's pretty much what you'll be able to do with the context manager I'm working on. It's fairly complicated because of how pipelines work and the hiredis-py integration. Something like:

r = redis.Redis(..., decode_responses=True)
with r.responses_as_bytes:
    r.get('my-key')  # will return the value as bytes

Yes, this is cool, but I don't know how it's going to solve my problem where I'm using a 3rd party tool that stores both my data (str) and it's own meta data (pickled data, bytes). I can't have two connections for this, nor can I use this context manager without forking my own copy of the 3rd party project.

Will I be able to work around it? Yes, at some cost, but meanwhile, I still think "please decode what you can, but don't blow up my whole application with an exception if you can't" is a legitimate, robust solution:

      try:
         value = value.decode(self.encoding, self.encoding_errors)
      except ValueError:
         # Ignore encoding and return bytes
         pass

dpopowich avatar Dec 28 '18 14:12 dpopowich

@dpopowich Your suggestion might work out some of the time on Python 2 but it will be break many things on Python 3. Many stdlib functions specifically require bytes arguments. Others specifically require string objects. They aren't interchangeable. Not being able to reason about whether a return type is bytes or a string is a big problem.

If you can only have one connection, consider turning off decode_response. You can manually .decode() responses you need as strings within your part of the code base.

Once the context manager is complete, libs that use redis-py such as RQ can add a one-line context manager in places where they require binary data.

andymccurdy avatar Dec 28 '18 17:12 andymccurdy

The proposed change (redis/hiredis-py#82) to hiredis-py has been accepted and merged. Future versions of hiredis-py will behave sanely and no longer silently discard decoding errors by default.

andymccurdy avatar Dec 30 '18 10:12 andymccurdy

@andymccurdy Stumbled upon this thread when investigating my own bug with decoding responses with binary data. Is the addition of a r.responses_as_bytes context manager still on the roadmap or have you switched approaches?

piercefreeman avatar Feb 26 '20 16:02 piercefreeman

I tried to encode and save a string with Spanish letters, Redis would save it and I am able to view it in redis-cli but when using the same logic in my python program where I do a 'hscan', this returns null. Is this issue related to this or am I in the wrong thread?

harishkumarr95 avatar Nov 23 '20 11:11 harishkumarr95

@piercefreeman That's still what I would like to do. I just haven't had time to implement it.

andymccurdy avatar Nov 23 '20 17:11 andymccurdy

@harishkumarr95 I think you're in the wrong thread. Can you open up a new issue and provide a code sample that fails?

andymccurdy avatar Nov 24 '20 21:11 andymccurdy

Related to https://github.com/Grokzen/redis-py-cluster/discussions/447 @andymccurdy comment as you see fit.

If future versions of redis-py will not support decode_responses=False please advise, thanks.

charlez avatar Apr 15 '21 18:04 charlez

This issue is marked stale. It will be closed in 30 days if it is not updated.

github-actions[bot] avatar Apr 16 '22 00:04 github-actions[bot]

bad bot

rolette avatar Apr 16 '22 01:04 rolette

@andymccurdy Any update on this? In the meantime I figured out a workaround:

import redis
r = redis.Redis(decode_responses=True)
try:
    data = r.get("some_binary_data")
except UnicodeDecodeError as e:
    data = e.object

Python documentation: https://docs.python.org/3/library/exceptions.html#UnicodeError.object

mikelei8291 avatar Apr 16 '22 02:04 mikelei8291

This issue is marked stale. It will be closed in 30 days if it is not updated.

github-actions[bot] avatar Apr 17 '23 00:04 github-actions[bot]

In case you want to get more than one key at a time, using pipelines:

pipe.execute_command("GET",key, NEVER_DECODE=True)

johan-seesaw avatar Oct 27 '23 04:10 johan-seesaw