beanstalkc
beanstalkc copied to clipboard
8-bit transparency under Python 3
Pull request #57 adds python 3 compatibility, but always tries to decode job bodies. Given that arbitrary byte strings may be put in the body, this is suboptimal.
For other protocol messages, only ascii content is accepted, so encoding/decoding automatically is fine.
I propose the following:
- Add an 'encoding' attribute to
Connection.__init__
, defaulting tosys.getdefaultencoding
For put:
- When you put a bytes objects it's put as-is
- Otherwise it's encoded with the encoding above
And for reseve/peek etc. (_read_body):
- When a body is read:
- If encoding is
None
, return a bytes object - Otherwise decode with the specified encoding and return a string
- If encoding is
If this design is acceptable, I'll rebase and amend the pull request to follow this design.
If encoding or decoding fails, should we raise Python's built-in exceptions or wrap them in a BeanstalkcException
? Similarly, when encoding is None
and a Unicode string is passed to put
what exception would we raise?
I think wrapping them with a BeanstalkcException
is a clean solution (similar to how socket problems are wrapped) but preserving exception information is only elegantly supported in Python 3.x:
raise SubclassBeanstalkcException from exc
@earl poke. Any comments?
I think it might be incorrect to support string encoding/decoding for beanstalkc. Beanstalkd is a lot like socket; it has no real knowledge (as far as I know) about the contents of a job; it just stashes bytes. There's no real way (or need) for beanstalk to know the encoding of a string job, or even that the job is a string.
What if beanstalkc only speaks bytes? The user would have to explicitly encode/decode on put/reserve, but I'm not sure that's a significant burden. We're implicitly relying on us-ascii/utf-8 being the same thing as byte strings most of the time in the Python 2 interface. Acknowledging that explicitly seems like the way to go. That sidesteps the 8-bit byte transparency issue entirely. Adding automatic encoding/decoding for strings is a convenience that can be added later.