
Celery's broadcast() always uses the default Kombu serializer

Open tliron opened this issue 12 years ago • 6 comments

The issue is specifically in Mailbox._publish in pidbox.py. There, a Producer instance is created and Producer.publish is called. However, no serializer is set for this call, so it falls back to the default.

The workaround is to change the default serializer using kombu.serialization.registry._set_default_serializer.
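For illustration, a minimal sketch of that workaround (note that _set_default_serializer is a private API, so it may change between Kombu releases):

# Force the process-wide default serializer so that pidbox/broadcast
# messages, which don't pass an explicit serializer, pick it up too.
from kombu.serialization import registry

registry._set_default_serializer('json')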

It seems to me that the serializer should be taken from Channel.body_encoding. After all, wouldn't you want broadcast messages to use the same encoding as regular messages?

tliron avatar Oct 06 '13 05:10 tliron

Pidbox and celery events use json to stay compatible with other languages, and pickle is also a security concern. I don't see any reason to use a different serializer when there's no need to.

If you're using pidbox directly then I can see the use of specifying other serializers, but I'm not sure I see it when using Celery?

ask avatar Oct 11 '13 15:10 ask

Well, OK. But why are broadcast messages using a different code-path from regular messages? Broadcast messages require serializers registered in the Kombu registry. Regular messages use Codecs defined in the Transport class. I guess that's OK as an implementation, but shouldn't they at least be configured together? If I use pickle for a certain transport (or a custom codec/serializer), it actually has to be configured separately for both types.

Also, the default JSON serializer encodes the payload into JSON text. For a recent development (soon to be open sourced) we added a "raw" encoding that lets us send the JSON data as-is (we are sending it over REST, so there is no need to encode the JSON into a string). This required some fighting against Kombu's inherently string-centered encoding. Moreover, we found that we had to call set_default_serializer for broadcast messages to work with it.
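For context, a rough sketch of how such a pass-through serializer can be registered with Kombu's serialization registry (the 'raw' name and content type here are illustrative, not the actual implementation mentioned above):

# Register a hypothetical pass-through "raw" serializer.
from kombu.serialization import register, registry

register(
    'raw',
    encoder=lambda obj: obj,    # send the payload untouched
    decoder=lambda data: data,  # and hand it back untouched
    content_type='application/data',
    content_encoding='binary',
)

# As described above, broadcast (pidbox) messages only honour this if it
# is also made the process-wide default:
registry._set_default_serializer('raw')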

I guess it doesn't make too much sense for a single Celery application to use multiple Kombu backends at the same time, each using different encodings. But, it seems broken that the current implementation won't allow for it: all broadcast messages, to all transports, would use the same default serializer.

tliron avatar Oct 11 '13 17:10 tliron

It uses the same serializer registry as everything else.

Maybe you're confused by the additional encoding used by the virtual transports?

AMQP messages consist of a message body and message properties (the properties include the message headers). For virtual transports, though, the underlying storage does not support these concepts, so the transport must wrap the message so that it appears as a single payload:

# amqp library
channel.basic_publish(
    body, headers=headers,
    content_type='application/x-python-serialize',
    content_encoding='binary', **properties
)

# database library
redis.lpush(queue, base64encode(json.dumps({
    'body': body,
    'headers': headers,
    'content_type': 'application/x-python-serialize',
    'content_encoding': 'utf-8',
    'properties': properties,
})))

So this double-encoding is sadly necessary for the virtual transports. It always uses json for this, as I have not had any requests for it to be configurable. Similarly, since json does not support binary data, it needs to base64-encode the message body (this is only needed for binary data, so it could be extended to skip that step for Unicode text).
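A hypothetical helper sketching the wrapping described above (not Kombu's actual code), including the base64 step for binary bodies:

import base64
import json

def wrap_for_virtual_transport(body, headers, properties,
                               content_type, content_encoding):
    # Wrap an AMQP-style message (body + properties) into a single JSON
    # payload for a virtual transport. JSON cannot carry raw bytes, so a
    # binary body is base64-encoded first.
    if isinstance(body, bytes):
        body = base64.b64encode(body).decode('ascii')
        body_encoding = 'base64'
    else:
        body_encoding = None
    return json.dumps({
        'body': body,
        'body_encoding': body_encoding,
        'headers': headers,
        'content_type': content_type,
        'content_encoding': content_encoding,
        'properties': properties,
    })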

ask avatar Oct 12 '13 12:10 ask

I think you will understand this better when you see our REST transport, which uses raw encoding: it only converts to JSON at the very end, before sending on the wire.

What is the best way to contribute code to Kombu?

tliron avatar Oct 12 '13 12:10 tliron

You can submit a pull request here at Github!

ask avatar Oct 13 '13 12:10 ask

I think we can close this issue for now

auvipy avatar Sep 09 '21 09:09 auvipy