kombu icon indicating copy to clipboard operation
kombu copied to clipboard

Serialization/Deserialization UUID issue

Open nsquaretologn opened this issue 3 years ago • 5 comments

Description: I am using django backend and celery. Redis is the queuing mechanism which we are using to connect both. When Django backend is of version redis-4.3.4 (Kombu stable version) and celery worker code is of version redis-4.4.0rc1 (Kombu alpha version) or vice versa, there is a problem in UUID serialization. When uuid is sent from django backend to redis, it is received as dictionary due to this change. Existing production code may break due to this change if they have redis=* in pip file or when there is a version mismatch between django and celery resulting in

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/django/db/models/fields/__init__.py", line 2336, in to_python
    return uuid.UUID(**{input_form: value})
  File "/usr/local/lib/python3.8/uuid.py", line 166, in __init__
    hex = hex.replace('urn:', '').replace('uuid:', '')
AttributeError: 'dict' object has no attribute 'replace'

This change versions kombu is not backward compatible. We need to de-serialize a UUID as UUID and not a dictionary

nsquaretologn avatar Aug 24 '22 14:08 nsquaretologn

Hey @nsquaretologn :wave:, Thank you for opening an issue. We will get back to you as soon as we can. Also, check out our Open Collective and consider backing us - every little helps!

We also offer priority support for our sponsors. If you require immediate assistance please consider sponsoring us.

@el-chogo can you chime in please?

auvipy avatar Aug 25 '22 05:08 auvipy

The problem with reverting this is that we don't have a way to identify the object's type when using the json serializer and therefore when we encounter a UUID object, we get an error since the json module doesn't know how to serialize it. When deserializing, we need to know to which object we need to convert the value.

I'd like a test case that reproduces this error. I'm also open to suggestions on how to fix this problem.

thedrow avatar Aug 25 '22 10:08 thedrow

Testcase: Backward compatibility When a Django server using kombu stable version (5.2.4) with redis (4.3.2) sends a message to celery using kombu beta version(b1) and redis pre release version (rc1), message sent as UUID is received as a dictionary.

When systems sending message has stable version and receiving has pre-release version, this change is not supported.

nsquaretologn avatar Aug 25 '22 14:08 nsquaretologn

Oh, but that's because of a version mismatch (may I add, with a pre-release version) isn't it? Similar changes have occurred in 55452a15469cff38ad9a49e2730fe7250ec56ccd and 894ddfc8b60ee615966da9fc9f97bd618e449415. My best guess at the moment is that this could be documented as a breaking change (at least for some workflows). I'm not so sure whether Celery/Kombu can be held responsible for maintaining compatibility between stable and pre-release versions. It's a fact that software changes and at some point things won't be compatible anymore.

All in all, yes, what you report is quite expected. But I think the unsupported workflow would be sending a message from a version that includes this change and receiving it in a previous version. If a message is sent from a previous version to the new one it should be serialized as a str and thus it'd fall under the textual if.

A note could also be added so that people attempting to upgrade do so in downtime. That way they can ensure that pending tasks are processed and start anew with the new serializer/deserializer. This wouldn't be rare as some past changes have required the exact same thing.

I submitted the patch in hopes it'd be useful for someone and I still think it makes sense because there are some differences between UUID versions and a type of their own in Python. If we revert back to the old behaviour we're losing version information, some properties and of course, typing.

Of course, I might be wrong.

EDIT: sorry for being late to the discussion, last days have been a bit difficult ^^.

el-chogo avatar Aug 25 '22 18:08 el-chogo

I think having a breaking issue is accepted in 5.3 upgrading from 5.2 considering the fact that this is going to fix a technical limitation of current/previous implementation. In either case we should document it and if possible provide some workaround to avoid this types of errors. I am looking forward to have some failing tests as well.

auvipy avatar Nov 27 '22 14:11 auvipy

@nusnus whenever you have some time

auvipy avatar Nov 27 '22 15:11 auvipy

Before final 5.3 is released, i would like to propose a refactoring for json serializer. Currently it looks like many ifs, i would like to switch to something more beatiful.

In my project i use following code:

"""
JSON Decoder and Encoder with support for additional types.
"""
import json
from datetime import date, datetime
from decimal import Decimal
from typing import Any, Callable, TypeVar

from django.utils.functional import Promise

from bson.objectid import ObjectId


class JSONEncoder(json.JSONEncoder):
    def default(self, o):
        for t, (marker, encoder) in _encoders.items():
            if isinstance(o, t):
                return {
                    "__type__": marker,
                    "__value__": encoder(o),
                }
        if isinstance(o, (Promise, ObjectId)):
            return str(o)
        return super().default(o)


def object_hook(o: dict):
    if o.keys() == {"__type__", "__value__"}:
        if decoder := _decoders.get(o["__type__"]):
            return decoder(o["__value__"])
        else:
            raise ValueError("Unsupported type", type, o)
    else:
        return o


DecoderT = EncoderT = Callable[[Any], Any]
T = TypeVar("T")
EncodedT = TypeVar("EncodedT")


def register_type(
    t: type[T],
    marker: str,
    encoder: Callable[[T], EncodedT],
    decoder: Callable[[EncodedT], T],
):
    _encoders[t] = (marker, encoder)
    _decoders[marker] = decoder


_encoders: dict[type, tuple[str, EncoderT]] = {}
_decoders: dict[str, DecoderT] = {}

register_type(datetime, "datetime.iso", datetime.isoformat, datetime.fromisoformat)
register_type(
    date,
    "date",
    lambda o: o.isoformat(),
    lambda o: datetime.fromisoformat(o).date(),
)
register_type(Decimal, "decimal", str, Decimal)

It could be adapted to store versioned UUIDs, and whatever else we may need. But we need to change format from {"__datetime__" true, "datetime": ...} to {"__type__": "datetime", "__value__": ...}

If it sounds like a good idea, please let me know, i will prepare PR.

last-partizan avatar Mar 01 '23 11:03 last-partizan

please send a PR with tests and ping me for review

auvipy avatar Mar 01 '23 12:03 auvipy

Amazing! Please add me for review as well!

Thanks

Nusnus avatar Mar 01 '23 12:03 Nusnus

I guess we can close this now

auvipy avatar Mar 02 '23 10:03 auvipy

Now that we merget this refactoring, maybe we should update docs on serialization, and make register_type truly public?

To allow users to add their custom data types if they need it.

last-partizan avatar Mar 02 '23 10:03 last-partizan