cbor2
cbor2 copied to clipboard
C-extension does allow customizing serialization
Things to check first
-
[X] I have searched the existing issues and didn't find my bug already reported there
-
[X] I have checked that my bug is still present in the latest release
cbor2 version
5.5.1
Python version
3.11
What happened?
The behaviour of the serializer differs between the C extension and the Python implementation of this library.
The Python implementation allows customizing the by updating the value of the _encoders
map for known types. The C-implementation as an optimization does not use this map and defines it's own implementations of encoders for the known types. This comment on the original issue adding the C implementation shows that it was intended that the C implementation allow for the canonical
parameter to be set to a value of 2 to opt into the slower codepath where the Python defined encoders were used.
The two changes below appear to be the cause of this:
- https://github.com/agronholm/cbor2/pull/77 enforced that
enc_style
is only ever 0 or 1 (0 when initialized, 1 if the input is exactly 1). - This commit changed the parsing of the field to parse it as a predicate and not a byte.
A few questions here:
- I recognize that modifying
_encoders
is not necessarily a supported interface for the library given how the variable is named? If that's not the case, is there no supported way to perform this substitution? - If allowing the C implementation to use the Python defined encoders, how should this be exposed? Go back to overloading the
canonical
parameter? Introduce an Enum as a value for that field?
How can we reproduce the bug?
The following test case should show the difference. The python implementation will pass, the C implementation will fail.
import calendar
def test_encoder_replacement(impl):
def to_epoch(dt: datetime) -> float:
millis = int(dt.microsecond / 1000.0) / 1000.0
return calendar.timegm(dt.astimezone(timezone.utc).timetuple()) + millis
def encode_datetime(self, obj: datetime):
self.encode_float(to_epoch(obj))
dt = datetime(2013, 3, 21, 20, 4, 0, tzinfo=timezone.utc)
expected = unhexlify("fb41d452d9ec000000")
with BytesIO() as fp:
encoder = impl.CBOREncoder(fp, canonical=2)
encoder._encoders[datetime] = encode_datetime
encoder.encode(dt)
serialized = fp.getvalue()
Curiously enough, this could have been missed in version 5.4.3 if people were accessing the encoder via cbor2.encoder
instead of the top level module. The C extension didn't shadow namespaces other than the top level one.
None of the serialization libraries I know of allow overriding the handling of built-in types. This was never supported, as you correctly deduced from the naming of the _encoders
private variable. What's your use case?