[Proposal]: Add UUID conversion to and from 16 byte fixed sequences

Open urmastalimaa opened this issue 11 months ago • 0 comments

UUIDs are often passed around in application code in their canonical hex-string representation, e.g. "550e8400-e29b-41d4-a716-446655440000". Encoding a UUID as an Avro "string" takes 37 bytes (36 characters plus a 1-byte length prefix), while its binary form fits into a 16 byte "fixed", saving 21 bytes per encoded UUID.
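The size arithmetic above can be checked with a small sketch. It assumes only the Avro binary encoding rules (a "string" is a zigzag varint length followed by the UTF-8 bytes); the module name `UuidSizeSketch` is hypothetical and not part of avro_ex.

```elixir
defmodule UuidSizeSketch do
  # Hypothetical helper, not part of avro_ex.

  # Number of bytes Avro needs for a non-negative long:
  # zigzag maps n >= 0 to n * 2, then 7 bits are stored per byte.
  def long_size(n) when is_integer(n) and n >= 0, do: varint_size(n * 2)

  defp varint_size(z) when z < 128, do: 1
  defp varint_size(z), do: 1 + varint_size(div(z, 128))

  # An Avro "string" is a long length prefix followed by the UTF-8 bytes.
  def string_size(s) when is_binary(s), do: long_size(byte_size(s)) + byte_size(s)
end

uuid = "550e8400-e29b-41d4-a716-446655440000"
as_string = UuidSizeSketch.string_size(uuid)
as_fixed = 16

IO.puts("string: #{as_string}, fixed: #{as_fixed}, saved: #{as_string - as_fixed}")
# prints "string: 37, fixed: 16, saved: 21"
```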

This change allows application code to keep passing around canonical hex UUIDs while using the compact encoding; the only requirement is passing uuid_format: :canonical_string in the decode options.

The Java reference implementation also supports encoding UUIDs as both strings and 16 byte fixed sequences.

  • Encoding is augmented such that a 16 byte fixed schema with %{"logicalType" => "uuid"} converts a hex-string UUID to its 16 byte binary representation.

  • Decoding is augmented such that given uuid_format: :canonical_string in decode options, the binary representation is converted to the canonical hex-string representation.
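The two conversions described above could look roughly like the following. This is a minimal sketch using only the Elixir standard library, not the code in this change (which relies on the uniq library). The module `UuidCodecSketch` and its function names are hypothetical; only the `uuid_format` option and its `:binary` / `:canonical_string` values come from the proposal.

```elixir
defmodule UuidCodecSketch do
  # Hypothetical helper, not the avro_ex or uniq API.

  # Encode direction: canonical hex string -> 16 raw bytes for the "fixed".
  # Only the length is checked here; a real codec would validate more.
  def to_binary(<<_::binary-size(36)>> = canonical) do
    <<_::binary-size(16)>> =
      canonical
      |> String.replace("-", "")
      |> Base.decode16!(case: :mixed)
  end

  # Decode direction: the raw bytes are returned unchanged unless
  # uuid_format: :canonical_string is given (assumed default: :binary).
  def from_binary(<<_::binary-size(16)>> = raw, opts \\ []) do
    case Keyword.get(opts, :uuid_format, :binary) do
      :binary -> raw
      :canonical_string -> format(raw)
    end
  end

  # Split the 16 bytes into the 8-4-4-4-12 hex groups.
  defp format(
         <<a::binary-size(4), b::binary-size(2), c::binary-size(2),
           d::binary-size(2), e::binary-size(6)>>
       ) do
    [a, b, c, d, e]
    |> Enum.map(&Base.encode16(&1, case: :lower))
    |> Enum.join("-")
  end
end

uuid = "550e8400-e29b-41d4-a716-446655440000"
raw = UuidCodecSketch.to_binary(uuid)

IO.inspect(byte_size(raw))
IO.inspect(UuidCodecSketch.from_binary(raw) == raw)
IO.inspect(UuidCodecSketch.from_binary(raw, uuid_format: :canonical_string))
```

Keeping `:binary` as the default in `from_binary/2` mirrors why existing callers that never pass `uuid_format` would see no change in decoded values.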

The encoding change is nearly backwards-compatible: previously, an incorrectly sized "fixed" with {"logicalType": "uuid"} raised an error, whereas now conversion is attempted.

The decoding change is fully backwards-compatible, as uuid_format defaults to :binary.

For the UUID codec, the uniq library was added (it has no transitive dependencies).

urmastalimaa, Feb 12 '25 17:02