Unable to generate JSON Schema from Avro that references a fixed type by name

Open da77a opened this issue 11 months ago • 0 comments

Current Behaviour

I have the following Avro schema:

[
    {
        "name": "UID",
        "type": "fixed",
        "size": 16
    },
    {
        "name": "Content",
        "type": "record",
        "fields": [
            {"name": "name", "type": "string"},
            {"name": "hash", "type": "UID", "description": "unique content id"}
        ]
    }
]

The above schema is valid according to the Avro spec https://github.com/clemensv/avrotize/blob/f9416d3d59f95248a66fe446f36a7f0773606286/specs/avro-schema.md?plain=1#L317-L329 and is accepted by various other tools, including other tools in avrotize. Attempting to generate a JSON Schema from it using a2j fails with "Unknown type reference".

Unsurprisingly, defining the Content "hash" field directly as fixed with size 16, rather than using a named type, does not produce an error. However, it also does not constrain the JSON consistently with the Avro constraints for a fixed type.

Expectation

The most important expectation is that the JSON Schema includes a named type, whatever it might be, and references it where used.
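To illustrate what "named and referenced" means here, a minimal sketch of the lookup I would expect, in Python. This is not avrotize's actual code: `collect_named_types`, `field_type_to_json`, the `#/definitions/...` layout and the small primitive map are all my own assumptions, and only string-valued field types are handled.

```python
# Hypothetical sketch, not avrotize's implementation.
# Assumed mapping for a few Avro primitives (deliberately incomplete).
PRIMITIVES = {
    "string": {"type": "string"},
    "boolean": {"type": "boolean"},
    "int": {"type": "integer"},
    "long": {"type": "integer"},
    "float": {"type": "number"},
    "double": {"type": "number"},
    "null": {"type": "null"},
}
NAMED_KINDS = {"record", "enum", "fixed"}


def collect_named_types(schemas):
    """Index every top-level named type (record, enum, fixed) by its name."""
    return {
        s["name"]: s
        for s in schemas
        if isinstance(s, dict) and s.get("type") in NAMED_KINDS
    }


def field_type_to_json(avro_type, named):
    """Translate a field type given as a string: a named reference or a primitive."""
    if isinstance(avro_type, str):
        if avro_type in named:
            # A reference by name becomes a $ref to the (assumed) definitions section.
            return {"$ref": f"#/definitions/{avro_type}"}
        if avro_type in PRIMITIVES:
            return dict(PRIMITIVES[avro_type])
    raise ValueError(f"Unknown type reference: {avro_type!r}")


avro = [
    {"name": "UID", "type": "fixed", "size": 16},
    {
        "name": "Content",
        "type": "record",
        "fields": [
            {"name": "name", "type": "string"},
            {"name": "hash", "type": "UID", "description": "unique content id"},
        ],
    },
]
named = collect_named_types(avro)
hash_schema = field_type_to_json("UID", named)
```

The point is only that "fixed" sits in the same set of named kinds as "record" and "enum", so a reference to "UID" resolves instead of raising.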

The rest is harder to pin down, because avrotize uses Avro schema definitions as a common language rather than to define a serialization.

My expectation, given that, is that a fixed type definition in an avrotize schema would be represented in JSON Schema as a string with "base16" contentEncoding and with minLength and maxLength both set to twice the size value, and that it could be referenced by name like other named types (i.e. created as a definition and referenced where used).
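Concretely, for the UID type above I would expect something along these lines. This is a sketch of my expectation only: `fixed_to_json_schema` is a made-up helper name, and the use of a "definitions" section (rather than "$defs") is an assumption about the output layout.

```python
import json


def fixed_to_json_schema(fixed):
    """Map an Avro fixed definition to a hex-encoded JSON Schema string (assumed mapping)."""
    size = fixed["size"]
    return {
        "type": "string",
        "contentEncoding": "base16",
        # Each byte is two hex characters, so both bounds are 2 * size.
        "minLength": 2 * size,
        "maxLength": 2 * size,
    }


uid = {"name": "UID", "type": "fixed", "size": 16}

# Expected shape of the output for the Content record (assumed layout).
expected = {
    "definitions": {"UID": fixed_to_json_schema(uid)},
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "hash": {"$ref": "#/definitions/UID"},
    },
}

print(json.dumps(expected, indent=2))
```

With size 16 this gives minLength and maxLength of 32, and the "hash" property carries only a reference to the named definition.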

My expectation could be wrong. https://github.com/clemensv/avrotize/blob/master/specs/avrotize-schema.md#37-fixed-type says nothing about how the fixed type should be translated to JSON (good), but it retains the Avro specification's problematic use of strings of \u0000 .. \u00FF for default values. If that definition is to be used in this conversion, then the "right" answer would be to use a pattern constraint to impose this encoding on the string representations of fixed and bytes values.
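For the alternative reading, a sketch of what such a pattern constraint could look like, assuming one character per byte with each character limited to U+0000..U+00FF. `fixed_default_pattern` is a hypothetical helper, and Python's `re` is used here only to sanity-check the regex; a JSON Schema validator would apply it with ECMA-262 semantics.

```python
import re


def fixed_default_pattern(size):
    """Build a regex for exactly `size` characters in the range U+0000..U+00FF."""
    # Backslashes are doubled so the pattern text contains literal \u escapes.
    return f"^[\\u0000-\\u00ff]{{{size}}}$"


pattern = fixed_default_pattern(16)

# Hypothetical JSON Schema definition using the pattern instead of hex encoding.
uid_as_latin1_string = {"type": "string", "pattern": pattern}

ok = re.fullmatch(pattern, "\u00ff" * 16) is not None        # 16 in-range characters
too_short = re.fullmatch(pattern, "a" * 15) is not None      # wrong length
out_of_range = re.fullmatch(pattern, "\u0100" * 16) is not None  # code point above U+00FF
```

The anchors matter because JSON Schema patterns are not implicitly anchored; without them any string containing 16 in-range characters would pass.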

da77a, Mar 10 '25 08:03