v22.3.1 Schema Registry breaks Avro metadata support that worked in v22.2.7
Version & Environment
Redpanda version: v22.3.1
What went wrong?
Unable to register a schema that contains custom attributes (metadata) whose values are not strings. This used to work in v22.2.7.
The official Avro specification allows custom attributes as metadata as long as they do not affect the serialized data, and it does not say what type their values should be.
https://avro.apache.org/docs/1.11.1/specification/#schema-declaration
This works on the Confluent Schema Registry and the Karapace registry.
What should have happened instead?
The schema should have been registered.
How to reproduce the issue?
Example schema:
{
"type": "record",
"name": "foo",
"pii": true,
"fields": [
{
"name": "bar",
"type": "float",
"pii": true
}
]
}
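To make it easier to see which attributes trigger the failure, here is a minimal sketch that walks a schema and lists custom attributes whose values are not strings. The helper name `non_string_metadata` and the `RESERVED` set are hypothetical, written for illustration only; the set is an assumption about which attribute names the Avro specification defines and may not be exhaustive.

```python
import json

# Attribute names defined by the Avro specification; anything else is treated
# as custom metadata. Assumption: this set is illustrative, not exhaustive.
RESERVED = {
    "type", "name", "namespace", "doc", "aliases", "fields", "default",
    "order", "symbols", "items", "values", "size", "logicalType",
    "precision", "scale",
}


def non_string_metadata(node, path="$"):
    """Return (path, value) pairs for custom attributes with non-string values."""
    found = []
    if isinstance(node, list):
        for i, item in enumerate(node):
            found += non_string_metadata(item, f"{path}[{i}]")
    elif isinstance(node, dict):
        for key, value in node.items():
            if key in RESERVED:
                # Only these reserved keys can hold nested schemas.
                if key in ("type", "fields", "items", "values"):
                    found += non_string_metadata(value, f"{path}.{key}")
            elif not isinstance(value, str):
                found.append((f"{path}.{key}", value))
    return found


schema = json.loads("""
{"type": "record", "name": "foo", "pii": true,
 "fields": [{"name": "bar", "type": "float", "pii": true}]}
""")
print(non_string_metadata(schema))
# [('$.pii', True), ('$.fields[0].pii', True)]
```

For the example schema, both `pii` attributes are reported, which matches the two places where a boolean is used as metadata.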
Thanks @owenhaynes. The way the title is worded, it sounds like it may have been working for you prior to v22.3.1?
/cc @BenPope @NyaliaLui
@dotnwat Yes, it was working before v22.3.1. I have updated the title.
It appears this was known by the authors of the avro library. https://github.com/apache/avro/pull/1826#discussion_r944507324
I can reproduce the problem on dev.
If you attempt to register the schema with the "pii" attribute:
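import json

import requests

# Assumed Schema Registry address; adjust for your cluster.
base_uri = 'http://localhost:8081'

# Record schema with a boolean custom attribute on the record and on a field.
sensor_schema = {
    "type": "record",
    "name": "foo",
    "pii": True,
    "fields": [
        {
            "name": "bar",
            "type": "float",
            "pii": True,
        }
    ]
}

# The registry expects the schema itself as a JSON-encoded string.
res = requests.post(
    url=f'{base_uri}/subjects/sensor-value/versions',
    data=json.dumps({
        'schema': json.dumps(sensor_schema)
    }),
    headers={'Content-Type': 'application/vnd.schemaregistry.v1+json'}).json()
print(res)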
then we get the error response:
{
"error_code": 422,
"message": "Invalid schema Invalid type. Expected \"string\" actual bool"
}
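Until a fix is available, one possible workaround is to encode non-string metadata values as strings before registering. The sketch below is untested against v22.3.1. It assumes that string-valued custom attributes are accepted, which is inferred only from the error message and the original report. The helper `stringify_metadata` and the `RESERVED` set are hypothetical names, and the set may not be exhaustive.

```python
import json

# Attribute names defined by the Avro specification (assumed, not exhaustive).
RESERVED = {
    "type", "name", "namespace", "doc", "aliases", "fields", "default",
    "order", "symbols", "items", "values", "size", "logicalType",
    "precision", "scale",
}


def stringify_metadata(node):
    """Return a copy of the schema with non-string custom attributes JSON-encoded."""
    if isinstance(node, list):
        return [stringify_metadata(item) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        if key in ("type", "fields", "items", "values"):
            # These reserved keys may hold nested schemas, so recurse.
            out[key] = stringify_metadata(value)
        elif key in RESERVED or isinstance(value, str):
            out[key] = value
        else:
            # Custom attribute with a non-string value, e.g. True -> "true".
            out[key] = json.dumps(value)
    return out


sensor_schema = {
    "type": "record",
    "name": "foo",
    "pii": True,
    "fields": [{"name": "bar", "type": "float", "pii": True}],
}
print(json.dumps(stringify_metadata(sensor_schema)))
```

Note that this changes the metadata: consumers reading `pii` would need to decode the string (for example with `json.loads`) to recover the boolean.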
This error comes from make_avro_schema_definition in schema_registry/avro.cc, reached via sharded_store::project_ids:
// Return type inferred from the co_return statements below.
ss::future<avro_schema_definition>
make_avro_schema_definition(sharded_store& store, canonical_schema schema) {
    std::optional<avro::Exception> ex;
    try {
        auto name = schema.sub()();
        // Resolve referenced schemas and flatten them into one definition.
        auto refs = co_await collect_schema(store, {}, name, std::move(schema));
        auto def = refs.flatten();
        // The Avro C++ library throws here on non-string custom attributes.
        co_return avro_schema_definition{avro::compileJsonSchemaFromMemory(
          reinterpret_cast<const uint8_t*>(def.data()), def.length())};
    } catch (const avro::Exception& e) {
        ex = e;
    }
    // Surface the Avro error to the client as schema_invalid (422).
    co_return ss::coroutine::exception(
      std::make_exception_ptr(as_exception(error_info{
        error_code::schema_invalid,
        fmt::format("Invalid schema {}", ex->what())})));
}
The call stack looks something like this
handlers::post_subject_versions -> seq_writer::write_subject_version -> seq_writer::do_write_subject_version -> sharded_store::project_ids -> sharded_store::validate_schema -> avro.cc::make_avro_schema_definition -> error
As per: https://github.com/apache/avro/pull/1826#discussion_r945465497 - https://github.com/redpanda-data/avro/pull/99
This should be fixed by https://github.com/redpanda-data/redpanda/pull/7352 in v22.3.4