v22.3.1 Schema Registry breaks Avro metadata support that worked in v22.2.7

Open owenhaynes opened this issue 3 years ago • 6 comments

Version & Environment

Redpanda version: v22.3.1

What went wrong?

Unable to register a schema that contains custom attributes (metadata) whose values are not strings. This used to work in v22.2.7.

The official Avro spec allows custom attributes as metadata as long as they do not affect the serialized data, but it does not say what type their values should be.

https://avro.apache.org/docs/1.11.1/specification/#schema-declaration

This works on the Confluent Schema Registry and the Karapace registry.

What should have happened instead?

The schema should have been registered.

How to reproduce the issue?

example schema

{
    "type": "record",
    "name": "foo",
    "pii": true,
    "fields": [
        {
            "name": "bar",
            "type": "float",
            "pii": true
        }
    ]
}
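
To make it easier to see which attributes trigger the problem, here is a minimal sketch that walks a schema and lists custom attributes whose values are not strings. The names `RESERVED`, `DESCEND`, and `non_string_metadata` are hypothetical helpers, and the reserved-key list is an assumption based on a reading of the Avro spec, not an exhaustive one.

```python
import json

# Assumption: attribute names reserved by the Avro spec (illustrative only).
RESERVED = {
    "type", "name", "namespace", "doc", "aliases", "fields", "default",
    "order", "symbols", "items", "values", "size", "logicalType",
    "precision", "scale",
}
# Reserved keys whose values can contain nested schemas.
DESCEND = {"type", "fields", "items", "values"}


def non_string_metadata(node, path="$"):
    """Yield (path, value) for custom attributes whose value is not a string."""
    if isinstance(node, list):
        for i, item in enumerate(node):
            yield from non_string_metadata(item, f"{path}[{i}]")
    elif isinstance(node, dict):
        for key, value in node.items():
            if key in DESCEND:
                # Nested schema or field list: keep walking.
                yield from non_string_metadata(value, f"{path}.{key}")
            elif key not in RESERVED and not isinstance(value, str):
                # Custom attribute with a non-string value.
                yield f"{path}.{key}", value


schema = json.loads("""
{
    "type": "record",
    "name": "foo",
    "pii": true,
    "fields": [
        {"name": "bar", "type": "float", "pii": true}
    ]
}
""")

found = list(non_string_metadata(schema))
for location, value in found:
    print(location, repr(value))
```

For the schema in this report it flags the `pii` attribute on both the record and the field.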

owenhaynes avatar Nov 14 '22 15:11 owenhaynes

Thanks @owenhaynes. The way the title is worded, it sounds like it may have been working for you prior to v22.3.1?

/cc @BenPope @NyaliaLui

dotnwat avatar Nov 16 '22 00:11 dotnwat

@dotnwat Yes, it was working before v22.3.1. I have updated the title.

owenhaynes avatar Nov 16 '22 08:11 owenhaynes

It appears this was known by the authors of the avro library. https://github.com/apache/avro/pull/1826#discussion_r944507324

BenPope avatar Nov 16 '22 13:11 BenPope

I can reproduce the problem on dev.

If you attempt to register the schema with the "pii" attribute:

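import json

import requests

# Assumption: Schema Registry reachable at the default local address.
base_uri = "http://localhost:8081"

# Record schema with a boolean "pii" custom attribute on the record and the field.
sensor_schema = {
    "type": "record",
    "name": "foo",
    "pii": True,
    "fields": [
        {
            "name": "bar",
            "type": "float",
            "pii": True,
        }
    ]
}

# The registry expects the schema itself as a JSON-encoded string.
res = requests.post(
    url=f'{base_uri}/subjects/sensor-value/versions',
    data=json.dumps({
        'schema': json.dumps(sensor_schema)
    }),
    headers={'Content-Type': 'application/vnd.schemaregistry.v1+json'}).json()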

then we get the error response:

{
  "error_code": 422,
  "message": "Invalid schema Invalid type. Expected \"string\" actual bool"
}
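
A small sketch of how a client could tell this failure apart from a successful registration. It assumes the Confluent-compatible response shape (an `id` field on success, `error_code` and `message` on failure); `describe_registration` is a hypothetical helper, not part of any client library.

```python
def describe_registration(res: dict) -> str:
    """Summarize the JSON body returned by POST /subjects/<subject>/versions."""
    if "error_code" in res:
        # Failure: the registry reports a numeric code and a message.
        return f"registration failed ({res['error_code']}): {res.get('message', '')}"
    # Success (assumed shape): the registry returns the schema id.
    return f"registered with id {res['id']}"


# The error body from the reproduction above.
error_response = {
    "error_code": 422,
    "message": 'Invalid schema Invalid type. Expected "string" actual bool',
}
summary = describe_registration(error_response)
print(summary)
```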

NyaliaLui avatar Nov 16 '22 13:11 NyaliaLui

then we get the error response:

{
  "error_code": 422,
  "message": "Invalid schema Invalid type. Expected \"string\" actual bool"
}

This error comes from schema_registry/avro.cc::make_avro_schema_definition, reached via sharded_store::project_ids:

ss::future<avro_schema_definition>
make_avro_schema_definition(sharded_store& store, canonical_schema schema) {
    std::optional<avro::Exception> ex;
    try {
        auto name = schema.sub()();
        // Resolve referenced schemas and flatten them into one definition.
        auto refs = co_await collect_schema(store, {}, name, std::move(schema));
        auto def = refs.flatten();
        // The avro library throws here when a custom attribute is not a string.
        co_return avro_schema_definition{avro::compileJsonSchemaFromMemory(
          reinterpret_cast<const uint8_t*>(def.data()), def.length())};
    } catch (const avro::Exception& e) {
        // co_await is not allowed inside a catch block, so save the exception.
        ex = e;
    }
    // Report the avro failure as a schema_invalid error.
    co_return ss::coroutine::exception(
      std::make_exception_ptr(as_exception(error_info{
        error_code::schema_invalid,
        fmt::format("Invalid schema {}", ex->what())})));
}

The call stack looks something like this: handlers::post_subject_versions -> seq_writer::write_subject_version -> seq_writer::do_write_subject_version -> sharded_store::project_ids -> sharded_store::validate_schema -> avro.cc::make_avro_schema_definition -> error

NyaliaLui avatar Nov 16 '22 13:11 NyaliaLui

As per: https://github.com/apache/avro/pull/1826#discussion_r945465497 - https://github.com/redpanda-data/avro/pull/99

BenPope avatar Nov 16 '22 13:11 BenPope

This should be fixed by https://github.com/redpanda-data/redpanda/pull/7352 in v22.3.4

BenPope avatar Nov 23 '22 14:11 BenPope