
Add task for registering Avro schemas.

Open shanestrasser opened this issue 6 years ago • 14 comments

It would be nice if there were another task that could first check the schemas against a schema registry service for compatibility and, if they pass, automatically register them. We're adapting a build process where our pipeline first performs a compatibility check to ensure we don't push a build artifact containing incompatible schemas. Additionally, we'd like the task to register the schemas once the check passes.

I've put together a first attempt: https://gist.github.com/shanestrasser/e6d25cfcf0694f2ceb01fa9ccf32d1ea

which can then be used as:

registerSchemaSource {
    source = "src/main/avro/com/oracle/mercury/schema"
    checkForCompatibility = true
    registerSchema = true
    schemaRegistryUrl = "http://localhost:8081"
}
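
For readers who don't want to open the gist, the core of such a task boils down to a compatibility check followed by a register call. A minimal sketch in plain Java, assuming the pre-5.5 Confluent schema-registry client (whose testCompatibility/register methods accept org.apache.avro.Schema directly); the class and method names here are illustrative, not taken from the gist:

import java.io.File;

import org.apache.avro.Schema;

import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;

public class SchemaRegistrar {
    private final SchemaRegistryClient client;

    public SchemaRegistrar(String schemaRegistryUrl) {
        // The second argument is the client's identity-map capacity.
        this.client = new CachedSchemaRegistryClient(schemaRegistryUrl, 100);
    }

    // Checks one schema file for compatibility and registers it if it passes,
    // mirroring the checkForCompatibility/registerSchema flags above.
    public void checkAndRegister(File schemaFile, String subject) throws Exception {
        Schema schema = new Schema.Parser().parse(schemaFile);
        // testCompatibility returns true when the schema is compatible with
        // the latest version already registered under the subject.
        if (!client.testCompatibility(subject, schema)) {
            throw new IllegalStateException("Incompatible schema: " + schemaFile);
        }
        client.register(subject, schema);
    }
}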

shanestrasser avatar Apr 30 '19 22:04 shanestrasser

Thank you for the feature request. This is the first time I've had schema registry integration requested.

Have you evaluated any stand-alone plugins in this area? For example, maybe one of these?

  • https://github.com/ImFlog/schema-registry-plugin
  • https://github.com/oasalonen/schema-registry-plugin

davidmc24 avatar May 01 '19 14:05 davidmc24

Yah, we looked at those, and they're not as nice as your plugin. The two big things we didn't like about those plugins are that they:

  1. Require the user to list out all of the schemas they want registered. In our case, we have over 50 right now and will be adding more, and we really don't want developers managing additional metadata for each schema.
  2. Require the user to determine each schema's dependencies. This is the big one, and why we prefer your plugin: it automatically builds up nested schemas. We don't want developers having to manually list out all of the dependencies when they add a new Avro schema.

shanestrasser avatar May 01 '19 15:05 shanestrasser

I can understand those reasons for preferring an approach based on my plugin. At the same time, I'm hesitant to pull in a dependency on a third-party client library in order to support one of potentially many schema repositories, when it feels to me like schema publishing can/should be independent.

Perhaps if my plugin optionally wrote a file containing its discovered schema metadata (which schemas, and their dependencies), other plugins wouldn't need to re-discover that information. It could be exposed as a stand-alone task, or as part of the compilation process. A publishing plugin could then depend on the "generate metadata" task, read the file, and use it to publish the appropriate schemas.

davidmc24 avatar May 01 '19 18:05 davidmc24

Understandable. Roughly, what are you thinking the schema metadata discovery output would look like? Something like Schema A -> Schema B, Schema C (where A depends on B and C)? Although... thinking about this a little more, we don't even need that kind of dependency metadata if the schemas are already fully defined. Maybe a better output for this new task would be a new set of schema files containing the fully nested definitions. These would be exactly the schema definitions that get set in the schema field of the generated Java classes anyway. That way, a user doesn't have to worry about determining dependency trees: run the new task, get the expanded definitions, then call register on those fully expanded schemas.
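
This expansion already falls out of the Avro Java API: a single Schema.Parser accumulates named types across parse() calls, and Schema.toString() emits the fully nested definition, inlining each named dependency at its first use. A minimal sketch, assuming the files are supplied in dependency order (the plugin's compile task works that order out automatically):

import java.io.File;

import org.apache.avro.Schema;

public class SchemaExpander {
    public static void main(String[] args) throws Exception {
        // One shared parser accumulates named types, so later files can
        // reference types defined in earlier ones. Files must be supplied
        // in dependency order here.
        Schema.Parser parser = new Schema.Parser();
        Schema last = null;
        for (String path : args) {
            last = parser.parse(new File(path));
        }
        // toString(true) pretty-prints the fully nested definition, with
        // each named dependency inlined at its first use -- the same JSON
        // that ends up in the generated classes' schema field.
        System.out.println(last.toString(true));
    }
}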

shanestrasser avatar May 02 '19 15:05 shanestrasser

That makes sense. Maybe a new GenerateSchemaSetsTask that takes as input arbitrary schema files that may have dependencies between them, and outputs a different set of schema files that have all of the necessary dependencies included. As you say, these schema sets are effectively an intermediate product of the process anyway; just not one that is currently persisted.

If you were presented with a directory of these schema set files, I would think it would be pretty easy to scan the directory and pass them to a repository for compatibility testing and/or registration, and this approach doesn't seem to require the definition of a metadata file format.
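
The scan itself would indeed be straightforward. A hypothetical sketch, again assuming the older Confluent client API; deriving the subject from the record's full name is an assumption (topic-based naming is also common):

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import org.apache.avro.Schema;

import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;

public class SchemaSetPublisher {
    public static void main(String[] args) throws Exception {
        Path schemaSetDir = Paths.get(args[0]);   // directory of resolved schema sets
        SchemaRegistryClient client = new CachedSchemaRegistryClient(args[1], 100);

        List<Path> schemaFiles;
        try (Stream<Path> stream = Files.walk(schemaSetDir)) {
            schemaFiles = stream.filter(p -> p.toString().endsWith(".avsc"))
                    .collect(Collectors.toList());
        }

        for (Path file : schemaFiles) {
            Schema schema = new Schema.Parser().parse(file.toFile());
            // Subject naming is an assumption; record-name and topic-name
            // strategies are both common.
            String subject = schema.getFullName();
            // Note: some registry versions report "subject not found" for a
            // brand-new subject rather than returning true here.
            if (!client.testCompatibility(subject, schema)) {
                throw new IllegalStateException("Incompatible schema: " + file);
            }
            client.register(subject, schema);
        }
    }
}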

davidmc24 avatar May 02 '19 15:05 davidmc24

Thumbs up for the feature; we implemented our own task for compatibility checking. Having this functionality in the plugin would be great.

eshepelyuk avatar May 02 '19 16:05 eshepelyuk

@eshepelyuk are you voting for the "register with schema server" feature, or the "generate schema sets" feature?

davidmc24 avatar May 02 '19 17:05 davidmc24

@davidmc24 mostly for "register with schema server" as it's closer to our scenario.

eshepelyuk avatar May 02 '19 17:05 eshepelyuk

@davidmc24 Yes, I think if we had a set of those schema files, it would be easy to write a plugin (or use an existing one) that scans and verifies/registers them.

shanestrasser avatar May 06 '19 18:05 shanestrasser

Any status update?

shanestrasser avatar Sep 03 '19 16:09 shanestrasser

@shanestrasser Thanks for reaching out. So... a bit of an update. I'm slowly making progress on an approach that adds a task which takes a directory of schema files and "resolves" the dependencies, such that what it writes to the task's output directory is a new set of schema files in which all the dependencies have been inlined. It does this in the same manner as the current plugin's compile task, automatically determining the dependencies and ordering.

My plan is to get that working as a stand-alone task, and then adjust the other tasks in the plugin to be much simpler (no dependency handling logic at all; just single-purpose tasks that compile pre-resolved schemas/protocols).

As part of this, I'll also be extracting the operations that use the Avro Java libraries into stand-alone command-line tools that the tasks call. This will allow running the operations in a separate JVM from the Gradle plugin with a different classpath, thus allowing for support for arbitrary Avro versions (as opposed to a given version of the Gradle plugin currently being tied to a very small range of Avro versions).
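
In Gradle terms, the separate-JVM piece could be wired up as a JavaExec task whose classpath comes from a user-controllable configuration. A sketch in plugin-style Java, assuming Gradle 6.4+ (for getMainClass()); the configuration, task, and main-class names are hypothetical:

import org.gradle.api.Plugin;
import org.gradle.api.Project;
import org.gradle.api.artifacts.Configuration;
import org.gradle.api.tasks.JavaExec;

public class AvroToolsPlugin implements Plugin<Project> {
    @Override
    public void apply(Project project) {
        // Users put whatever Avro version they want on this configuration;
        // the tool then runs in its own JVM with that classpath, decoupled
        // from the Avro version the plugin itself was compiled against.
        Configuration avroTools = project.getConfigurations().create("avroTools");

        project.getTasks().register("resolveAvroSchemas", JavaExec.class, task -> {
            task.setClasspath(avroTools);
            task.getMainClass().set("com.example.avro.SchemaResolverTool"); // hypothetical CLI
            task.args("--input", "src/main/avro",
                    "--output", project.getLayout().getBuildDirectory()
                            .dir("resolved-avro").get().getAsFile().getPath());
        });
    }
}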

As discussed earlier in this thread, once that resolution task is available, it should be possible to use its output as the input for compatibility verification and/or registration. I'm still thinking I don't want a dependency on io.confluent:kafka-schema-registry in the main plugin, but I might consider creating a related plugin to handle schema registry use cases. Still on the fence on that.

My work in progress is in the dep-resolver branch, but it isn't in a "works as a useful plugin" state yet.

davidmc24 avatar Sep 03 '19 17:09 davidmc24

Any updates on that resolver that writes all the schemas a file references into one file? I'm looking for something like this!

ykcai avatar Jun 11 '20 03:06 ykcai

@ykcai Will answer your question on #115.

davidmc24 avatar Jun 11 '20 19:06 davidmc24

It seems like versioning/registering referenced/dependent schemas separately may be useful for independent evolution, and that collapsing them all into a single nested schema (a single "subject", in Confluent terminology) isn't always desirable. In other words, maybe we do need that metadata file originally discussed.

I'm also wondering about nested vs. referenced schema interoperability with other languages (such as Python). I'm not sure the Confluent Kafka Python API, for example, treats the schema "fingerprint" of a nested schema as equivalent to that of the logically equivalent schema using references; it sounds like the Java Avro API does.
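
On the Java side, the two spellings do fingerprint identically: Parsing Canonical Form is computed from the parsed schema object graph, which no longer records whether a named type was written inline or by reference. A small self-contained sketch using org.apache.avro.SchemaNormalization (the Outer/Inner schemas are invented for illustration):

import org.apache.avro.Schema;
import org.apache.avro.SchemaNormalization;

public class FingerprintCheck {
    public static void main(String[] args) {
        String nested = "{\"type\":\"record\",\"name\":\"Outer\",\"fields\":["
                + "{\"name\":\"inner\",\"type\":{\"type\":\"record\",\"name\":\"Inner\","
                + "\"fields\":[{\"name\":\"x\",\"type\":\"int\"}]}}]}";
        String inner = "{\"type\":\"record\",\"name\":\"Inner\","
                + "\"fields\":[{\"name\":\"x\",\"type\":\"int\"}]}";
        String byRef = "{\"type\":\"record\",\"name\":\"Outer\",\"fields\":["
                + "{\"name\":\"inner\",\"type\":\"Inner\"}]}";

        Schema nestedSchema = new Schema.Parser().parse(nested);

        Schema.Parser refParser = new Schema.Parser();
        refParser.parse(inner);                    // register the named type first
        Schema refSchema = refParser.parse(byRef); // then reference it by name

        // Parsing Canonical Form is derived from the parsed object graph,
        // so both spellings fingerprint identically in the Java API.
        long a = SchemaNormalization.parsingFingerprint64(nestedSchema);
        long b = SchemaNormalization.parsingFingerprint64(refSchema);
        System.out.println(a == b); // true
    }
}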

slominskir avatar May 10 '21 20:05 slominskir

I'm not going to be adding support for registering schemas to this plugin. Someone else is welcome to do so in a fork if desired.

davidmc24 avatar Sep 07 '22 03:09 davidmc24