gradle-avro-plugin
*.avsc referenced from another project
I have been digesting this to figure out if it is possible to reference .avsc files from another jar file
https://github.com/davidmc24/gradle-avro-plugin/issues/86
I create a 'common-model' jar file with the generated java from avro and the .avsc files in that jar file as well.
I was hoping I could add in the dependency into another jar that depends on that one, such as:
dependencies {
    implementation(group: 'com.foo', name: 'common-model', version: '1.0.0') {
        transitive = false
    }
}
However, I cannot get this to work.
I am curious if this is a limitation of Avro compiler with .avsc files, an issue with my setup of the plugin, or something else.
Thanks
Short answer: it's complicated.
This plugin is built around the Java library published by the Avro project, which doesn't always provide the capabilities one would want when writing a Gradle plugin.
In particular, the only interfaces it has for generating Java files from .avsc files are:
- void compileSchema(File[] srcFiles, File dest)
- void compileSchema(File src, File dest)
- void compileToDestination(File src, File dst)
In all cases, the source must be a file on the local filesystem. If you use the array overload to specify dependencies, they must be in exactly the right order or compilation will fail. Alternatively, you can use the SpecificCompiler(Schema) constructor to prime it with dependency schemas.
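As a rough illustration of the constructor approach, here is a minimal standalone Groovy sketch (not the plugin's code). It assumes Avro 1.11.x is pulled in via @Grab, and the file names Breed.avsc and Cat.avsc (where Cat refers to Breed by name) and the output directory are hypothetical. Parsing the dependency first with the same Schema.Parser is what lets the dependent schema resolve the reference.

```groovy
@Grab('org.apache.avro:avro-compiler:1.11.3') // assumed version, adjust as needed
import org.apache.avro.Schema
import org.apache.avro.compiler.specific.SpecificCompiler

// Hypothetical inputs: Cat.avsc references the type declared in Breed.avsc.
File breedFile = new File('Breed.avsc')
File catFile = new File('src/main/avro/Cat.avsc')
File destDir = new File('build/generated-sketch')

// One parser instance remembers every named type it has already seen.
def parser = new Schema.Parser()
parser.parse(breedFile)               // dependency first
Schema cat = parser.parse(catFile)    // now "Breed" is a defined name

// The compiler is primed with the fully resolved schema.
new SpecificCompiler(cat).compileToDestination(catFile, destDir)
```

As far as I know, the compiler also emits Java for the types the schema references, so this would generate a class for Breed alongside Cat.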
In theory, you can use whatever process you want to arrive at the dependency schema. The process currently used by the plugin is like this:
- Go to the source FileTree defined for the task.
- Filter it down to just .avsc files.
- Run it through some complicated logic in SchemaResolver to determine whether that set of files is capable of being compiled, and automatically determine an appropriate order to do so in.
- Using the order determined, for each source file, construct a SpecificCompiler with the necessary dependency schema and call compileToDestination.
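The ordering step can be sketched roughly as follows. This is not the plugin's SchemaResolver, just a simplified stand-in that retries files whose references are not yet defined; it assumes Avro 1.11.x via @Grab, and the source and output directories are hypothetical.

```groovy
@Grab('org.apache.avro:avro-compiler:1.11.3') // assumed version
import org.apache.avro.Schema
import org.apache.avro.SchemaParseException
import org.apache.avro.compiler.specific.SpecificCompiler

File destDir = new File('build/generated-sketch')   // hypothetical output dir
def pending = new File('src/main/avro').listFiles()
        .findAll { it.name.endsWith('.avsc') }
Map<String, Schema> known = [:]                      // types resolved so far

while (pending) {
    def deferred = []
    pending.each { File src ->
        // Fresh parser per attempt, primed with everything resolved so far.
        def parser = new Schema.Parser()
        parser.addTypes(known)
        try {
            Schema schema = parser.parse(src)
            known.putAll(parser.types)
            new SpecificCompiler(schema).compileToDestination(src, destDir)
        } catch (SchemaParseException e) {
            // Most likely an undefined name: retry after other files are parsed.
            deferred << src
        }
    }
    // No progress in a full pass means the set cannot be compiled.
    if (deferred.size() == pending.size()) {
        throw new IllegalStateException("Could not resolve: ${deferred*.name}")
    }
    pending = deferred
}
```

A real resolver would also need to distinguish genuine syntax errors from missing dependencies; this sketch treats every parse failure as "try again later".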
My plan is to someday de-couple GenerateAvroJavaTask from the dependency resolution process. If/when that happens, dependency resolution would happen in a separate ResolveAvroDependenciesTask task which takes a set of .avsc files and writes them to fully-resolved .avsc files (which thus have inline declarations of all types and no external dependencies), such that you could then compile each file by passing in just that file as source, with no dependency schema. Said ResolveAvroDependenciesTask already exists, but for now GenerateAvroJavaTask still performs automatic dependency resolution.
For your situation, where you have a JAR that contains ".avsc" files you want to depend on, you'll need to declare it as a source for the GenerateAvroJavaTask. For example, if I have breed.jar that contains Breed.avsc and src/main/avro/Cat.avsc that depends on Breed.avsc, I can configure it like this:
generateAvroJava {
source zipTree('breed.jar')
}
I'm sure there's some way to hook it up with Gradle dependency configurations, but I'm not sure exactly how.
Thanks for the advice to solve this as well as all of the valuable information.
I am not worried about hooking it up to the dependency configuration, this solution should do the trick. Thanks again!
Hi @nbuesing, I wonder if you were able to hook it up to the dependency configuration. If so, do you mind sharing some more code snippets? Thanks!
Actually, I got further, but issues remained, and due to a deadline I combined my two Avro projects into one.
I'll take a look at this more tonight and add an example project to the repo.
@nbuesing @philipp94831
I've added a couple additional examples here that might be useful.
- avsc-from-external-jar: generating Java objects from a schema file that depends on schema files in an arbitrary JAR file (not produced by Gradle)
- avsc-from-subproject: generating Java objects from a schema file that depends on schema files in a JAR produced by a different subproject within a multi-project Gradle build
Please let me know if this works for either of you, and if there are other variants that would be useful to have as examples.
Hi @davidmc24, that works like a charm! Thank you very much. I also added
sourceSets {
    main {
        resources {
            srcDirs "src/main/avro"
        }
    }
}
so that I don't have to rename my avro directory to resources in the subproject example. I furthermore have one question:
If I package the jar in the cat project, it also contains the compiled classes of the schema project. I wonder if there is a way to define an exclusion on the cat project. I tried
sourceSets {
    main {
        java {
            exclude 'example/Breed**'
        }
    }
}
but the java source set does not contain the classes. Do you know which part of the source set I have to reference? The classes are located in build/generated-main-avro-java but I don't know how to reference it.
Also, do you think it is an option to add an additionalSchema and testAdditionalSchema configuration to the plugin by default so we don't have to set up this dependency stuff? I think that would be a great addition.
So I think I found a proper solution which lets you compile Avro schemas with dependencies but also include the artifacts in other projects. The only thing that does not work properly is caching of the generateAvroJava task. I guess this is related to the zipTree and the /tmp/expandedArchives/ folder the files are expanded to. The compileJava task, on the other hand, is properly cached.
import java.util.zip.ZipEntry
import java.util.zip.ZipFile

sourceSets {
    main {
        resources {
            srcDirs "src/main/avro"
        }
    }
}

configurations {
    additionalSchema
    api.extendsFrom additionalSchema
}

generateAvroJava {
    dependsOn configurations.additionalSchema
    source {
        configurations.additionalSchema.collect {
            zipTree(it)
        }
    }
}

def configureJar = tasks.register("configureJar") {
    it.doLast {
        List<String> exclusions = configurations.additionalSchema
            .findAll {
                it.name.endsWith("jar")
            }
            .collect { File file ->
                new ZipFile(file).entries()
                    .findAll {
                        it.name.endsWith(".class")
                    }
                    .collect { ZipEntry entry ->
                        return entry.name
                    }
            }
            .flatten()
        tasks.jar.exclude(exclusions)
    }
    // otherwise the jars of dependent projects might not have been built
    // TODO is there a way to copy the dependencies of the jar task? classes is not part of tasks.jar.dependsOn
    it.dependsOn(tasks.classes)
}

tasks.named("jar") {
    it.dependsOn(configureJar)
}
I don't know if any of this is worth integrating into the plugin itself. Thanks for your help!
Nice example. I've integrated it into the avsc-from-subproject example in the repository. Using doFirst on the jar task I was able to eliminate it.dependsOn(tasks.classes).
I can confirm that using zipTree appears to disable build caching, as it uses a different temp path every time. That seems like something that should just work. I'll submit a ticket to the gradle build tool project.
I also used doFirst on the jar task at first but that also broke caching. I then found this in the Gradle docs https://docs.gradle.org/current/userguide/common_caching_problems.html#custom_actions
As for whether it makes sense to include some of this by default in the Avro conventions plugin... maybe. I've gotten enough requests for ways to depend on external schema that I think it would get used. I think I need to think on it some more to figure out what variants there are, and how many (if any) new configuration options would be needed. I've added a design doc as a placeholder, as it's on the short list of new feature ideas to consider working on. But time to work on this project is scarce, so it might be a while.
I also used doFirst on the jar task at first but that also broke caching.
Good point. Fixed that based on your pattern.
Submitted https://github.com/gradle/gradle/issues/18382
As for whether it makes sense to include some of this by default in the Avro conventions plugin... maybe. I've gotten enough requests for ways to depend on external schema that I think it would get used. I think I need to think on it some more to figure out what variants there are, and how many (if any) new configuration options would be needed. I've added a design doc as a placeholder, as it's on the short list of new feature ideas to consider working on. But time to work on this project is scarce, so it might be a while.
Sure, no worries. For now, we finally have a good way to modularize our dependencies. Everything added to the plugin would simply make it easier for other users but we are fine for now :-)
Hi, just as an update: I have now wired the exclusion to the compileJava step.
def configureCompileJava = tasks.register("configureCompileJava") {
    it.doLast {
        List<String> exclusions = configurations.additionalSchema
            .findAll {
                it.name.endsWith("jar")
            }
            .collect { File file ->
                new ZipFile(file).entries()
                    .findAll {
                        it.name.endsWith(".class")
                    }
                    .collect { ZipEntry entry ->
                        return entry.name.replaceAll(".class\$", ".java")
                    }
            }
            .flatten()
        tasks.compileJava.exclude(exclusions)
    }
    // TODO is there a way to copy the dependencies of the compileJava task? generateAvroJava is not part of tasks.compileJava.dependsOn
    // The jars of dependent projects might not have been built
    it.dependsOn(tasks.generateAvroJava)
}

tasks.named("compileJava") {
    it.dependsOn(configureCompileJava)
}
When the exclusion is wired to the jar step, the plugin generates Java classes for the dependent .avsc files and uses them for compilation. However, depending on your plugin config, they might look different from the class files provided by the artifact. With the exclusion wired to compileJava instead, the dependent .avsc files are used only for Avro Java generation and not for compilation.
Hi @davidmc24, we created our own plugin that wraps yours: https://github.com/bakdata/gradle-avro-dependency-plugin Our plugin makes it possible to use dependencies when compiling avro schemas. However, it would be nice to have it integrated in your plugin. So feel free to check it out and then maybe we can work out a contribution here.
Cool. I'll take a look when I have a chance.
@philipp94831 I had a chance to take a look. Looks like progress in a good direction. Thanks for bringing it to my attention.
My notes on how it looks like the plugin works:
- configures new configurations avroImplementation/avroApi per sourceset
- uses a per-sourceset temporary directory external-SOURCESET-avro
- registers a number of tasks per sourceset: configureDeleteExternalJava, deleteExternalJava, configureCopyExternalAvroResources, copyExternalAvroResources
- deleteExternalJava is used to delete .java files that are associated with .class files that are in external jars. The assumption appears to be that such classes have already been generated elsewhere and we don't want to include them in the current component's JARs.
- configureDeleteExternalJava is a separate task because deleteExternalJava is using a general-purpose Delete task (and thus needs to be told what to delete), the logic to determine what to delete requires resolution of dependencies, and we don't want to perform resolution of dependencies at configuration-time to avoid slowing down the build when the task isn't called.
I looked at https://github.com/gradle/gradle/issues/18382 again, and think it's possible we could avoid having a temporary directory. Would you mind taking a look to see if, in your plugin, loading from a zipTree works with build caching if the source isn't the zipTree directly, but instead zipTree.files? If so, maybe we could avoid the need for the configureCopyExternalAvroResources and copyExternalAvroResources tasks?
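To make the question concrete, the variant being asked about might look roughly like this in a build script. It is only a sketch: it reuses the additionalSchema configuration name from earlier in this thread, the matching/include filter is an extra assumption added here to keep non-schema files out of the source set, and whether it actually restores build caching would need to be verified.

```groovy
configurations {
    additionalSchema
}

generateAvroJava {
    // Make sure the jars exist before they are expanded.
    dependsOn configurations.additionalSchema
    source {
        configurations.additionalSchema.collect { File jar ->
            // Pass the resolved set of files rather than the zipTree itself.
            zipTree(jar).matching { include '**/*.avsc' }.files
        }
    }
}
```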
I'm concerned that deleteExternalJava is potentially invalidating the cache-ability of the generateAvroJava task, by modifying its output directory. I think a better way to handle this would be to add a custom action to the generateAvroJava task that performs the operations currently performed by both configureDeleteExternalJava and deleteExternalJava. With that approach, the cache key for the generateAvroJava output won't be calculated until after the action completes. If we end up merging your plugin into mine, it could potentially just be an action within the task itself.
maybe we can work out a contribution here
I'm open to including this capability in the main plugin, based on the approach you've demonstrated. The main constraint is that I don't have a lot of time to spend on this plugin these days. If you're interested in porting your code to Java and submitting a pull request, I'll make sure to make time to review and take it the rest of the way. Otherwise, it may wait a while until I have enough time to do so myself.
Hi @davidmc24, your suggestion wrt ZipTree seems to be working. Also the delete task does not seem to break caching. Here is a PR I am currently working on https://github.com/bakdata/gradle-avro-dependency-plugin/pull/2. Maybe you can have a look if it resolves your concerns.
The purpose of deleteExternalJava is exactly what you described.