gradle-avro-plugin

*.avsc referenced from another project

Open nbuesing opened this issue 4 years ago • 19 comments

I have been digesting this issue to figure out whether it is possible to reference .avsc files from another jar file:

https://github.com/davidmc24/gradle-avro-plugin/issues/86

I create a 'common-model' jar file with the generated java from avro and the .avsc files in that jar file as well.

I was hoping I could add in the dependency into another jar that depends on that one, such as:

dependencies {
    implementation(group: 'com.foo', name: 'common-model', version: '1.0.0') {
        transitive = false
    }
}

However, I cannot get this to work.

I am curious if this is a limitation of Avro compiler with .avsc files, an issue with my setup of the plugin, or something else.

Thanks

nbuesing avatar Oct 21 '20 14:10 nbuesing

Short answer: it's complicated.

This plugin is built around the java library published by the Avro project... which doesn't always provide the capabilities one would want when writing a Gradle plugin.

In particular, the only interfaces it has for generating Java files from .avsc files are:

In all cases, the source must be a file on the local filesystem. If you use the array overload to specify dependencies, they must be in the exact right order or it will fail to compile. Alternatively you can use the SpecificCompiler(Schema) constructor to prime it with dependency schema.

In theory, you can use whatever process you want to arrive at the dependency schema. The process currently used by the plugin is like this:

  1. Go to the source FileTree defined for the task
  2. Filter it down to just .avsc files.
  3. Run it through some complicated logic in SchemaResolver to determine whether that set of files can be compiled, and to automatically determine an appropriate order in which to compile them.
  4. Using the order determined, for each source file, construct a SpecificCompiler with the necessary dependency schema and call compileToDestination.
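
The ordering constraint can be illustrated with a minimal Groovy sketch that calls the Avro library directly. This is not the plugin's SchemaResolver logic; it only shows why Breed.avsc has to be parsed before Cat.avsc. The file paths and output directory are assumptions, and the script needs the avro and avro-compiler libraries on its classpath.

```groovy
import org.apache.avro.Schema
import org.apache.avro.compiler.specific.SpecificCompiler

// Hypothetical locations, chosen only for illustration.
def breedFile = new File('src/main/avro/Breed.avsc')
def catFile = new File('src/main/avro/Cat.avsc')
def outputDir = new File('build/generated-main-avro-java')

// One parser instance remembers every named type it has already parsed,
// so Breed must be parsed before Cat, which refers to it by name.
def parser = new Schema.Parser()
parser.parse(breedFile)
Schema cat = parser.parse(catFile)

// The parsed Cat schema references the resolved Breed definition,
// so the compiler can generate Java sources from it.
new SpecificCompiler(cat).compileToDestination(catFile, outputDir)
```

Swapping the two parse calls should make the Cat parse fail, because Breed would not yet be known to the parser.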

My plan is to someday decouple GenerateAvroJavaTask from the dependency resolution process. If/when that happens, dependency resolution would happen in a separate ResolveAvroDependenciesTask, which takes a set of .avsc files and writes them out as fully resolved .avsc files (with inline declarations of all types and no external dependencies). You could then compile each file by passing in just that file as source, with no dependency schema. ResolveAvroDependenciesTask already exists, but for now GenerateAvroJavaTask still performs automatic dependency resolution.

For your situation, where you have a JAR that contains ".avsc" files you want to depend on, you'll need to declare it as a source for the GenerateAvroJavaTask. For example, if I have breed.jar that contains Breed.avsc and src/main/avro/Cat.avsc that depends on Breed.avsc, I can configure it like this:

generateAvroJava {
    source zipTree('breed.jar')
}

I'm sure there's some way to hook it up with Gradle dependency configurations, but I'm not sure exactly how.

davidmc24 avatar Oct 21 '20 20:10 davidmc24

Thanks for the advice to solve this as well as all of the valuable information.

I am not worried about hooking it up to the dependency configuration, this solution should do the trick. Thanks again!

nbuesing avatar Oct 21 '20 20:10 nbuesing

Hi @nbuesing, I wonder if you were able to hook it up to the dependency configuration. If so, do you mind sharing some more code snippets? Thanks!

philipp94831 avatar Sep 22 '21 05:09 philipp94831

Actually, I got further, but some issues remained, and due to a deadline I combined my two Avro projects into one.

nbuesing avatar Sep 22 '21 11:09 nbuesing

I'll take a look at this more tonight and add an example project to the repo.

davidmc24 avatar Sep 22 '21 12:09 davidmc24

@nbuesing @philipp94831

I've added a couple additional examples here that might be useful.

  • avsc-from-external-jar: generating Java objects from a schema file that depends on schema files in an arbitrary JAR file (not produced by Gradle)
  • avsc-from-subproject: generating Java objects from a schema file that depends on schema files in a JAR produced by a different subproject within a multi-project Gradle build

Please let me know if this works for either of you, and if there are other variants that would be useful to have as examples.

davidmc24 avatar Sep 23 '21 03:09 davidmc24

Hi @davidmc24, that works like a charm! Thank you very much. I also added

    sourceSets {
        main {
            resources {
                srcDirs "src/main/avro"
            }
        }
    }

so that I don't have to rename my avro directory to resources in the subproject example. I furthermore have one question:

If I package the jar in the cat project, it also contains the compiled classes of the schema project. I wonder if there is a way to define an exclusion in the cat project. I tried

sourceSets {
    main {
        java {
            exclude 'example/Breed**'
        }
    }
}

but the java source set does not contain the classes. Do you know which part of the source set I have to reference? The classes are located in build/generated-main-avro-java but I don't know how to reference it.

Also, do you think it would be an option to add additionalSchema and testAdditionalSchema configurations to the plugin by default, so we don't have to set up this dependency wiring ourselves? I think that would be a great addition.

philipp94831 avatar Sep 23 '21 06:09 philipp94831

So I think I found a proper solution which lets you compile Avro schemas with dependencies and still include the artifacts in other projects. The only thing that does not work properly is caching of the generateAvroJava task. I guess this is related to zipTree and the /tmp/expandedArchives/ folder the files are expanded to. The compileJava task, on the other hand, is properly cached.

import java.util.zip.ZipEntry
import java.util.zip.ZipFile

sourceSets {
    main {
        resources {
            srcDirs "src/main/avro"
        }
    }
}

configurations {
    additionalSchema
    api.extendsFrom additionalSchema
}

generateAvroJava {
    dependsOn configurations.additionalSchema
    source {
        configurations.additionalSchema.collect {
            zipTree(it)
        }
    }
}

def configureJar = tasks.register("configureJar") {
    it.doLast {
        List<String> exclusions = configurations.additionalSchema
                .findAll {
                    it.name.endsWith("jar")
                }
                .collect { File file ->
                    new ZipFile(file).entries()
                            .findAll {
                                it.name.endsWith(".class")
                            }
                            .collect { ZipEntry entry ->
                                return entry.name
                            }
                }
                .flatten()
        tasks.jar.exclude(exclusions)
    }
    // otherwise the jars of dependent projects might not have been built
    // TODO is there a way to copy the dependencies of the jar task? classes is not part of tasks.jar.dependsOn
    it.dependsOn(tasks.classes)
}

tasks.named("jar") {
    it.dependsOn(configureJar)
}
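
A minimal sketch of how a consuming project might then declare its schema dependency, assuming the additionalSchema configuration defined above. The ':schema' subproject path is hypothetical; the external coordinates are the ones from the original question. Because the snippet uses api.extendsFrom, it also assumes the java-library plugin is applied.

```groovy
dependencies {
    // ':schema' is a hypothetical subproject whose JAR bundles its .avsc files
    additionalSchema project(':schema')

    // An external JAR that ships .avsc files could be declared the same way:
    // additionalSchema 'com.foo:common-model:1.0.0'
}
```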

I don't know if any of this is worth integrating into the plugin itself. Thanks for your help!

philipp94831 avatar Sep 23 '21 10:09 philipp94831

Nice example. I've integrated it into the avsc-from-subproject example in the repository. Using doFirst on the jar task, I was able to eliminate it.dependsOn(tasks.classes).

I can confirm that using zipTree appears to disable build caching, as it uses a different temp path every time. That seems like something that should just work. I'll submit a ticket to the gradle build tool project.

davidmc24 avatar Sep 23 '21 12:09 davidmc24

I also used doFirst on the jar task at first but that also broke caching. I then found this in the Gradle docs https://docs.gradle.org/current/userguide/common_caching_problems.html#custom_actions

philipp94831 avatar Sep 23 '21 12:09 philipp94831

As for whether it makes sense to include some of this by default in the Avro conventions plugin... maybe. I've gotten enough requests for ways to depend on external schema that I think it would get used. I think I need to think on it some more to figure out what variants there are, and how many (if any) new configuration options would be needed. I've added a design doc as a placeholder, as it's on the short list of new feature ideas to consider working on. But time to work on this project is scarce, so it might be a while.

davidmc24 avatar Sep 23 '21 13:09 davidmc24

> I also used doFirst on the jar task at first but that also broke caching.

Good point. Fixed that based on your pattern.

davidmc24 avatar Sep 23 '21 13:09 davidmc24

Submitted https://github.com/gradle/gradle/issues/18382

davidmc24 avatar Sep 23 '21 13:09 davidmc24

> As for whether it makes sense to include some of this by default in the Avro conventions plugin... maybe. I've gotten enough requests for ways to depend on external schema that I think it would get used. I think I need to think on it some more to figure out what variants there are, and how many (if any) new configuration options would be needed. I've added a design doc as a placeholder, as it's on the short list of new feature ideas to consider working on. But time to work on this project is scarce, so it might be a while.

Sure, no worries. For now, we finally have a good way to modularize our dependencies. Everything added to the plugin would simply make it easier for other users but we are fine for now :-)

philipp94831 avatar Sep 23 '21 13:09 philipp94831

Hi, just as an update: I have now wired the exclusion to the compileJava step instead.

def configureCompileJava = tasks.register("configureCompileJava") {
    it.doLast {
        List<String> exclusions = configurations.additionalSchema
                .findAll {
                    it.name.endsWith("jar")
                }
                .collect { File file ->
                    new ZipFile(file).entries()
                            .findAll {
                                it.name.endsWith(".class")
                            }
                            .collect { ZipEntry entry ->
                                // escape the dot so only a literal ".class" suffix is rewritten
                                return entry.name.replaceAll("\\.class\$", ".java")
                            }
                }
                .flatten()
        tasks.compileJava.exclude(exclusions)
    }
    // TODO is there a way to copy the dependencies of the compileJava task? generateAvroJava is not part of tasks.compileJava.dependsOn
    // The jars of dependent projects might not have been built
    it.dependsOn(tasks.generateAvroJava)
}

tasks.named("compileJava") {
    it.dependsOn(configureCompileJava)
}

When the exclusion is wired to the jar step, the plugin still generates Java classes from the dependency's .avsc files and uses them for compilation. Depending on your plugin configuration, those classes might differ from the class files provided by the artifact. With the exclusion on compileJava instead, the dependency's .avsc files are used only for Avro Java generation and not for compilation.

philipp94831 avatar Sep 24 '21 08:09 philipp94831

Hi @davidmc24, we created our own plugin that wraps yours: https://github.com/bakdata/gradle-avro-dependency-plugin Our plugin makes it possible to use dependencies when compiling avro schemas. However, it would be nice to have it integrated in your plugin. So feel free to check it out and then maybe we can work out a contribution here.

philipp94831 avatar Feb 25 '22 08:02 philipp94831

Cool. I'll take a look when I have a chance.

davidmc24 avatar Feb 25 '22 15:02 davidmc24

@philipp94831 I had a chance to take a look. Looks like progress in a good direction. Thanks for bringing it to my attention.

My notes on how the plugin appears to work:

  • configures new configurations avroImplementation/avroApi per sourceset
  • uses a per-sourceset temporary directory external-SOURCESET-avro
  • registers a number of tasks per sourceset: configureDeleteExternalJava, deleteExternalJava, configureCopyExternalAvroResources, copyExternalAvroResources
  • deleteExternalJava is used to delete .java files that are associated with .class files that are in external jars. The assumption appears to be that such classes have already been generated elsewhere and we don't want to include them in the current component's JARs.
  • configureDeleteExternalJava is a separate task because deleteExternalJava is using a general-purpose Delete task (and thus needs to be told what to delete), the logic to determine what to delete requires resolution of dependencies, and we don't want to perform resolution of dependencies at configuration-time to avoid slowing down the build when the task isn't called.

I looked at https://github.com/gradle/gradle/issues/18382 again, and think it's possible we could avoid having a temporary directory. Would you mind taking a look to see if, in your plugin, loading from a zipTree works with build caching if the source isn't the zipTree directly, but instead zipTree.files? If so, maybe we could avoid the need for the configureCopyExternalAvroResources and copyExternalAvroResources tasks?
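
A minimal sketch of the variant being asked about, assuming the additionalSchema configuration from the earlier snippet in this thread. It passes the files resolved from each zipTree rather than the tree itself; whether that is enough to keep generateAvroJava cacheable is exactly the open question here, so treat it as something to try rather than a confirmed fix.

```groovy
generateAvroJava {
    dependsOn configurations.additionalSchema
    source {
        configurations.additionalSchema.collect {
            // .files resolves the archive's tree to a plain Set<File>
            zipTree(it).files
        }
    }
}
```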

I'm concerned that deleteExternalJava is potentially invalidating the cache-ability of the generateAvroJava task, by modifying its output directory. I think a better way to handle this would be to add a custom action to the generateAvroJava task that performs the operations currently performed by both configureDeleteExternalJava and deleteExternalJava. With that approach, the cache key for the generateAvroJava output won't be calculated until after the action completes. If we end up merging your plugin into mine, it could potentially just be an action within the task itself.

> maybe we can work out a contribution here

I'm open to including this capability in the main plugin, based on the approach you've demonstrated. The main constraint is that I don't have a lot of time to spend on this plugin these days. If you're interested in porting your code to Java and submitting a pull request, I'll make sure to make time to review and take it the rest of the way. Otherwise, it may wait a while until I have enough time to do so myself.

davidmc24 avatar Mar 01 '22 05:03 davidmc24

Hi @davidmc24, your suggestion regarding zipTree seems to be working. Also, the delete task does not seem to break caching. Here is a PR I am currently working on: https://github.com/bakdata/gradle-avro-dependency-plugin/pull/2. Maybe you can take a look to see whether it resolves your concerns.

The purpose of deleteExternalJava is exactly what you described.

philipp94831 avatar Mar 01 '22 07:03 philipp94831