dgraph4j
Build shadow jar relocating dependencies
Fixes #84
Todo -
- [ ] Publish shadow jar on maven central
- [x] Shadow grpc dependency => Not doing it. This could cause issues because a project using Dgraph4J has a direct dependency on the grpc libraries, given that it creates channels using ManagedChannelBuilder etc.
- [x] Reduce the size of shadow jar => it is 13 MB now
Reviewable status: 0 of 4 files reviewed, 1 unresolved discussion (waiting on @gitlw and @mangalaman93)
build.gradle, line 57 at r1 (raw file):
shadowJar {
  classifier = 'shadow'
  relocate 'com.google.common', 'io.dgraph.shaded.com.google.common'

Thanks for this PR. I agree that it's a good idea in some cases to have a single fat jar so that executing the code is easier.
The other piece is relocating the library. You mentioned this change is to resolve a version conflict in guava. But since we don't depend on guava directly, I feel this relocation is not well justified.
In the original GitHub issue, the problem was to avoid a version conflict on grpc. I feel a better way to resolve this is to specify a minimum grpc version in this repo; if another project using dgraph4j also depends on grpc, the resolved grpc version should automatically be bumped to the higher one by the default resolution strategy. If it turns out our code does not work well with the latest grpc for some reason, then we should raise the minimum version and fix our code here accordingly. That process would eventually deprecate the old versions of grpc as we move forward.
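To make the suggestion concrete, here is a minimal sketch of declaring a minimum grpc version in a Gradle build. The artifact list and the version number are illustrative assumptions, not values taken from this repository.

```groovy
// build.gradle (sketch): declare the lowest grpc version the library supports.
plugins {
    id 'java-library'
}

repositories {
    mavenCentral()
}

// Hypothetical minimum version, chosen only for illustration.
def grpcMinVersion = '1.22.1'

dependencies {
    // `api` exposes grpc to consumers, who build channels with it directly.
    api "io.grpc:grpc-netty:${grpcMinVersion}"
    api "io.grpc:grpc-protobuf:${grpcMinVersion}"
    api "io.grpc:grpc-stub:${grpcMinVersion}"
}
```

When a consuming Gradle project requests a newer grpc, Gradle's default conflict resolution selects the highest requested version, so the declaration above acts as a lower bound rather than a pin. This only governs the classpath that Gradle itself assembles.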
I agree that this is not an ideal solution, but I think the solution you propose may not always work. For example, in this case we currently need guava version 21, whereas spark 2.4 wants an older version, 16.0. Dgraph4J currently doesn't work with version 16.0. Even if we specify that we want a minimum version of 21, that doesn't solve the problem here.
Even if we make it work with version 16.0, I don't think this approach would always work; one dependency or another will end up causing issues. For example, the issue with grpc was reported recently, whereas with spark the conflict is on guava, and so on.
And, as much as it is not ideal, shading doesn't seem uncommon. In fact, I observed a shaded version of the netty jar in the Dgraph4J fat jar, probably coming from dependencies. Build tools allow pulling shaded versions of a dependency that are hosted by various repos (maven central, sonatype etc.).
We could possibly find the right shaded jars to import for the direct dependencies etc., which would lead to a jar with no conflicts. But I think it would take a lot of effort to find the right combination of dependencies, and we may not find shaded jars for the direct dependencies that we need.
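As one illustration of pulling an already-shaded artifact, the grpc project publishes `grpc-netty-shaded`, which bundles netty under a relocated package. The sketch below assumes a Gradle build and uses a placeholder version; it is not necessarily where the shaded netty classes in the fat jar come from.

```groovy
// build.gradle (sketch): depend on a pre-shaded transport instead of plain grpc-netty.
plugins {
    id 'java'
}

repositories {
    mavenCentral()
}

dependencies {
    // grpc-netty-shaded ships netty relocated under io.grpc.netty.shaded,
    // so it does not clash with another netty version on the classpath.
    // The version is an illustrative assumption, not taken from this project.
    implementation 'io.grpc:grpc-netty-shaded:1.22.1'
}
```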
I see in the original comment that this will not shade GRPC but a later comment says it will (in a future commit) -- is that still the case? For what it's worth, I'm shading:
relocate 'com.google.common', 'com.example.spark.dgraph.com.google.common'
relocate 'com.google.protobuf', 'com.example.spark.dgraph.com.google.protobuf'
relocate 'io.grpc', 'com.example.spark.dgraph.io.grpc'
relocate 'io.opencensus', 'com.example.spark.dgraph.io.opencensus'
although I'm not sure I still need to shade all of this separately.
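For readers unfamiliar with the setup, `relocate` lines like these normally sit inside the `shadowJar` block of the Gradle Shadow plugin. The following is a minimal sketch of such a block, assuming plugin version 5.2.0 and reusing the `com.example.spark.dgraph` prefix from the comment above; the dependency section of the build is omitted.

```groovy
// build.gradle (sketch): fat jar with relocated packages.
plugins {
    id 'java'
    // Plugin version is an assumption; use whichever matches your Gradle version.
    id 'com.github.johnrengelman.shadow' version '5.2.0'
}

repositories {
    mavenCentral()
}

shadowJar {
    // Produces <name>-<version>-shadow.jar next to the regular jar.
    archiveClassifier = 'shadow'

    // Rewrite package names in the bundled classes and in all references to them.
    relocate 'com.google.common', 'com.example.spark.dgraph.com.google.common'
    relocate 'com.google.protobuf', 'com.example.spark.dgraph.com.google.protobuf'
    relocate 'io.grpc', 'com.example.spark.dgraph.io.grpc'
    relocate 'io.opencensus', 'com.example.spark.dgraph.io.opencensus'

    // grpc discovers some providers through META-INF/services files,
    // so merge those files rather than letting one overwrite another.
    mergeServiceFiles()
}
```

Running `gradle shadowJar` then builds the relocated jar. Relocating inside the final application, as here, keeps the rewritten grpc classes private to that application.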
As of now, we don't release any shaded library. The problem with releasing one, and specifically with shading grpc, is that users are expected to import grpc as well. The shaded library would then use a different import path from the one in the users' application code, so the two sets of classes would be incompatible and could not work together. That is why this issue is still open.