Flesh out support for shading third-party libraries and add example docs (500USD Bounty)
From the maintainer Li Haoyi: I'm putting a 500USD bounty on this issue, payable by bank transfer on a merged PR implementing this.
We need to be able to depend on shaded third-party libraries, have the original library properly excluded from the runClasspath, and instead replaced by the shaded classfiles. Right now we can shade stuff in assembly using AssemblyRules, but shading should also apply to:
- run
- jar (which should include the shaded dependency)
- publishLocal/publishAll (which should publish jars containing the shaded classes transitively, with no dependency on the original and/or an exclusion)
- runClasspath (e.g. if someone wants to use the classfiles in a Jvm.runSubprocess or Jvm.runClassLoader, it should exclude the original and include the shaded classes)
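To make the runClasspath requirement concrete, here is a minimal sketch of the substitution in plain Scala. ShadedClasspath and replace are hypothetical names, not Mill APIs, and matching jars by file-name prefix is a simplifying assumption; the paths and versions are illustrative only.

```scala
// Hypothetical helper, not part of Mill: swap original jars for a shaded jar on a classpath.
object ShadedClasspath {
  /** Drops every entry whose file name starts with "<artifact>-" for one of
    * `originalArtifacts`, then appends `shadedJar` at the end. */
  def replace(
      classpath: Seq[String],
      originalArtifacts: Set[String],
      shadedJar: String
  ): Seq[String] = {
    // Last path segment, e.g. "commons-compress-1.26.0.jar"
    def fileName(path: String): String = path.substring(path.lastIndexOf('/') + 1)

    val kept = classpath.filterNot { p =>
      originalArtifacts.exists(a => fileName(p).startsWith(a + "-"))
    }
    kept :+ shadedJar
  }
}
```

For example, calling ShadedClasspath.replace on a classpath holding a poi-ooxml jar and a commons-compress jar, with Set("commons-compress") as the originals, keeps the poi-ooxml entry and adds the shaded jar in place of commons-compress. A real implementation would more likely match on dependency coordinates than on file names.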
There's some design space here to explore.
We should have an example under javalib/dependencies.adoc for shading Java using jarjar, one under scalalib/dependencies.adoc using https://github.com/eed3si9n/jarjar-abrams, and maybe something for Kotlin.
We already have a dependency on jarjar-abrams to provide the Relocate assembly rule. https://github.com/com-lihaoyi/mill/blob/b66ef93e3ca926f480555040c9726814662cc97e/scalalib/src/mill/scalalib/Assembly.scala#L81
I think this is a broader topic than just assembly. For example, if I shade an upstream library, I should be able to publish to Maven Central and bundle the shaded library, with the normal library <dependency> metadata removed. The same applies if I shade a library and a downstream module in the same build depends on my module.
The dependency to be shaded should be declared in compileIvyDeps or compileModuleDeps.
But compile*Deps is not sufficient: all that does is ensure the original classfiles are not included in the jar, which is correct, but I also want the shaded classfiles included somehow in the jar (not the assembly) so they can be used at runtime.
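A build sketch of the situation described here, assuming a Mill version where compileIvyDeps is available. The module name mylib and the version numbers are illustrative, not taken from the thread.

```scala
// build.sc (sketch only)
import mill._, scalalib._

object mylib extends ScalaModule {
  def scalaVersion = "2.13.14" // illustrative

  // Compile-only: visible to the compiler, but not part of runClasspath.
  // Nothing here puts relocated copies of these classes into the jar,
  // which is the gap this issue is about.
  def compileIvyDeps = Agg(
    ivy"org.apache.commons:commons-compress:1.26.0" // illustrative version
  )
}
```

With this setup the module compiles against commons-compress, but code that reaches those classes at runtime will fail unless something else supplies them.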
Hi team,
Could you please help me understand whether this use case falls under the scope of the bounty?
The spark-excel package relies on org.apache.poi:poi-ooxml and a shaded version of org.apache.commons:commons-compress. poi-ooxml also depends on commons-compress.
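Assembly rule:

```scala
import mill.scalalib.Assembly.Rule

// Relocate commons-compress classes into the shadeio namespace in the assembly jar
def assemblyRules = Seq(
  Rule.Relocate("org.apache.commons.compress.**", "shadeio.commons.compress.@1")
)
```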
However, at runtime we still see the poi-ooxml library referring to the unshaded commons-compress that already exists on the system (it is part of the built-in Databricks runtime and can't be changed).
Would this bounty correctly handle the above scenario? If so I may be willing to add to the bounty.
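For readers unfamiliar with the rule syntax above: in jarjar-style patterns, `**` matches any run of characters including package separators, `*` matches within a single package segment, and `@1` in the result refers to the first wildcard match. The sketch below re-implements that mapping on dotted class names for illustration only. RelocateSketch is a hypothetical name, and this is not the jarjar-abrams code, which rewrites bytecode rather than strings.

```scala
import java.util.regex.Pattern

// Illustrative re-implementation of jarjar-style pattern matching on class names.
object RelocateSketch {
  /** Converts a pattern to a regex: `**` becomes a group matching anything,
    * `*` a group matching anything except '.', all other characters are literal. */
  private def toRegex(pattern: String): Pattern = {
    val sb = new StringBuilder
    var i = 0
    while (i < pattern.length) {
      if (pattern.startsWith("**", i)) { sb.append("(.*)"); i += 2 }
      else if (pattern.charAt(i) == '*') { sb.append("([^.]*)"); i += 1 }
      else { sb.append(Pattern.quote(pattern.charAt(i).toString)); i += 1 }
    }
    Pattern.compile(sb.toString)
  }

  /** Returns the relocated class name, or None if the pattern does not match. */
  def relocate(pattern: String, result: String, className: String): Option[String] = {
    val m = toRegex(pattern).matcher(className)
    if (!m.matches()) None
    else {
      // Substitute highest-numbered groups first so "@1" never clobbers "@10".
      val relocated = (m.groupCount to 1 by -1).foldLeft(result) { (acc, n) =>
        acc.replace("@" + n, m.group(n))
      }
      Some(relocated)
    }
  }
}
```

Applied to the rule in this comment, a class under org.apache.commons.compress maps into shadeio.commons.compress with the rest of its name preserved, while poi-ooxml's own classes are left alone. The references inside poi-ooxml only change if its classfiles are also processed by the rule.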
@neontty yes your use case is exactly that of the bounty. If you have a need for this would love your help implementing it!
Excellent! This would greatly benefit the users of the crealytics spark-excel package. Let me discuss with my coworkers.
After looking around, I think sbt-shading has a nice approach. Shaded dependencies are included in libraryDependencies (mvnDeps) as well as specified in shadedDependencies.
- shadedDependencies and all their transitive dependencies (except for those also brought in by non-shaded dependencies) are included in the published jar
- relocate rules are supported, using jarjar-abrams
- the <dependency> metadata is removed and replaced by a comment
- a validNamespaces setting has to be specified. Classes in the output jar have to be in these namespaces. I think this is a nice feature. Its main purpose, I think, would be to remind users to write proper relocate rules for shaded transitive dependencies. For example, we shade commons-compress with the relocate rule org.apache... -> mylib.shaded.org.apache... and set validNamespaces to mylib. However, commons-compress also depends on an abc dependency with classes abc.def.... An error would be raised that reminds us to include a relocate rule for abc (abc.def -> mylib.shaded.abc.def) as well.
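The validNamespaces check described in the last point could look roughly like the sketch below. NamespaceCheck and offenders are hypothetical names, and this is only modeled on the idea as described here, not on sbt-shading's actual code.

```scala
// Hypothetical check: report classes that ended up outside the allowed namespaces.
object NamespaceCheck {
  /** Returns the class names that do not live under any of `validNamespaces`. */
  def offenders(classNames: Seq[String], validNamespaces: Set[String]): Seq[String] =
    classNames.filterNot { c =>
      validNamespaces.exists(ns => c == ns || c.startsWith(ns + "."))
    }
}
```

With validNamespaces set to mylib, a jar containing mylib.Api, a relocated mylib.shaded.org.apache... class, and an unrelocated abc.def.Helper would report only abc.def.Helper, which is the reminder to add a relocate rule for abc.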