rules_scala icon indicating copy to clipboard operation
rules_scala copied to clipboard

Strict scala deps and unused_deps

Open ittaiz opened this issue 7 years ago • 31 comments

So, I met with @cgrushko a few days ago and apparently the way we (rules_scala) currently do strict deps (allowing a target to only use what it directly depends on) is very different from the way java rules do it and IMHO not as good.

Problem: We currently pass to scalac only the direct jars and let scalac emit an error to the user. These errors are quite often cryptic and unuseful for developers to be able to easily fix the problem. Moreover because of how the scala compiler works there are many cases where it needs transitive dependencies even if they aren't mentioned in the source code of the unit we're compiling[1]. This leads to having to add many transitive targets without any reasoning just to appease the compiler or worse to start using exports loosely around the codebase.

We suffered from the above transitive dependencies so much that we're just adding the whole transitive closure to the deps of a target. This is in the onboarding of a project so we're ok with it but this solution can help greatly minimize it.

Proposed solution: Pass direct and indirect jars to scalac's classpath (like java compilation does).
Add a scala compiler plugin which essentially will walk the AST after the type annotation phase, will stop at each 'type expression' and check that the jar from which the type was loaded is a direct_dependency. If not, it is an --indirect_dependency, and the plugin generates an error message. The message includes an 'add_dep' command to add the missing direct dep.

We will also support a lenient mode where this error is outputted as a warning. add_dep is a command which buildozer can take and apply it to your BUILD file. I think (not sure) that ibazel can pick up these warnings and automatically apply them to your BUILD file. Probably rebazel will be able to do this as well if they'll want to.

Value: Separation of concerns (scalac won't enforce graph concerns) Allows higher level messages and so call to actions Might allow finer grain modeling of when indirect dependencies are ok (for an example where scalac "messes" up but we might be able to improve see this repo)
Better cycle for developers since they can run with lenient mode and iterate quickly and fix the deps before pushing (or use ibazel/rebazel). The caveat is obviously the fact people can have a local green build which fails on CI.

Open questions: It's unclear how smart the compiler plugin can be given scalac's limitations and so unclear on how much noise will still be added to a target's deps.

Small implementation detail: To do this we'll need to pass to the compiler plugin a list of direct dependencies and a list of indirect dependencies where at least the latter needs to have both jars and the targets so that the error/warning message can be useful (add //src/main/foo:baz to your deps)

[1]Improve user experience for builds that use only direct dependencies

ittaiz avatar Jun 28 '17 06:06 ittaiz

cc some people who are/were involved with thinking about this at Wix (@ors @natans @nadavwe @shvar @dkomanov)

ittaiz avatar Jun 28 '17 06:06 ittaiz

this seems really nice.

note related post on bazel blog for java https://blog.bazel.build/2017/06/28/sjd-unused_deps.html

The strategy here looks good to me. Some comments:

  1. can we get the reliability so this is the default, or will we need to be opt in for a while (I imagine the latter).
  2. should we have a version that just warns on unused dependencies first (kind of the reverse problem as this).
  3. I'm surprised the strict dependencies became a show-stopper for you. Our dependencies are probably over-broad, but it hasn't been debilitating.

johnynek avatar Jun 29 '17 02:06 johnynek

The blog post and the issue are both products of Carmi and me meeting a few days ago :) Some of the wording in the issue are copied from the explanation of how the plugin works.

Re #1 I think having "dumb" strict mode via Bazel and not scalac should be easy. By "dumb" I mean we warn/error for every indirection. What might be more complicated is the "smarter" mode where we are able to identify which indirect usages are legitimate (compiler's limitations) and which aren't (you're just using that dep yourself)

Re #2 honestly that's not the problem I need to solve now. Many of our generated BUILD files have ~100 dependencies in their "deps". If I get strict mode with warnings then I'm able to add that into the migration phase and output minimal BUILD files. Unused is for longer down the road where you want to keep BUILD files cleaner and IINM the current convention in Google is to move the unused dependencies to runtime under the assumption something might need it.

Re #3 after I added around 30 transitive targets without understanding why the compiler is angry I decided to automate it. Note that it wasn't all of them. For this one repo I'd guess we're talking about several hundreds of noise dependencies. On Thu, 29 Jun 2017 at 5:06 P. Oscar Boykin [email protected] wrote:

this seems really nice.

note related post on bazel blog for java https://blog.bazel.build/2017/06/28/sjd-unused_deps.html

The strategy here looks good to me. Some comments:

  1. can we get the reliability so this is the default, or will we need to be opt in for a while (I imagine the latter).
  2. should we have a version that just warns on unused dependencies first (kind of the reverse problem as this).
  3. I'm surprised the strict dependencies became a show-stopper for you. Our dependencies are probably over-broad, but it hasn't been debilitating.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/235#issuecomment-311841849, or mute the thread https://github.com/notifications/unsubscribe-auth/ABUIF0si-kg_1lWXezRPcevNm5oLGICOks5sIwaJgaJpZM4OHkDK .

ittaiz avatar Jun 29 '17 04:06 ittaiz

@gkossakowski Since Oscar pulled you to the PR I thought I'd pull you to the issue and ask you if you have an intiuition or an idea on how "smart" will the plugin be able to be? What we really want to enable is to warn/error the user when their code uses a transitive dependency. This is to allow for upstream targets freedom in changing implementation decisions. I don't know how feasible this distinction (I use vs compiler needs for transitive reasons) is in the compiler/compiler-plugins.

Thanks!

ittaiz avatar Jul 14 '17 10:07 ittaiz

@gkossakowski Do you have an intuition or an idea on how we can advance the "smart" plugin mode? What we really want to enable is to warn/error the user when their code uses a transitive dependency as opposed to warning the user for cases where the compiler needs this symbol lookup just for the its own needs. This is to allow for upstream targets freedom in changing implementation decisions.

Thanks!

ittaiz avatar Oct 03 '17 04:10 ittaiz

Yes, I mulled over this problem in the context of zinc's incremental compilation decision. In zinc, you want to approximate the dependencies scalac would ever need.

The short answer is: the dependencies introduced by reference do not require a transitive closure but the one introduced by inheritance does require a transitive closure. When a program mentions a symbol, e.g. val x: Foo then it introduces a reference dependency. When program mentions a dependency in the context of declaring a parent type for a class, e.g class X extends Y[Foo], then it introduces a dependency by inheritance.

You can capture both kinds of dependencies by traversing the trees and collecting the information in the right context. Zinc implements this: https://github.com/sbt/zinc/blob/1.x/internal/compiler-bridge/src/main/scala/xsbt/Dependency.scala#L326

gkossakowski avatar Oct 03 '17 05:10 gkossakowski

Are you sure? I have an example which has a class A with a member B and B has a member C (which is a java interface) and you can't compile A without C. Tested with 2.11.7 and 2.12.1 at the time

ittaiz avatar Oct 03 '17 05:10 ittaiz

Actually I think that my example above shows that there are more areas the plugin can improve in but what you wrote IIUC is that at least in that aspect (of inheritance) the plugin can be smarter and not warn/error. Did I understand you correctly?

ittaiz avatar Oct 03 '17 05:10 ittaiz

My comment is what should happen given Scala's typesystem design to shed light what directionally to expect. From there you can derive whether the behavior you're observing is a bug or a design choice you have to work with.

I just checked that what I said about dependencies by reference not forcing transitive dependencies is true for pure Scala:

[gkk@mbp ~/tmp/scala-tmp]$ echo "class A" > A.scala
[gkk@mbp ~/tmp/scala-tmp]$ echo "class B {
>     def a: A = new A
> }
> " > B.scala
[gkk@mbp ~/tmp/scala-tmp]$ echo "class C {
>     def b: B = new B
> }
> " > C.scala
[gkk@mbp ~/tmp/scala-tmp]$ mkdir outA outB outC
[gkk@mbp ~/tmp/scala-tmp]$ scalac -d outA A.scala
[gkk@mbp ~/tmp/scala-tmp]$ scalac -d outB -cp outA B.scala
[gkk@mbp ~/tmp/scala-tmp]$ scalac -d outC C.scala # skip -cp to show that classpaths are isolated
C.scala:2: error: not found: type B
    def b: B = new B
           ^
C.scala:2: error: not found: type B
    def b: B = new B
                   ^
two errors found
[gkk@mbp ~/tmp/scala-tmp]$ scalac -d outC -cp outB C.scala # the A class is not on the classpath

I'm not sure what's different about your example but I think it should work. Perhaps you can check whether you can reproduce the problem in pure Scala setting? I suspect that loading Java-originating symbols from a class file is overeager. It's a different code path than loading Scala symbols in the compiler.

gkossakowski avatar Oct 04 '17 18:10 gkossakowski

Would the use-case where a scala_test target is required to redeclare the external dependencies of it's scala_library fall into this issue -- or would that be something else?

I have built a contrived example here: https://github.com/promiseofcake/bazel-scala-test/pull/2 that runs with Travis to show the current issue.

I think it belongs here because for our java targets we have not had to perform similar adjustments.

promiseofcake avatar Oct 17 '17 22:10 promiseofcake

Yes. This is under the purview of this issue.

ittaiz avatar Oct 18 '17 04:10 ittaiz

Regarding the newly documented strict-deps,

Note: Currently strict-deps is protected by a feature toggle but we're strongly considering making it the default behavior as java_* rules do.

If it were to become the default behavior, would we want to follow the proposed solution before making it mandated?

promiseofcake avatar Nov 09 '17 17:11 promiseofcake

Hi Lucas, First I'd like to apologize for not responding previously, I somehow missed your reply. When you say "the proposed solution" you mean your proposed solution of "whitelisting" the scalatest/scala library dependencies? If so I agree we should. Can you open an issue about it? If you meant something else I'd appreciate the clarification On Thu, 9 Nov 2017 at 9:37 Lucas Kacher [email protected] wrote:

Regarding the newly documented strict-deps,

Note: Currently strict-deps is protected by a feature toggle but we're strongly considering making it the default behavior as java_* rules do.

If it were to become the default behavior, would we want to follow the proposed solution before making it mandated?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/235#issuecomment-343232004, or mute the thread https://github.com/notifications/unsubscribe-auth/ABUIFwUNp2bp4KAQdctKA5os_ivpkzK3ks5s0zhHgaJpZM4OHkDK .

ittaiz avatar Nov 09 '17 18:11 ittaiz

No worries @ittaiz, the proposed solution I was referring to was:

Pass direct and indirect jars to scalac's classpath (like java compilation does). Add a scala compiler plugin which essentially will walk the AST after the type annotation phase, will stop at each 'type expression' and check that the jar from which the type was loaded is a direct_dependency. If not, it is an --indirect_dependency, and the plugin generates an error message. The message includes an 'add_dep' command to add the missing direct dep.

With the gist of it being that dependency resolution could perhaps mimic the way it works for java targets? Here is another example: https://github.com/promiseofcake/pubref-java-proto-scala/pull/1 which shows that exports from java_ targets aren't pulled in for scala_ rules, and I am fairly certain those targets are exported because the --strict_java_deps=ERROR does not error out for the java_ targets.

promiseofcake avatar Nov 09 '17 18:11 promiseofcake

Ok, This proposal is implemented (minus the smart mode which we actually started working on yesterday). It's very possible that we have bugs like mandating the scala test jars or not supporting exports properly. The best solution IMHO is for other organizations to start using it and flush out problems which we'll fix. For each of those we should probably have an issue and a repro and we'll try to iron them out. On Thu, 9 Nov 2017 at 10:05 Lucas Kacher [email protected] wrote:

No worries @ittaiz https://github.com/ittaiz, the proposed solution I was referring to was:

Pass direct and indirect jars to scalac's classpath (like java compilation does). Add a scala compiler plugin which essentially will walk the AST after the type annotation phase, will stop at each 'type expression' and check that the jar from which the type was loaded is a direct_dependency. If not, it is an --indirect_dependency, and the plugin generates an error message. The message includes an 'add_dep' command to add the missing direct dep.

With the gist of it being that dependency resolution could perhaps mimic the way it works for java targets? Here is another example: promiseofcake/pubref-java-proto-scala#1 https://github.com/promiseofcake/pubref-java-proto-scala/pull/1 which shows that exports from java_ targets aren't pulled in for scala_ rules, and I am fairly certain those targets are exported because the --strict_java_deps=ERROR does not error out for the java_ targets.

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/235#issuecomment-343240136, or mute the thread https://github.com/notifications/unsubscribe-auth/ABUIF6bqKfXJADrCimr9DujLxqbXP4PJks5s0z70gaJpZM4OHkDK .

ittaiz avatar Nov 09 '17 18:11 ittaiz

Pass direct and indirect jars to scalac's classpath (like java compilation does).

Is this implemented? I'm running at current tip, and am getting errors that seem to indicate that it isn't: I get a term not found error that I can resolve by manually adding transitive dependencies to the target.

kamalmarhubi avatar Dec 10 '17 08:12 kamalmarhubi

This is implemented. Are you running the build with --strict_java_deps flag? You can use error or warn as the relevant flags. This isn't the default behavior yet since we're still ironing out some kinks (like warning on scala-library). If you encounter issues I'd really appreciate opening a new issue and linking it to this one so we can have more scoped discussions.

ittaiz avatar Dec 10 '17 08:12 ittaiz

Ah, I didn't realize I had to pass that flag. That does indeed fix it, and allows me to remove the extra dependencies (or equivalently the exports on the direct dependencies). Thanks!

(There's some weirdness with the error messages mentioning unknown labels, but I'm pretty sure I saw that in another issue.)

kamalmarhubi avatar Dec 10 '17 09:12 kamalmarhubi

Are you using scala_import for your dependencies? Currently the only rules that propagates labels correctly are scala_import and scala_* (library, binary, etc). If for example you have a scala_library (a) which depends on java_library (b) which in turn also depends on java_library (c) and a uses c without declaring it the labels will be off. We use only scala_library and scala_import so this specific thing is less of a blocker for us. I hope to tackle it in the next few months but it's not a blocker to making it the default since we're talking about info you also don't have today

On Sun, Dec 10, 2017 at 11:01 AM Kamal Marhubi [email protected] wrote:

Ah, I didn't realize I had to pass that flag. That does indeed fix it, and allows me to remove the extra dependencies (or equivalently the exports on the direct dependencies). Thanks!

(There's some weirdness with the error messages mentioning unknown labels, but I'm pretty sure I saw that in another issue.)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/235#issuecomment-350534394, or mute the thread https://github.com/notifications/unsubscribe-auth/ABUIFyiIy5_TmdspvtqnrBH4sHaqsKT_ks5s-53ngaJpZM4OHkDK .

ittaiz avatar Dec 10 '17 09:12 ittaiz

@ittaiz

Are you using scala_import for your dependencies?

I'm experimenting with java_import_external from bazel_tools, so no. However it's possible to decipher the message in the meantime.

Thinking on how to solve this: it seems like it should be possible to apply an aspect to the deps of scala_* to collect this info. Is that the approach you were considering?

kamalmarhubi avatar Dec 10 '17 16:12 kamalmarhubi

No actually. The current direction (which I Haven't gotten around to but you're welcome to) is to generalize java_import_external to a jvm_import_external which will accept either scala_import (and we will expose here as scala_import_external) and also java_import_external as today. See this discussion for agreement with @dslomov on this direction

ittaiz avatar Dec 10 '17 17:12 ittaiz

As far as I know, strict mode for rules_scala does not yet catch extra dependencies. @ittaiz discussed writing a compiler plugin to deal with this part.

I ran across this compiler plugin from the Scala Center https://github.com/scalacenter/classpath-shrinker which seems to do what we are interested in doing.

I played around with using the plugin to clean up some BUILD files that were auto generated using sbt -> pom -> Bazel: https://docs.bazel.build/versions/master/migrate-maven.html

I ran into an issue where all of the transitive dependencies are on the classpath at compile time, so if you have scala targets that depend on other scala targets, then the compiler plugin prints a bunch of warnings.

I'm curious if this plugin has been explored before or if any work has been done on detecting extra dependencies.

jjudd avatar Mar 07 '18 02:03 jjudd

Well, I feel silly. I didn't read closely enough. Looks like that plugin is already in rules_scala :D https://github.com/bazelbuild/rules_scala/blob/master/third_party/plugin/src/main/io/bazel/rulesscala/dependencyanalyzer/DependencyAnalyzer.scala

I'm still curious if the original plugin detected extra jars if rules_scala can do that. I'll dive in a bit more. Sorry to waste anyone's time.

jjudd avatar Mar 07 '18 03:03 jjudd

Okay, I quickly hacked together a really rough implementation of unused_deps in #438 that is bundled with strict_java_deps. I would like to get people's feedback before going any further.

jjudd avatar Mar 07 '18 06:03 jjudd

Will there be a way in the future to use both unused_deps and strict_deps at once? Currently readme suggests not

Jamie5 avatar Oct 03 '18 19:10 Jamie5

It would be nice... that said, they work across purposes a bit.

At Stripe, we minimize the compiler classpath to reduce cache invalidation. So, we don't use strict_deps (which actually puts all your transitive deps on the classpath, but then errors if you use them). Instead, we just don't have them, and we get errors (although harder to diagnose) if you use them.

Really, I start to wonder if compiler plugins are the right path here. As long as scalac is not trying to do separate compilation, the experience will always be a bit second rate. I'd like to see separate compilation with minimal classpaths be a supported feature of scalac, but right now it is kind of tolerated, but not really strongly supported.

johnynek avatar Oct 03 '18 19:10 johnynek

We've been using both together at Lucid for a few months with Annex and it has worked quite well. Zinc's analysis is used to determine unneeded and unused deps and outputs nothing, warnings, or errors with buildozer commands depending on the configuration. Combined with ibazel's run_output and run_output_interactive, you can get an interactive N/y prompt to execute a buildozer command to remove or add a dependency when it violates strict/unused deps. This has made it pretty easy to maintain minimal classpaths.

jjudd avatar Oct 03 '18 20:10 jjudd

I hope/plan to implement Twitter's solution which AFAIU goes something like this: Use a minimal classpath (like the default one in rules_scala today). Have something (ibazel?) that sees the compilation failures Knows how to extracts from it the class name (class name might be too simple) Sends it to an index of class names to labels which returns a response If it's one label it adds it to the relevant target (like the buildozer one)

I think the hard parts are the index (we'll probably have that ready in a couple of weeks) and extracting a class name from the cryptic compiler errors.

WDYT?

On Wed, Oct 3, 2018 at 11:00 PM James Judd [email protected] wrote:

We've been using both together at Lucid for a few months with Annex and it has worked quite well. Zinc's analysis is used to determine unneeded and unused deps and outputs nothing, warnings, or errors with buildozer commands depending on the configuration. Combined with ibazel's run_output and run_output_interactive, you can get an interactive N/y prompt to execute a buildozer command to remove or add a dependency when it violates strict/unused deps. This has made it pretty easy to maintain minimal classpaths.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/235#issuecomment-426779756, or mute the thread https://github.com/notifications/unsubscribe-auth/ABUIF4C646e2OzYLgcXvrLYs6e9PDN-zks5uhRdhgaJpZM4OHkDK .

ittaiz avatar Oct 03 '18 20:10 ittaiz

There would need to be one index per scala_ target, right? The index would contain the mapping of class to label for the transitive closure of the current target? If it is global, it would be constantly churning and class name collisions could cause problems.

jjudd avatar Oct 03 '18 21:10 jjudd

I don't think we're talking about the same thing. The index is of class-name to labels that it belongs to (meaning it was in the sources of that target) (multiple labels can declare the same FQN). If the FQN belongs to one label it's easy. Otherwise the user needs to decide.

On Thu, Oct 4, 2018 at 12:07 AM James Judd [email protected] wrote:

There would need to be one index per scala_ target, right? The index would contain the mapping of class to label for the transitive closure of the current target? If it is global, it would be constantly churning and class name collisions could cause problems.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/235#issuecomment-426803519, or mute the thread https://github.com/notifications/unsubscribe-auth/ABUIF7KQeQimd1PTPKFCcXAj4-Hf9yRdks5uhScTgaJpZM4OHkDK .

ittaiz avatar Oct 03 '18 21:10 ittaiz