
How to make bazel-deps Maven specific, rather than language specific?

Open • pauldraper opened this issue on Apr 26 '18 • 10 comments

The goal of bazel-deps is "consume Maven from Bazel".

Language

But lots of ecosystems besides Java use Maven: Scala, Kotlin, Clojure, Groovy, JRuby, etc. Each has its own nuances. For example, Scala's deeply integrated macro system means that macros cannot use Java's ijars, and Java's annotation processors have no relevance to Scala. What works for Java users of bazel-deps does not work for other users of bazel-deps.

Rules

Even if Java were the only language, there are currently several rules: Google itself has authored http_jar, maven_jar, and java_import_external.

Lucid has a WIP fork and scala_import_external.

Preferences

Even if there were only one language and one rule, there are strong differences of opinion on things like using bind(), or using exports vs. runtime_deps vs. deps.

There seems to be an unending number of variations people want for managing external Bazel dependencies.


I'm interested in thoughts on this, but bazel-deps should strive to be bazel-maven-deps, rather than bazel-java-scala-kotlin-deps.

The most important thing bazel-deps does is provide list_dependencies(). As long as it provides sufficient information (artifact ids, url, sha256, srcjars, etc.), consumers can do whatever they want: rules, prefixing, binding, dep-ing, etc. Some people want sources for IDEs, others couldn't care less.
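The idea that consumers only need enough Maven data, and can then each do their own thing, can be sketched as follows. This is a minimal sketch in plain Python (close to Starlark) of what a richer list_dependencies() could return and how two different consumers might read it. The field names (`coordinates`, `urls`, `sha256`, `srcjar_urls`) and the helper `maven_url` are hypothetical, not bazel-deps' actual output; only the URL layout follows the standard Maven repository convention.

```python
# Hypothetical payload for a richer list_dependencies(). Field names are
# assumptions for illustration; only the URL layout is the standard Maven one.
# In Starlark these entries could be struct(...) values; dicts keep this runnable.

MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def maven_url(repo, coordinates, classifier=None):
    # <repo>/<group as path>/<artifact>/<version>/<artifact>-<version>[-<classifier>].jar
    group, artifact, version = coordinates.split(":")
    suffix = "-" + classifier if classifier else ""
    return "%s/%s/%s/%s/%s-%s%s.jar" % (
        repo, group.replace(".", "/"), artifact, version, artifact, version, suffix)


def list_dependencies():
    guava = "com.google.guava:guava:24.1-jre"
    no_sources = "com.example:no-sources:1.0"  # made-up artifact without a srcjar
    return [
        {
            "coordinates": guava,
            "urls": [maven_url(MAVEN_CENTRAL, guava)],
            "sha256": "<sha256 of the jar>",  # placeholder, not a real checksum
            "srcjar_urls": [maven_url(MAVEN_CENTRAL, guava, "sources")],
        },
        {
            "coordinates": no_sources,
            "urls": [maven_url(MAVEN_CENTRAL, no_sources)],
            "sha256": "<sha256 of the jar>",
            "srcjar_urls": [],
        },
    ]


def build_specs(entries):
    # A build-oriented consumer: what a java_import_external-style rule needs.
    return [(e["coordinates"], e["urls"], e["sha256"]) for e in entries]


def ide_source_jars(entries):
    # An IDE-oriented consumer: only the entries that actually ship sources.
    return [e["coordinates"] for e in entries if e["srcjar_urls"]]
```

Neither consumer needs anything beyond Maven facts, which is the separation being argued for.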

Suggestions

  1. Add additional (still Maven-related) information to list_dependencies(): urls, source jars, and sha256.

  2. Allow for extra information like macros, annotation processing, etc. to be provided in dependencies.yml as attached arbitrary data.

  3. Document the format of list_dependencies() as a primary API. (Also, switch dicts to more aesthetically appropriate structs.)

  4. Remove most non-Maven info from dependencies.yml: headers, prefix, language. The important things are dependencies, versions, resolvers, exclusions, and conflict management.

  5. Control language/rule/preference specifics from Skylark, outside dependencies.yml. For example, maybe there's an out-of-the-box, semi-opinionated bazel-deps implementation that looks like:

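```python
# Illustrative sketch of a proposed API, not an existing one.
maven_repositories(
    maven_list_dependencies(),
    deps = "runtime_deps",  # which attribute carries each artifact's dependencies
    prefix = "my_prefix_",  # prepended to every generated repository name
    replacements = {},      # artifacts to swap for existing targets
    # ...
)
```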

(1) is the "necessary" addition, e.g. to use java_import_external.

(2), (3), (4), and (5) promote clean design and separation of responsibility, but are not strictly necessary to achieve anything I've mentioned.
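A rough sketch, in plain Python standing in for Starlark, of what the semi-opinionated layer from suggestion (5) might do with the raw list. `repo_name`, `import_kwargs`, and the per-entry `dependencies` field are hypothetical (the graph is assumed to be present in the list). The `jar_urls`/`jar_sha256` keys are modeled on java_import_external's attributes; a real .bzl version would call fail() rather than raise.

```python
# Hypothetical semi-opinionated layer on top of the raw dependency list.
# The "dependencies" field (direct edges) is assumed to be present.

VALID_DEP_ATTRS = ["deps", "runtime_deps", "exports"]


def repo_name(coordinates, prefix=""):
    # "com.google.guava:guava:24.1-jre" -> prefix + "com_google_guava_guava"
    group, artifact = coordinates.split(":")[:2]
    return prefix + (group + "_" + artifact).replace(".", "_").replace("-", "_")


def import_kwargs(entries, deps_attr="deps", prefix=""):
    # Build keyword arguments for a jar-import rule, one dict per artifact.
    if deps_attr not in VALID_DEP_ATTRS:
        raise ValueError("deps_attr must be one of %s" % VALID_DEP_ATTRS)
    result = []
    for e in entries:
        kwargs = {
            "name": repo_name(e["coordinates"], prefix),
            "jar_urls": e["urls"],
            "jar_sha256": e["sha256"],
        }
        # The per-repo preference decides which attribute carries the edges;
        # "@name" is Bazel shorthand for "@name//:name".
        kwargs[deps_attr] = ["@" + repo_name(d, prefix) for d in e["dependencies"]]
        result.append(kwargs)
    return result
```

Switching a repo to `exports`, or to a different prefix, then changes only the arguments to this layer, not dependencies.yml.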

pauldraper, Apr 26 '18 14:04

Thanks for the interesting suggestions. I do want to improve the list_dependencies API in many of the ways you mention. It would probably be good to make a design doc of what it should be. One use case I have for it is composing remote repos without having to merge the dependencies yml.

I don’t agree that language doesn’t matter: Scala is a main use case for me, and without awareness of Scala’s multiplexing of Maven artifacts, things get pretty ugly. We’d have to find and replace Scala versions on major updates, and support for cross builds becomes painful. I’m open to similar support for other languages as needed, so I disagree on that direction.
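For readers outside Scala: Scala 2 libraries are published as separate Maven artifacts per Scala binary version (for example `akka-actor_2.11` and `akka-actor_2.12`). A minimal sketch of why knowing the language helps, with hypothetical helper names and a `lang` field standing in for the per-dependency language setting:

```python
# Sketch of language-aware coordinates: the artifact is named once, and the
# Scala binary-version suffix is applied in one place. Field and function
# names are hypothetical.

SCALA_VERSION = "2.12.6"


def scala_binary_version(full_version):
    # "2.12.6" -> "2.12": Scala 2.x libraries are published per binary version.
    return ".".join(full_version.split(".")[:2])


def resolve_coordinates(dep, scala_version=SCALA_VERSION):
    group, artifact, version = dep["coordinates"].split(":")
    if dep.get("lang") == "scala":
        artifact = "%s_%s" % (artifact, scala_binary_version(scala_version))
    return "%s:%s:%s" % (group, artifact, version)
```

Upgrading Scala, or cross-building, is then one changed argument instead of a find-and-replace across every Scala artifact name.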

I’m open to a PR that could move the 3rdparty generation into a repository rule, but writing that in skylark seems like such a pain I personally will probably never get to it.

I’m open to PRs. Maybe easiest to start with the small things (adding sha256, urls and srcjars to the list_dependencies, writing a design doc on the structure of that list).

johnynek, Apr 26 '18 15:04

One more note: “There seems to be an unending number of variations people want for managing external Bazel dependencies”

Yes, but I definitely don’t want to make the most flexible tool. This tool has some opinions that have worked well for 2 years at Stripe. This tool is not one of the things we complain about with bazel, so I don’t want to take things in the direction of supporting more different ways to do things in general at the expense of more configuration. I hope the community can avoid having each bazel repo be a special snowflake, and ideally all repos using bazel-deps could easily depend on each other. So I guess you could say cross-repo compatibility is much more important to me.

johnynek, Apr 26 '18 16:04

> Yes, but I definitely don’t want to make the most flexible tool. I don’t want to take things in the direction of supporting more different ways to do things in general at the expense of more configuration

Indeed. In fact, I believe if anything the tool is too configurable, in the sense that dependencies.yml has a lot of options.

> I hope the community can not have each bazel repo be a special snowflake and ideally all repos using bazel-deps could easily depend on each other.

I hope that as well (I'm currently converting https://github.com/lucidsoftware/rules_play_routes and https://github.com/lucidsoftware/rules_twirl to use maven deps), though I used to be more optimistic.

> without awareness of scala’s multiplexing of maven things get pretty ugly: we’d have to do find and replace of scala versions on major updates, and support for cross builds becomes painful.


Yep, that's a nice safe place to start. As I mentioned, we have that working on a fork, but that was pre-coursier, so it will take some work.

> I’m open to a PR that could move the 3rdparty generation into a repository rule, but writing that in skylark seems like such a pain I personally will probably never get to it.

Are you thinking of something like transitive_maven_jar, where it's generated automatically? Repository rules can execute arbitrary commands, so it's not infeasible.

  • Con: There are hypothetical correctness issues with not using a checked-in lock file, in that jars can be published over previous versions, and maven supports range dependencies. In practice those don't materialize. Deps must be re-resolved for each user (before coursier, very slow).
  • Pro: Very easy. And end users of a library repo can easily choose different versions (e.g. resolve all the Scala x.xx.xx artifacts).

pauldraper, Apr 27 '18 00:04

well, I mean you check in the transitive resolution into a .bzl file, and then you use a repository rule to write the structure of the 3rdparty directory. So, it is just as safe as now, but arguably less discoverable since you need to know where the repository rules are materialized. I would not want the repo rule to do any resolving.
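A minimal sketch of the split described here: the resolved list is checked in, and code that does no resolving only renders 3rdparty BUILD files from it. The function name, the `3rdparty/jvm` root, and the `java_library(exports = ...)` target shape are illustrative assumptions, not bazel-deps' exact output.

```python
# Hypothetical renderer: turns an already resolved, checked-in list into
# BUILD-file text for a 3rdparty tree. It performs no resolution at all.
# The root directory and target shape are illustrative assumptions.


def third_party_build_files(entries, root="3rdparty/jvm"):
    files = {}
    # Sort so the generated text is deterministic regardless of input order.
    for e in sorted(entries, key=lambda x: x["coordinates"]):
        group, artifact = e["coordinates"].split(":")[:2]
        path = "%s/%s/BUILD" % (root, group.replace(".", "/"))
        target = artifact.replace("-", "_").replace(".", "_")
        external = "@" + (group + "_" + artifact).replace(".", "_").replace("-", "_")
        stanza = "\n".join([
            "java_library(",
            '    name = "%s",' % target,
            '    exports = ["%s"],' % external,
            '    visibility = ["//visibility:public"],',
            ")",
            "",
        ])
        # Artifacts sharing a group land in the same BUILD file.
        files[path] = files.get(path, "") + stanza
    return files
```

Inside a repository rule implementation, each path/content pair could then be written with `repository_ctx.file(path, content)`.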

I don't really know all the details of how transitive_maven_jar works (how does it normalize the versions to a single version for each jar, etc...)
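On the normalization question, one common policy is "highest requested version wins" for each group:artifact. The sketch below is not a description of how transitive_maven_jar or bazel-deps actually works; it only illustrates that policy, and its version comparison is deliberately naive (purely numeric dotted versions), whereas real Maven ordering also handles qualifiers such as `-jre` or `-RC1`.

```python
# One possible normalization policy ("highest version wins"), with a naive
# version comparison that only understands numeric dotted versions.


def version_key(version):
    # "1.7.25" -> [1, 7, 25], so versions compare numerically, part by part.
    return [int(part) for part in version.split(".")]


def normalize(requests):
    # requests: "group:artifact:version" strings collected while walking the
    # transitive graph; several may name the same group:artifact.
    chosen = {}
    for coord in requests:
        group, artifact, version = coord.split(":")
        key = group + ":" + artifact
        if key not in chosen or version_key(version) > version_key(chosen[key]):
            chosen[key] = version
    return sorted(["%s:%s" % (k, v) for k, v in chosen.items()])
```

Comparing integer lists matters here: a plain string comparison would rank 1.7.9 above 1.7.25.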

What options would you suggest we remove from dependencies.yml? There are configuration options, but the build header stuff is because we had to have it: we didn't want to just use the standard rules (we want different default flags). Maybe proper toolchain support can remove the need for that. Also, originally, we built some repos using transitive external classpaths rather than only on the runtime path. I'm still not sure the right way to go there. In practice, external dependencies change so rarely, the cache invalidation argument for thin classpaths probably goes away.

johnynek, Apr 27 '18 00:04

Three small points:

  1. Will one have conflicts if there is a header which loads scala library and also this happens in a prelude?
  2. I think recursive workspaces is positioned to solve this issue of resolution with correctness.
  3. Oscar: a repository rule can run a Java binary, though it can’t be one that is built by this repository.

ittaiz, Apr 27 '18 04:04

> What options would you suggest we remove from dependencies.yml?

licenses, transitivity, namePrefix, and replacements can all be applied after dependency resolution, as Skylark parameters. That also accommodates cases like these: I need a custom licenses value for a few artifacts; I want to use java_import_external with a particular dependency transitivity mode; or I have complex replacement requirements.
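A minimal sketch of applying such options after resolution, as ordinary parameters. `apply_overrides`, its parameter names, and the output shape are hypothetical; `"notice"` and `"restricted"` are standard Bazel license types used here as sample values.

```python
# Hypothetical post-resolution hook: resolution output stays pure Maven data,
# and per-repo preferences arrive as ordinary parameters afterwards.


def apply_overrides(entries, licenses=None, replacements=None, default_licenses=("notice",)):
    licenses = licenses or {}
    replacements = replacements or {}
    out = []
    for e in entries:
        key = ":".join(e["coordinates"].split(":")[:2])  # "group:artifact"
        if key in replacements:
            # Replaced artifacts are not imported; consumers use the given label.
            out.append({"coordinates": e["coordinates"], "replacement": replacements[key]})
            continue
        item = dict(e)  # copy so the caller's list is left untouched
        item["licenses"] = list(licenses.get(key, default_licenses))
        out.append(item)
    return out
```

The per-artifact license override and the replacement map never touch the resolver; they only reshape its output.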

When we "use a repository rule to write the structure of the 3rdparty directory", buildHeader and thirdPartyDirectory can be replaced.

The more that lives in Skylark, the less complexity and the fewer tweaks have to be put into the shared bazel-deps project.

> I think recursive workspaces is positioned to solve this issue of resolution with correctness.

I'm not sure I understand, though I have not read the proposal.

pauldraper, Apr 27 '18 14:04

@ittaiz yes, we could run anything with a repository rule that jumps through enough hoops, but you widen the part of the build that is not hermetic. Since we currently check everything in, nothing needs to run to build except the normal (hermetic, assuming downloads with shas are hermetic) process.
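The hermeticity argument rests on checksums recorded at resolution time. Bazel's download rules already verify a supplied sha256 themselves; this small sketch, using a hypothetical `verify_download` helper, only shows why a checked-in hash turns a jar re-published under the same version into a loud failure rather than a silent change.

```python
import hashlib


def verify_download(data, expected_sha256):
    # Compare downloaded bytes against the checksum recorded when the
    # dependency list was resolved and checked in.
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256.lower():
        # A jar re-published under the same version ends up here.
        raise ValueError("sha256 mismatch: expected %s, got %s" % (expected_sha256, actual))
    return actual
```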

@pauldraper all of your suggestions presume a different style of tool. Note, all the things you talk about are done post resolution, but we don't currently have a mechanism to materialize them separately. As far as licenses, I probably shouldn't have merged that PR. We don't use this field at Stripe, but clearly a license should apply to a target, not to everything in bulk.

I hear you that you have a different vision for the tool that I think I don't agree with. Perhaps you and @ittai actually share more of a common vision. I think he has a tool as well, but I don't know if it is open source. You might want to work with him on his.

johnynek, Apr 29 '18 00:04

Just so it's clear: our tool does resolution ahead of time, just like bazel-deps. We might change it when/if recursive workspaces arrive, since it sounds like they will allow advantages from both sides. I was mixing things up when I mentioned running a binary. Sorry.

ittaiz, Apr 29 '18 03:04

@johnynek, I don't think anything I suggested is very different (I was not proposing transitive_maven_jar...just trying to understand your suggestion).

"Use a repository rule to write the structure of the 3rdparty directory [post-resolution]", is what you brought up, and I agree it would be helpful.

And removing the other fields is not necessarily a requirement; 'twas a suggestion to simplify the surface of the tool and factor out orthogonal concepts. But at the end of the day, list_dependencies() can still be used directly.

The only must-haves are sha256, srcjars, and urls, which are necessary for java_import_external. And it sounds like those are agreeable additions.

pauldraper, Apr 29 '18 03:04

Totally happy to iterate on the skylark api here and improve what list_dependencies gives you. Right now, for instance, it doesn’t give you the graph, which would be needed to do the generation in skylark.
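If each entry carried its direct dependency edges, Starlark code could compute closures itself. A minimal sketch under that assumption; the `graph` shape and `transitive_deps` are hypothetical. Bazel's Starlark does not allow recursion or `while` loops, so the traversal uses a bounded `for` loop over an explicit stack, written in Python syntax so it runs as-is.

```python
# Hypothetical: transitive closure over direct edges, written without
# recursion or while loops so it maps onto Bazel's Starlark restrictions.


def transitive_deps(graph, root):
    # graph: {"group:artifact": ["group:artifact", ...]} (direct edges only)
    total_edges = 0
    for targets in graph.values():
        total_edges += len(targets)
    seen = {root: True}  # root is never expanded twice, which keeps the bound valid
    stack = list(graph.get(root, []))
    # Every push is an edge out of the root or out of a newly visited node,
    # so there are at most total_edges pops; one extra step sees the empty stack.
    for _ in range(total_edges + 1):
        if not stack:
            break
        node = stack.pop()
        if node in seen:
            continue
        seen[node] = True
        stack.extend(graph.get(node, []))
    return sorted([n for n in seen.keys() if n != root])
```

Dicts stand in for sets because Starlark has historically lacked a set type.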

johnynek, Apr 29 '18 05:04