rules_jvm_external icon indicating copy to clipboard operation
rules_jvm_external copied to clipboard

Add support for pinning snapshots in gradle resolver

Open acecilia opened this issue 6 months ago • 9 comments

Hi 👋

This PR adds support for snapshot dependencies in the gradle resolver. It uses the repository timestamp of the snapshot as a version. I added this information in a new field with a generic name VersionRevision.

Related: https://github.com/bazel-contrib/rules_jvm_external/pull/974

acecilia avatar Jul 11 '25 00:07 acecilia

Thank you for the PR. I think there's a couple of things that I think about as I read this:

  1. Snapshots aren't always stored with a version. Sometimes, the -SNAPSHOT.jar is just silently replaced
  2. We should have a consistent story for handling snapshots throughout the ruleset

You are correct that some maven repos do write snapshots versions like this, and that's what makes things tricky. I suspect that the correct fix will be to:

  1. Identify if a dep is a snapshot in some way. I think a boolean field is enough for that, and we can add it only for snapshots. For versioned snapshots, we might not need to do this, as I think they get stored in maven repos as the url expansions expect.
  2. When we create the lockfile, omit the sha256 of any snapshot jar that doesn't have a fixed version (likely still contains SNAPSHOT.jar in the file name). That means that if the version is updated in place, builds won't break unexpectedly.

What do you think?

shs96c avatar Jul 11 '25 09:07 shs96c

Thanks for the feedback. Let me ask for some more context to understand a bit better your thoughts 😄

Snapshots aren't always stored with a version. Sometimes, the -SNAPSHOT.jar is just silently replaced

By "version", do you mean "timestamp"? If yes, is there maybe a public snapshot you could share that does not have timestamp, that I could use during development and to add tests?

We should have a consistent story for handling snapshots throughout the ruleset

What do you mean here, that snapshots should be handled by all resolvers? If yes, I agree. I was planning to look into the other resolvers after merging this PR. Is that okey, or would you prefer to do it differently?

When we create the lockfile, omit the sha256 of any snapshot jar that doesn't have a fixed version (likely still contains SNAPSHOT.jar in the file name). That means that if the version is updated in place, builds won't break unexpectedly.

Let me expand a bit on my comment here. How would this work in bazel?

  • Bazel uses http_file rule in here to download the artifacts. That rule has a sha256 property that is optional, but the docs mention that It is a security risk to omit the SHA-256 as remote files can change. At best omitting this field will make your build non-hermetic - we would be accepting these risks when adding support for such kind of snapshots
  • For http_file rules without sha256, bazel would keep the downloaded artifacts in the cache, and will never try to update them. How do you envision that someone would trigger a snapshot update in this case? My understanding is that gradle caches snapshots but still fetches them once a day - which is not the case for bazel
  • My current understanding is that time-stamping snapshots is the default behaviour for maven repositories. I acknowledge that when searching in google "maven snapshots without timestamp" there are some ways to do it mentioned (example here and here), but I also observed that those mentions are usually +10 years old. Furthermore, when trying to find more recent resources about how to do this, I found this from 3 years ago that mentions that Support for uniqueVersion was deprecated in Maven 2.x, and removed entirely in Maven 3, which is confirmed by maven documentation in here.

Integrating snapshots without timestamps goes against bazel fundamentals + it is deprecated from maven 3. This is what makes me question wether this ruleset wants/needs to support it. Wdyt?

acecilia avatar Jul 11 '25 13:07 acecilia

@shs96c let me know 🙏

acecilia avatar Jul 18 '25 00:07 acecilia

I know this is a long reply, but please know that I'm generally supportive of the idea of allowing snapshots to be used, but I'm very wary about how that support should be added. I'm really happy to work with you to find a solution that works for everyone.

@acecilia, an example of where the SNAPSHOT suffix is kept and no date-specific version is generated is anything stored on the sonatype OSS snapshot library. Selenium is an example https://oss.sonatype.org/#view-repositories;snapshots~browsestorage~org/seleniumhq/selenium/selenium-java (you may need to use the "path lookup" to search for org/seleniumhq/selenium/selenium-java) Every time a new snapshot is released, the new version overwrites the old one, meaning that the sha256 can't ever be trusted.

My other concern is that snapshot servers seldom keep an extensive history of snapshots; they get removed fairly swiftly. In one company I worked for, the lifetime of a snapshot was at most 30 days. Using snapshots necessarily means that we've given up on historical builds of our project. That's not an objection to providing support for them, but it's something we should call out in the docs.

We should have a consistent story for handling snapshots throughout the ruleset

It would be nice if the resolvers all supported snapshots, but the main thing is that we should be able to handle both kinds of snapshots (the ephemeral dated ones and the ones that are being replaced) with similar logic. The only safe way to do that is to set the checksum to None.

Bazel uses http_file rule in here to download the artifacts. That rule has a sha256 property that is optional, but the docs mention that It is a security risk to omit the SHA-256 as remote files can change. At best omitting this field will make your build non-hermetic - we would be accepting these risks when adding support for such kind of snapshots

Yes. This is true.

Having said that, there are notable rulesets in the bazel world that already omit the shasum check by necessity (for example, when rules_go fetches the list of Go SDKs) Apparently it's something the community is happy with.

For http_file rules without sha256, bazel would keep the downloaded artifacts in the cache, and will never try to update them. How do you envision that someone would trigger a snapshot update in this case? My understanding is that gradle caches snapshots but still fetches them once a day - which is not the case for bazel

I believe that Bazel will refetch the resource every time the daemon is started. It should be sufficient to do a bazel shutdown to force a reload of the artifact.

My current understanding is that time-stamping snapshots is the default behaviour for maven repositories.

Sadly this isn't the case in the wild. Many OSS projects have ephemeral snapshots that are replaced with each new version, and from experience I know that there are corporate servers that work the same way.

Integrating snapshots without timestamps goes against bazel fundamentals + it is deprecated from maven 3. This is what makes me question wether this ruleset wants/needs to support it. Wdyt?

Given that we recently had a request to support java 8, I don't think everyone will migrate to maven 3 in a hurry. If we're going to support snapshots, we need to support all of them.

So far, I've never implemented snapshot support for these reasons:

  1. "Replaced" snapshots (without a date identifier) cannot be supported with a shasum with bazel as-is.
    1. As you point out, this breaks the guarantees that Bazel makes for you
    2. This also means that you can be sure you can repeat a build. Even the same commit of a repo could end up with a different dependency
  2. Versioned snapshots (with a date identifier) are frequently removed from snapshot repos.
    1. Historical builds become far harder to do
  3. Modifying each of our resolvers to properly mark snapshots is a challenge
    1. We may be able to apply heuristics to make this a problem we solve in starlark when locking

The inability to rely on a build is a real problem for me, but I acknowledge that people who deliberately use snapshots accept this risk, so that's not a blocking issue for me.

Of these reasons, I think we have a path forward for the first ("just" don't set the checksum), the second is a quality of life thing users of snapshots will have to accept, and you're making a start on the third. I think we can figure this out :)

shs96c avatar Jul 25 '25 09:07 shs96c

Got it thanks for that detailed answer, I was not aware non-version snapshots were still so widely used 🙏

Updated the PR and added support for non-versioned snapshots in gradle resolver

acecilia avatar Aug 08 '25 22:08 acecilia

For some reason the PR is failing when validating with bazel 6.4.0. I spent some time but I dont manage to understand why, any help is much appreciated

acecilia avatar Aug 09 '25 17:08 acecilia

@smocherla-brex would appreciate your feedback here 🙏 Thanks!

acecilia avatar Aug 16 '25 19:08 acecilia

I am keen to complete this PR with your guidance, have a look when you have a chance please 🙏

acecilia avatar Sep 05 '25 18:09 acecilia

Apologies for the long delay getting back to you. I've been traveling and sick. I want to review the version_revision as it's making me uneasy, and I'd like to be satisfied it's required.

shs96c avatar Oct 06 '25 15:10 shs96c