libyear-gradle-plugin

Ability to filter out pre-release dependency versions

Open · grimsa opened this issue 1 year ago · 2 comments

I recently learned about the libyear metric and this plugin, and ran an analysis on one of our projects.

Problem

One issue I noticed in the output was that some dependencies are reported as outdated even when no newer stable version exists.

Example line from the report:

 -> 1.7 years  from jakarta.persistence:jakarta.persistence-api (3.1.0 => 3.2.0-M1)

However, currently the released versions look like this:

VERSION     DATE PUBLISHED
3.2.0-M1    2023-11-23
3.2.0-B02   2023-11-06
3.2.0-B01   2023-08-28
3.1.0       2022-02-25
...         ...
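The reported 1.7 years matches the gap between the publication dates of 3.1.0 and 3.2.0-M1 in the table. A minimal Java sketch of that arithmetic follows; the class and method names are illustrative, not the plugin's code, and it assumes a year of 365.25 days (the plugin's exact divisor may differ).

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.Locale;

// Sketch of the libyear arithmetic for a single dependency: the time between the
// publication date of the version in use and that of the version it is compared to.
class LibyearArithmetic {
    // Whole days between two ISO-8601 publication dates (yyyy-MM-dd).
    static long daysBetween(String usedVersionDate, String newerVersionDate) {
        return ChronoUnit.DAYS.between(LocalDate.parse(usedVersionDate), LocalDate.parse(newerVersionDate));
    }

    // Assumption: one year = 365.25 days.
    static double libyears(String usedVersionDate, String newerVersionDate) {
        return daysBetween(usedVersionDate, newerVersionDate) / 365.25;
    }

    public static void main(String[] args) {
        // jakarta.persistence-api 3.1.0 (2022-02-25) vs 3.2.0-M1 (2023-11-23)
        System.out.printf(Locale.ROOT, "%d days = %.1f years%n",
                daysBetween("2022-02-25", "2023-11-23"),
                libyears("2022-02-25", "2023-11-23"));
        // prints "636 days = 1.7 years"

        // Compared against the latest stable release (3.1.0 itself), nothing accumulates.
        System.out.println(daysBetween("2022-02-25", "2022-02-25") + " days"); // prints "0 days"
    }
}
```

Measured against 3.1.0, the newest stable release, this dependency would contribute zero libyears.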

Given that using unstable/non-final dependency versions in production is generally considered bad practice, I think this plugin could either exclude non-final versions automatically, or at least let the user configure which newer versions to consider.

Impact

For a project that had 79 outdated dependencies, 16 of them (i.e., ~20%) were compared against non-final versions:

 -> 1.7 years  from jakarta.persistence:jakarta.persistence-api (3.1.0 => 3.2.0-M1)
 -> 1.5 years  from jakarta.validation:jakarta.validation-api (3.0.2 => 3.1.0-M1)
 -> 1.4 years  from jakarta.annotation:jakarta.annotation-api (2.1.1 => 3.0.0-M1)
 -> 1.2 years  from net.sf.jopt-simple:jopt-simple (5.0.4 => 6.0-alpha-3)
 -> 10 months  from org.apache.logging.log4j:log4j-api (2.20.0 => 3.0.0-beta1)
 -> 10 months  from org.apache.logging.log4j:log4j-to-slf4j (2.20.0 => 3.0.0-beta1)
 -> 6.2 months from org.jetbrains.kotlin:kotlin-stdlib-common (1.8.22 => 2.0.0-Beta2)
 -> 6.2 months from org.jetbrains.kotlin:kotlin-reflect (1.8.22 => 2.0.0-Beta2)
 -> 6.2 months from org.jetbrains.kotlin:kotlin-stdlib-jdk8 (1.8.22 => 2.0.0-Beta2)
 -> 6.2 months from org.jetbrains.kotlin:kotlin-stdlib (1.8.22 => 2.0.0-Beta2)
 -> 6.2 months from org.jetbrains.kotlin:kotlin-stdlib-jdk7 (1.8.22 => 2.0.0-Beta2)
 -> 3.8 months from org.slf4j:jul-to-slf4j (2.0.9 => 2.1.0-alpha0)
 -> 3.8 months from org.slf4j:slf4j-api (2.0.9 => 2.1.0-alpha0)
 -> 28 days    from org.apache.httpcomponents.client5:httpclient5 (5.2.3 => 5.4-alpha1)
 -> 25.9 days  from org.apache.httpcomponents.core5:httpcore5-h2 (5.2.4 => 5.3-alpha1)
 -> 25.9 days  from org.apache.httpcomponents.core5:httpcore5 (5.2.4 => 5.3-alpha1)

This results in either:

  • Falsely reported dependencies, e.g., jakarta.persistence:jakarta.persistence-api is reported as outdated even though the version in use, 3.1.0, is actually the latest stable release
  • Incorrect libyear values, e.g., for org.apache.httpcomponents.client5:httpclient5 a libyear value of 28 days was reported (5.2.3 => 5.4-alpha1), but compared against the latest stable version (5.2.3 => 5.3) the value would be just 5 days

Collectively this:

  • Makes the reported libyear value higher than it actually is
  • Makes the analysis results more difficult to interpret, as they require additional post-processing by a person

Potential solutions

General solution

Looking at semver, it seems that any pre-release version would contain a hyphen:

A pre-release version MAY be denoted by appending a hyphen and a series of dot separated identifiers immediately following the patch version. . . . Examples: 1.0.0-alpha, 1.0.0-alpha.1, 1.0.0-0.3.7, 1.0.0-x.7.z.92, 1.0.0-x-y-z.--.

And looking at the anecdotal evidence from this one project, it seems that:

  • all pre-release versions did indeed contain a hyphen
  • the only dependency with a hyphen in a stable release version was Guava (com.google.guava:guava (32.1.3-jre => 33.0.0-jre))

Therefore, maybe the general rule could be: "if the current dependency version contains a hyphen, consider all available newer versions; if it does not, only consider versions without a hyphen".
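The hyphen rule proposed above can be sketched in a few lines of Java. HyphenRule and its methods are hypothetical helpers for illustration only, not part of the plugin.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the proposed hyphen rule (hypothetical helper, not part of the plugin):
// if the version in use contains a hyphen, every newer version is a candidate;
// otherwise, only newer versions without a hyphen are.
class HyphenRule {
    static boolean containsHyphen(String version) {
        return version.indexOf('-') >= 0;
    }

    static List<String> candidateVersions(String currentVersion, List<String> newerVersions) {
        if (containsHyphen(currentVersion)) {
            return newerVersions;
        }
        return newerVersions.stream()
                .filter(v -> !containsHyphen(v))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // jakarta.persistence-api: every newer version is a pre-release, so none remain
        System.out.println(candidateVersions("3.1.0", List.of("3.2.0-B01", "3.2.0-B02", "3.2.0-M1"))); // []
        // httpclient5: only the stable 5.3 remains
        System.out.println(candidateVersions("5.2.3", List.of("5.3", "5.4-alpha1"))); // [5.3]
        // Guava: the version in use has a hyphen, so all newer versions are kept
        System.out.println(candidateVersions("32.1.3-jre", List.of("33.0.0-android", "33.0.0-jre")));
    }
}
```

Note that for Guava this rule keeps the -android variant alongside -jre, so on its own it does not select the matching flavor.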

User-configurable solution

Maybe there could be a configuration parameter that allows the user to specify what versions to include or exclude:

libyear {
  configurations = ['compileClasspath']
  ignoreNewerArtifactsWithVersionsMatching = "<regex that matches specific suffixes>"  // <-- new parameter
  failOnError = true
  validator = allArtifactsCombinedMustNotBeOlderThan(days(5))
}

An example of such a regex could be -(?!jre), which would ignore any version containing a hyphen, unless the hyphen is followed by jre.
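A minimal Java sketch of how the example regex could be applied. The option name ignoreNewerArtifactsWithVersionsMatching is the hypothetical parameter proposed above, and IgnorePattern is an illustrative class, not plugin code.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Sketch of the proposed ignoreNewerArtifactsWithVersionsMatching option
// (hypothetical parameter from this issue, not an existing plugin setting).
class IgnorePattern {
    // A hyphen that is NOT followed by "jre".
    static final Pattern IGNORE = Pattern.compile("-(?!jre)");

    // find() searches anywhere in the string, so one offending hyphen is enough.
    static boolean ignored(String version) {
        return IGNORE.matcher(version).find();
    }

    static List<String> keep(List<String> newerVersions) {
        return newerVersions.stream().filter(v -> !ignored(v)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(keep(List.of("5.3", "5.4-alpha1", "6.0-alpha-3", "33.0.0-jre", "33.0.0-android")));
        // prints [5.3, 33.0.0-jre]
    }
}
```

Using find() rather than matches() means the regex only needs to match a fragment of the version string. With this particular regex, Guava's -android variants are ignored as well.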

grimsa, Dec 31 '23 13:12

Thank you very much @grimsa for your detailed report and your interest in this plugin!

From a surface-level reading, I think the plugin could do better for the general case of semver. If semver describes what a "pre-release" version number looks like, a configuration option to filter out pre-release versions looks reasonable, and may even default to "true".

But at the same time, relying more on semver for artifact ordering may be a significant departure from the existing approach, in which the repository tells us which release is the most "recent" (aka "last published"). In many cases this strategy has been very reliable, and it also works for projects that do not use semver, but it has other drawbacks, such as this one:

https://github.com/f4lco/libyear-gradle-plugin/blob/7849052ddbd5f6562fdc08b289e12bddf0d55936/libyear-gradle-plugin/src/main/kotlin/com/libyear/sourcing/SolrSearchAdapter.kt#L107-L111

We'll have to give it more thought, both regarding the implementation and regarding the question "what is the best possible default behavior for the plugin?". Any input is appreciated :)

f4lco, Dec 31 '23 17:12

Regarding multiple release lines being maintained in parallel: I noticed that as well with Spring projects.

I did not consider it to be a problem in my case, because, for example, Spring Security maintains 3 versions in parallel (https://spring.io/projects/spring-security/#support), at the time of writing this it is 6.2.x, 6.1.x, and 5.8.x. As far as I can tell, they publish releases for all 3 versions within minutes of each other (starting with the oldest and finishing with the latest).

So if we were running the latest 5.8.x release, we would observe:

  • A tiny amount of libyears reported for this dependency (because 6.2.x release has been published a few minutes after 5.8.x)
  • No indication that we're multiple significant releases behind

But I think it is acceptable, because:

  1. While each release line is being maintained, as long as we're on the latest release of the same major version (even if it is not the newest release line), we're still using a maintained version, so a libyear value close to zero may well be meaningful. Once maintenance of the 5.8.x line stops, libyears would naturally start accumulating, giving us a clear signal to upgrade.
  2. The fact that other release lines exist would still be visible in the report, as a libyear value of a few minutes for this dependency (if 5.8.x were the newest line, its release would be published last, resulting in 0 libyears and no entry at all). So this is also good, though it depends on Spring's policy of publishing newer releases later (even if only by minutes), which does not seem to be the case with Tomcat.

--

As for how to determine which version to compare against:

I tried sending a request to Solr search (GET https://search.maven.org/solrsearch/select?q=g:"org.apache.tomcat" AND a:"tomcat") and saw how, given a version, it can return that version's timestamp.
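A sketch of building such a query for one exact version. It assumes the public search.maven.org API as commonly documented: the core=gav parameter returns one result per version, and each result carries a "timestamp" field in epoch milliseconds. The HTTP call and JSON parsing are deliberately left out; SolrQuery is an illustrative name.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Sketch: build a search.maven.org Solr query for one exact version.
// Assumption: core=gav yields per-version documents with an epoch-millis "timestamp".
class SolrQuery {
    static String versionQueryUrl(String group, String artifact, String version) {
        String q = "g:\"" + group + "\" AND a:\"" + artifact + "\" AND v:\"" + version + "\"";
        return "https://search.maven.org/solrsearch/select?q="
                + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&core=gav&rows=1&wt=json";
    }

    // Converts the epoch-millisecond "timestamp" of a search result to a UTC date.
    static LocalDate releaseDate(long timestampMillis) {
        return Instant.ofEpochMilli(timestampMillis).atZone(ZoneOffset.UTC).toLocalDate();
    }

    public static void main(String[] args) {
        System.out.println(versionQueryUrl("org.apache.tomcat", "tomcat", "10.1.17"));
    }
}
```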

As for determining what versions are published - maybe it would be possible to leverage published maven metadata? For example, for Tomcat: https://repo1.maven.org/maven2/org/apache/tomcat/tomcat/maven-metadata.xml

It looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<metadata>
  <groupId>org.apache.tomcat</groupId>
  <artifactId>tomcat</artifactId>
  <versioning>
    <latest>11.0.0-M15</latest>
    <release>11.0.0-M15</release>
    <versions>
      <version>7.0.35</version>
      <!-- ... -->
      <version>7.0.109</version>
      <version>8.0.0-RC1</version>
      <version>8.0.0-RC3</version>
      <version>8.0.0-RC5</version>
      <version>8.0.0-RC10</version>
      <version>8.0.1</version>
      <!-- ... -->
      <version>9.0.84</version>
      <version>10.0.0-M1</version>
      <version>10.0.0-M3</version>
      <version>10.0.0-M4</version>
      <version>10.0.0-M5</version>
      <version>10.0.0-M6</version>
      <version>10.0.0-M7</version>
      <version>10.0.0-M8</version>
      <version>10.0.0-M9</version>
      <version>10.0.0-M10</version>
      <version>10.0.0</version>
      <!-- ... -->
      <version>10.1.17</version>
      <version>11.0.0-M1</version>
      <version>11.0.0-M3</version>
      <version>11.0.0-M4</version>
      <version>11.0.0-M5</version>
      <version>11.0.0-M6</version>
      <version>11.0.0-M7</version>
      <version>11.0.0-M9</version>
      <version>11.0.0-M10</version>
      <version>11.0.0-M11</version>
      <version>11.0.0-M12</version>
      <version>11.0.0-M13</version>
      <version>11.0.0-M14</version>
      <version>11.0.0-M15</version>
    </versions>
    <lastUpdated>20231212142015</lastUpdated>
  </versioning>
</metadata>
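Reading the <versions> list out of such a file needs only the JDK's built-in DOM parser. A minimal sketch follows; MavenMetadata is an illustrative name, and fetching the file over HTTP is left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch: read the <versioning><versions> entries of maven-metadata.xml, in file order.
class MavenMetadata {
    static List<String> versions(String metadataXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(metadataXml.getBytes(StandardCharsets.UTF_8)));
            Element versionsElement = (Element) doc.getElementsByTagName("versions").item(0);
            List<String> result = new ArrayList<>();
            if (versionsElement == null) {
                return result; // no <versions> block at all
            }
            NodeList nodes = versionsElement.getElementsByTagName("version");
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse maven-metadata.xml", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<metadata><versioning><versions>"
                + "<version>10.1.17</version><version>11.0.0-M15</version>"
                + "</versions></versioning></metadata>";
        System.out.println(versions(xml)); // [10.1.17, 11.0.0-M15]
    }
}
```

The list keeps the file's order, which, as noted for Spring Security, may be version order rather than publication order.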

I also checked the metadata file for one of Spring Security artifacts and I see that releases are ordered by version (and not by release date).

And this is what the metadata for Guava looks like, with its -android and -jre variants.

Maybe then the logic could be something like (pseudocode):

getMavenMetadata("org.apache.tomcat:tomcat").streamVersions()
   // skip all versions listed before the one used in the current project, e.g. "10.1.3"
   .dropWhile(version -> !version.equals(currentVersion))
   // this filter step would implement the logic requested in this issue:
   // exclude pre-release versions as defined by semver, or by some customizable rule
   .filter(version -> !isPreRelease(version))
   // keep the last remaining version
   .reduce((first, second) -> second)

This would then return 10.1.17, because all 11.0.0 versions are pre-release versions. Solr search could then be used to look up the release dates of 10.1.3 and 10.1.17 (to calculate the libyear value).
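The pseudocode can be turned into runnable Java under two assumptions: the version list is already in maven-metadata.xml order, and the pre-release check is passed in as a predicate (the hyphen rule is used here purely as an example). The Tomcat list is a shortened excerpt of the metadata shown earlier.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Sketch of the pseudocode above; LatestStable is an illustrative name.
class LatestStable {
    static Optional<String> latestStableSince(List<String> allVersions, String currentVersion,
                                              Predicate<String> isPreRelease) {
        return allVersions.stream()
                .dropWhile(v -> !v.equals(currentVersion)) // skip versions listed before the one in use
                .filter(isPreRelease.negate())             // exclude pre-release versions
                .reduce((first, second) -> second);        // keep the last remaining version
    }

    public static void main(String[] args) {
        List<String> tomcat = List.of("9.0.84", "10.1.3", "10.1.17", "11.0.0-M1", "11.0.0-M15");
        Predicate<String> hasHyphen = v -> v.contains("-");
        System.out.println(latestStableSince(tomcat, "10.1.3", hasHyphen)); // Optional[10.1.17]
    }
}
```

Because dropWhile keeps the current version itself, a project already on the latest stable release gets its own version back (zero libyears), and a version missing from the metadata yields an empty result.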

So overall, it seems that combining use of Maven metadata with Solr search might make it possible to have a better solution for cases where multiple release lines are maintained in parallel (like Tomcat or Spring does), and it would also make it possible to exclude pre-release versions (because in Maven metadata we have access to all versions, not just the latest).

--

Now that I'm writing this, it also seems to me that Maven metadata would make it quite easy to implement a version-distance-based metric (I think the original paper argued that it has advantages over the date-based metric). That could be interesting and useful too.
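One simple version-distance measure, sketched here as an assumption rather than the paper's exact definition, counts how many non-pre-release versions are listed after the one in use. The list below is a shortened excerpt of the Tomcat metadata, so the real count would be larger.

```java
import java.util.List;
import java.util.function.Predicate;

// Sketch of one possible version-distance measure; VersionDistance is an illustrative name.
class VersionDistance {
    static long releasesBehind(List<String> allVersions, String currentVersion,
                               Predicate<String> isPreRelease) {
        int index = allVersions.indexOf(currentVersion);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown version: " + currentVersion);
        }
        // Count the stable versions listed after the one in use.
        return allVersions.subList(index + 1, allVersions.size()).stream()
                .filter(isPreRelease.negate())
                .count();
    }

    public static void main(String[] args) {
        List<String> versions = List.of("10.0.0-M10", "10.0.0", "10.1.3", "10.1.17", "11.0.0-M15");
        System.out.println(releasesBehind(versions, "10.0.0", v -> v.contains("-"))); // 2
    }
}
```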

grimsa, Jan 07 '24 20:01