dependency-analysis-gradle-plugin icon indicating copy to clipboard operation
dependency-analysis-gradle-plugin copied to clipboard

Cache misses between CI and local

Open NikolayMetchev opened this issue 8 months ago • 15 comments

Build scan link

Plugin version 1.28.0

Gradle version 8,5

JDK version 17

(Optional) Kotlin and Kotlin Gradle Plugin (KGP) version 1.9.21

(Optional) Android Gradle Plugin (AGP) version N/a

Describe the bug The buildHealth task and all its deps are not being cached properly and we get cache misses more often than ideal. The cause seems to be not using PathSenstivity.Relative in all cases. The reason we get cache misses is because the absolute paths of input dependencies of many of the tasks created by the plugin use absolute or none path sensitivity. See image for one such example

image

To Reproduce Steps to reproduce the behavior:

  1. ...

Expected behavior If the code hasn't change I expect cache hits for all the tasks that this plugin creates in CI.

Additional context

NikolayMetchev avatar Dec 22 '23 09:12 NikolayMetchev

Seems like this is a conscious decision based on this code comment: https://github.com/autonomousapps/dependency-analysis-gradle-plugin/blob/08c8765157925bbcdfd8f63d8d37fe041561ddb4/src/main/kotlin/com/autonomousapps/tasks/ArtifactsReportTask.kt#L45-L52

TimvdLippe avatar Jan 01 '24 14:01 TimvdLippe

I think if that is changed to relative the comment will still stand with the added benefit of allowing cache reuse across CI runs.

NikolayMetchev avatar Jan 01 '24 15:01 NikolayMetchev

I haven't investigated this area in a while. It is my understanding that RELATIVE is only relative to the project directory. Oftentimes these jars are in the Gradle User Home (that is, outside of the project structure entirely), and because of this I use ABSOLUTE because I need the actual path to the jars for some of the processing.

There may be some way to not use ABSOLUTE path but somehow transform paths and have it all still work, but I don't quite know how I'd do that and don't really have the time at the moment. Contributions welcome!

autonomousapps avatar Jan 08 '24 23:01 autonomousapps

The @Classpath annotation or the @PathSensitive(PathSensitivity.RELATIVE) would only have some influence on the way the task cache key is computed. The underlying file collection would still contain all the data with full paths.

Could you @autonomousapps explain in which case the @Classpath annotation would not work?

jprinet avatar Jan 15 '24 16:01 jprinet

I created a PR. Let me know if it works? https://github.com/autonomousapps/dependency-analysis-gradle-plugin/pull/1107

NikolayMetchev avatar Jan 22 '24 11:01 NikolayMetchev

I don't understand the problem with NONE. It was my understanding that this instructs Gradle that the path to the file doesn't matter, and that the only thing we care about is the contents of the file(s). Is that not right?

We can't use @Classpath in some cases because often the list of files just "happens" to be jars, they're not semantically a classpath. We need to treat those files as just files.

There were cases in the past (maybe it was a Gradle bug from the Gradle 5/6 era?) where I used RELATIVE for some of the inputs, and then the outputs, which in some cases contain absolute paths to files in the gradle user home directory, were simply wrong when those outputs were pulled from the build cache on a different machine from which the outputs were originally created.

Because of that last point, it is challenging to write a regression test to validate that changes to path sensitivity work as required. As such, I am extremely uncomfortable with merging PRs that make such changes. If you were to test out your change, write to the build cache from say a mac and pull it from say a linux (or Windows!) machine, and tell me it all still works, then I'd find that very interesting.

Maybe a path forward is to identify areas where the plugin is writing out absolute file paths, see if they're still necessary, and if not remove or change them, such that correctness is no longer dependent on those absolute paths.

autonomousapps avatar Jan 22 '24 18:01 autonomousapps

I don't understand the problem with NONE. It was my understanding that this instructs Gradle that the path to the file doesn't matter, and that the only thing we care about is the contents of the file(s). Is that not right?

That's correct, the problem here is only for inputs using PathSensitivity.ABSOLUTE

We can't use @Classpath in some cases because often the list of files just "happens" to be jars, they're not semantically a classpath. We need to treat those files as just files.

It makes sense, although the input name classpathArtifactFiles is misleading. I now understand that the output files created may contain absolute paths to Gradle user home and that restoring them only makes sense when Gradle user home is the same.

There were cases in the past (maybe it was a Gradle bug from the Gradle 5/6 era?) where I used RELATIVE for some of the inputs, and then the outputs, which in some cases contain absolute paths to files in the gradle user home directory, were simply wrong when those outputs were pulled from the build cache on a different machine from which the outputs were originally created.

It does not sound like a Gradle bug but rather an expected behavior, using RELATIVE will make the cache key identical on machines having different Gradle user home.

Maybe a path forward is to identify areas where the plugin is writing out absolute file paths, see if they're still necessary, and if not remove or change them, such that correctness is no longer dependent on those absolute paths.

👍

jprinet avatar Jan 23 '24 10:01 jprinet

I am a little confused. I feel like the only thing that should be writing to the gradle home directory are core tasks and not any of the tasks that this plugin defines. Why can't the task this plugin defines write in the build directory of the project it is evaluating?

NikolayMetchev avatar Jan 23 '24 17:01 NikolayMetchev

The change in my PR only affects inputs and not outputs anyway.

NikolayMetchev avatar Jan 23 '24 17:01 NikolayMetchev

I am a little confused. I feel like the only thing that should be writing to the gradle home directory are core tasks and not any of the tasks that this plugin defines. Why can't the task this plugin defines write in the build directory of the project it is evaluating?

as far as I understand, the report output is located in the build directory but can contain absolute paths to files outside of the project directory (Gradle user home). If using PathSensitivity.RELATIVE we would get cache hits across machines and those absolute paths may become wrong, hence the reason for using PathSensitivity.ABSOLUTE

jprinet avatar Jan 24 '24 08:01 jprinet

The output report doesn't refer to jars. it only references dependencies. Here is a sample:

Advice for :test:aws:cedar
Unused dependencies which should be removed:
  api(libs.aws.lambda.java.core)
  api(libs.aws.lambda.java.events)
  api(libs.aws.sdk.kotlin.lambda)
  implementation(libs.jsonassert)
  implementation(libs.kotlinx.serialization.core)
  implementation(libs.kotlinx.serialization.json)
  implementation(libs.ktor.http)
  implementation(libs.mockk.dsl)
  implementation(libs.truth)
  implementation(project(":utils:aws:lambda:client"))
  implementation(project(":utils:aws:lambda:http:core"))
  testImplementation(libs.mockk.dsl)
  testImplementation(libs.truth)
  testImplementation(project(":test:utils"))

NikolayMetchev avatar Jan 24 '24 09:01 NikolayMetchev

According to @autonomousapps, there are some cases where the outputs contain paths to Gradle user home.

jprinet avatar Jan 24 '24 10:01 jprinet

It seems there are some intermediate outputs in json that contain absolute paths. I guess we need to change those to be relative paths and then we should be good to go. Because the cache hit will only work if the gradle home directory is relatively the same between two runs.

NikolayMetchev avatar Jan 24 '24 11:01 NikolayMetchev

It seems there are some intermediate outputs in json that contain absolute paths. I guess we need to change those to be relative paths and then we should be good to go. Because the cache hit will only work if the gradle home directory is relatively the same between two runs.

Yes this. Some of the intermediate outputs contain references to absolute paths that are in the gradle user home directory. These were needed in the past but it's possible they no longer are. If those properties aren't used, I'm ok with eliminating them or transforming them or something. Only when we don't have references to absolute paths in gradle user home can we change input normalization to RELATIVE.

autonomousapps avatar Jan 25 '24 20:01 autonomousapps

It makes sense, although the input name classpathArtifactFiles is misleading.

This is a fair critique!

autonomousapps avatar Jan 25 '24 20:01 autonomousapps