Added non-conflicting hash for install files
Summary
This commit introduces lock file version 3 with per-artifact hashing instead of a single global hash.
This per-artifact hashing approach can reduce the number of merge conflicts when multiple people update canonical versions in a large monorepo.
The code still supports reading older (v2 and v1) lock files: it checks for v3 first, then falls back to v2, then v1. Users with older lock files will see a message asking them to repin.
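The v3 → v2 → v1 fallback order can be sketched in Python. The detection predicates are illustrative assumptions, not the checks the PR implements: a v3 file is taken to store __INPUT_ARTIFACTS_HASH as an object, a v2 file to carry "version": "2", and a v1 file to nest its contents under dependency_tree. The function names and the repin message wording are hypothetical.

```python
import json


def detect_lock_file_version(data):
    """Return 3, 2 or 1 for a parsed lock file, or None if unrecognised.

    All three predicates are illustrative assumptions.
    """
    # Checked first: v3 stores the input hash as a per-artifact dictionary.
    if isinstance(data.get("__INPUT_ARTIFACTS_HASH"), dict):
        return 3
    # Fallback: v2 carries a top-level version marker.
    if data.get("version") == "2":
        return 2
    # Last resort: v1 nests everything under "dependency_tree".
    if "dependency_tree" in data:
        return 1
    return None


def load_lock_file(text):
    """Parse lock file JSON and warn when an older format is found."""
    version = detect_lock_file_version(json.loads(text))
    if version in (1, 2):
        # Hypothetical wording; older files remain readable but should be repinned.
        print("maven_install.json is a v%d lock file; please repin." % version)
    return version
```

Ordering matters here. A v3 file might also carry a version field, so the most specific check runs first.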
Key Changes
- Lock File Format Change (v2 → v3)
- Before (v2): __INPUT_ARTIFACTS_HASH and __RESOLVED_ARTIFACTS_HASH were single integer values
- After (v3): Both are now dictionaries mapping each artifact coordinate to its individual hash
Example in maven_install.json:
// Old format
"__INPUT_ARTIFACTS_HASH": 1994476565,
"__RESOLVED_ARTIFACTS_HASH": -274973469,
// New format
"__INPUT_ARTIFACTS_HASH": { "com.google.guava:guava": 733518530, "junit:junit": -652553691, "..." },
"__RESOLVED_ARTIFACTS_HASH": { "com.google.guava:guava": -1587873388, "..." }
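The dictionary format conflicts less because changes to different artifacts touch different keys. This can be illustrated with an idealised three-way merge, sketched in Python. The sketch assumes a key-aware merge, for example a JSON merge driver or one entry per line with unchanged lines between edits. Git's line-based merge can still conflict when two edited entries sit on adjacent lines. merge_hash_dicts and the new hash values (111, 222, 1, 2) are hypothetical.

```python
def merge_hash_dicts(base, ours, theirs):
    """Three-way merge of per-artifact hash dictionaries.

    Returns (merged, conflicts). A key conflicts only when both sides changed
    it relative to base and disagree; a missing key is treated as None.
    """
    merged = {}
    conflicts = []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:
            value = o  # both sides agree (or neither changed it)
        elif o == b:
            value = t  # only "theirs" changed this artifact
        elif t == b:
            value = o  # only "ours" changed this artifact
        else:
            conflicts.append(key)
            value = o  # keep ours; a human has to resolve this key
        if value is not None:
            merged[key] = value
    return merged, conflicts


# v3: one branch bumps Guava, another bumps JUnit -> no conflict.
base = {"com.google.guava:guava": 733518530, "junit:junit": -652553691}
ours = {**base, "com.google.guava:guava": 111}
theirs = {**base, "junit:junit": 222}
print(merge_hash_dicts(base, ours, theirs))
# ({'com.google.guava:guava': 111, 'junit:junit': 222}, [])

# v2: both branches rewrite the single global integer -> always a conflict.
print(merge_hash_dicts({"all": 1994476565}, {"all": 1}, {"all": 2})[1])
# ['all']
```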
- Hash Computation Changes (private/rules/v3_lock_file.bzl:53-108)
The new _compute_lock_file_hash_v3 function computes individual hashes per artifact that include:
- The artifact's own info (coordinates, SHA sums)
- The repository it came from
- Hashes of all transitive dependencies (dependency-aware hashing)
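Dependency-aware hashing can be sketched in Python. This is a minimal illustration, not a port of _compute_lock_file_hash_v3. java_string_hash reproduces Java's String.hashCode(), which is the algorithm Bazel documents for Starlark's hash(). The input shape (sha256, repository, deps) and the way fields are joined are assumptions. Each artifact folds in the hashes of its direct dependencies, and those recursively fold in theirs. As a result, a change anywhere in the transitive closure changes the artifact's hash, while unrelated artifacts keep theirs.

```python
def java_string_hash(s):
    """Java's String.hashCode() as a signed 32-bit int (exact for BMP text)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def compute_artifact_hashes(artifacts):
    """Map each coordinate to a hash of its own info, its repository and the
    hashes of its dependencies (and therefore of all transitive deps).

    artifacts: coordinate -> {"sha256": str, "repository": str, "deps": [str]}
    (hypothetical shape). Raises ValueError on a dependency cycle.
    """
    memo = {}

    def visit(coord, path):
        if coord in memo:
            return memo[coord]
        if coord in path:
            raise ValueError("dependency cycle at " + coord)
        info = artifacts[coord]
        # Sorting keeps the hash independent of dependency declaration order.
        dep_hashes = [visit(dep, path | {coord}) for dep in sorted(info["deps"])]
        parts = [coord, info["sha256"], info["repository"]] + [str(h) for h in dep_hashes]
        memo[coord] = java_string_hash("|".join(parts))
        return memo[coord]

    for coord in sorted(artifacts):
        visit(coord, frozenset())
    return memo
```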
- Input Hash Changes (private/rules/coursier.bzl:334-386)
compute_dependency_inputs_signature now returns a dictionary of per-artifact hashes plus backward-compatible v1/v2 signatures.
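Returning both shapes lets one freshly computed signature be checked against any lock file version. The Python sketch below illustrates this. The names, the spec format and the use of zlib.crc32 as a stand-in for Starlark's hash() are assumptions. The separate v1 and v2 signatures are also collapsed into a single legacy value here.

```python
import zlib


def _stable_hash(text):
    # Stand-in for Starlark's hash(); any deterministic string hash works here.
    return zlib.crc32(text.encode("utf-8"))


def compute_input_signatures(specs):
    """specs: coordinate -> requested spec (example: "junit:junit" -> "junit:junit:4.13.2").

    Returns per-artifact hashes (v3) plus one legacy whole-input hash (v1/v2).
    """
    per_artifact = {coord: _stable_hash(spec) for coord, spec in specs.items()}
    legacy = _stable_hash("\n".join(sorted(specs.values())))
    return {"per_artifact": per_artifact, "legacy": legacy}


def input_hash_matches(stored, signatures):
    """Compare a lock file's __INPUT_ARTIFACTS_HASH against fresh signatures."""
    if isinstance(stored, dict):  # v3: one entry per artifact
        return stored == signatures["per_artifact"]
    return stored == signatures["legacy"]  # v1/v2: a single integer
```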
This is looking really good. I like the idea of only having conflicts if the transitive deps have changed.
@MarconZet, I'm waiting until you move this out of draft before reviewing. Please LMK when you're ready!
@shs96c any progress on the review?
Could we add a description to the PR like:
Summary
This commit introduces lock file version 3 with per-artifact hashing instead of a single global hash. The main purpose is to create "non-conflicting" hashes that allow more granular change detection in the Maven dependency lock files.
Key Changes
- Lock File Format Change (v2 → v3)
- Before (v2): __INPUT_ARTIFACTS_HASH and __RESOLVED_ARTIFACTS_HASH were single integer values
- After (v3): Both are now dictionaries mapping each artifact coordinate to its individual hash
Example in maven_install.json:
// Old format
"__INPUT_ARTIFACTS_HASH": 1994476565,
"__RESOLVED_ARTIFACTS_HASH": -274973469,
// New format
"__INPUT_ARTIFACTS_HASH": { "com.google.guava:guava": 733518530, "junit:junit": -652553691, ... },
"__RESOLVED_ARTIFACTS_HASH": { "com.google.guava:guava": -1587873388, ... }
- File Renames
- v2_lock_file.bzl → v3_lock_file.bzl
- V2LockFile.java → V3LockFile.java
- V2LockFileTest.java → V3LockFileTest.java
- Hash Computation Changes (private/rules/v3_lock_file.bzl:53-108)
The new _compute_lock_file_hash_v3 function computes individual hashes per artifact that include:
- The artifact's own info (coordinates, SHA sums)
- The repository it came from
- Hashes of all transitive dependencies (dependency-aware hashing)
- Input Hash Changes (private/rules/coursier.bzl:334-386)
compute_dependency_inputs_signature now returns a dictionary of per-artifact hashes plus backward-compatible v1/v2 signatures.
- Command-line Interface Change (pin_dependencies.bzl)
Changed from --input_hash (a single value) to --input-hash-path (the path to a JSON file containing the hash dictionary).
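Passing a path presumably keeps the command line short no matter how many artifacts are pinned. The Python sketch below shows the round trip, assuming the file is plain JSON mapping coordinates to hashes. The helper names and the input_hash.json file name are hypothetical, and the sketch does not show how pin_dependencies.bzl actually produces the file.

```python
import json
import os
import tempfile


def write_input_hash_file(hashes, directory):
    """Write the per-artifact hash dictionary to JSON and return its path."""
    path = os.path.join(directory, "input_hash.json")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        json.dump(hashes, f, sort_keys=True, indent=2)
    return path


def read_input_hash_file(path):
    """What a pinning tool would do with the value of --input-hash-path."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def round_trip(hashes):
    """Write the file on the rule side, then read it back on the tool side."""
    with tempfile.TemporaryDirectory() as tmp:
        path = write_input_hash_file(hashes, tmp)
        argv = ["--input-hash-path", path]  # illustrative argument vector
        return read_input_hash_file(argv[1])
```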
- Backward Compatibility
The code still supports reading older (v2 and v1) lock files: it checks for v3 first, then falls back to v2, then v1. Users with older lock files will see a message asking them to repin.
Purpose
This per-artifact hashing approach allows the system to detect exactly which artifacts changed, rather than just knowing "something changed." This is useful for incremental updates and more precise cache invalidation.
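Detecting exactly which artifacts changed comes down to comparing two per-artifact dictionaries. The minimal Python sketch below shows this. diff_hashes is a hypothetical helper, not a function from the PR, and the hamcrest entry and the value 42 are made-up hashes.

```python
def diff_hashes(old, new):
    """Classify coordinates by comparing two per-artifact hash dictionaries."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


old = {"com.google.guava:guava": -1587873388, "junit:junit": -652553691}
new = {"com.google.guava:guava": 42, "org.hamcrest:hamcrest-core": 7}
print(diff_hashes(old, new))
# {'added': ['org.hamcrest:hamcrest-core'], 'removed': ['junit:junit'], 'changed': ['com.google.guava:guava']}
```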
We tried this patch and so far it has been working well. There is one issue, though: when the signature does not match, https://github.com/bazel-contrib/rules_jvm_external/blob/e95b9d7d2e70b32c11dc363f48d04bf3d619e5be/private/rules/coursier.bzl#L568 prints one huge line containing the SHAs of every artifact. In our case this produces ~2 GB of logs.
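One possible mitigation is to report only the coordinates whose hashes differ and to cap how many are printed. This is an assumption, not what coursier.bzl does, and it is sketched in Python below. format_mismatch and its limit parameter are hypothetical.

```python
def format_mismatch(changed, limit=20):
    """Build a bounded, multi-line message listing mismatched artifacts."""
    lines = ["%d artifact(s) no longer match the lock file:" % len(changed)]
    lines += ["  " + coord for coord in changed[:limit]]
    if len(changed) > limit:
        lines.append("  ... and %d more" % (len(changed) - limit))
    return "\n".join(lines)


print(format_mismatch(["com.google.guava:guava", "junit:junit", "org.hamcrest:hamcrest-core"], limit=2))
# 3 artifact(s) no longer match the lock file:
#   com.google.guava:guava
#   junit:junit
#   ... and 1 more
```

With the cap, the message size is bounded by the limit rather than by the size of the whole dependency graph.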