mamba icon indicating copy to clipboard operation
mamba copied to clipboard

fix: correct repodata_record.json metadata for explicit installs (#4095)

Open maresb opened this issue 1 month ago • 2 comments

⚠️ Important Disclosure

I do not know C++. This PR was extremely heavily LLM-assisted (Claude in Cursor). As a result, please review this PR skeptically and critically. I have spent significantly more time than typical attempting to verify correctness to the extent I'm able.

Description

This PR fixes #4095: incorrect repodata_record.json precedence for explicit installs.

Background

Since v2.1.1, libmamba switched to "repodata-first" when constructing info/repodata_record.json (intended to honor channel repodata patches over info/index.json). However, for explicit installs (URL-based), the "repodata" object is a URL-derived stub containing placeholder/zero values—not real channel repodata. Making that stub authoritative caused repodata_record.json to be written with incomplete/zeroed fields.

Affected versions:

  • v2.1.1–v2.3.2: Full corruption (all stub fields written)
  • v2.3.3–v2.4.0: Partial fix via #4071 (depends/constrains fixed, but license, timestamp, build_number, track_features still corrupted)

The v2.3.3 partial fix (#4071) removed empty depends/constrains arrays so they could be backfilled from index.json. However, this approach has two problems:

  1. Channel patches undone: Patches that intentionally set depends/constrains to [] are silently overwritten by index.json values
  2. Other fields still wrong: build_number, license, timestamp, and track_features remained corrupted

Solution

This PR leverages the existing defaulted_keys field (originally introduced in PR #1120 for signature verification, but no longer populated since the 2023 libsolv wrapper refactor) to distinguish URL-derived stub values from authoritative solver/channel values:

  • URL-derived packages (from_url()): defaulted_keys lists stub field names (e.g., ["_initialized", "license", "timestamp", ...]). These fields are erased before merging with index.json, allowing correct values to be filled in.

  • Solver-derived packages (make_package_info()): defaulted_keys = ["_initialized"] only. The sentinel signals "trust ALL fields"—nothing is erased, preserving channel patches including intentionally empty arrays.

The _initialized sentinel enables fail-hard verification: write_repodata_record() throws std::logic_error if it's missing, catching any code path that fails to properly initialize defaulted_keys.

Additionally, this PR adds cache healing to detect and fix corrupted caches from v2.1.1–v2.4.0 using a corruption signature (timestamp == 0 AND license == "").

Review Recommendation

Please review commit-by-commit. Each commit has an extended commit message with:

  • Detailed explanation of the changes
  • Test status (assertions passed/failed)
  • Rationale for design decisions

The commits follow TDD (Test-Driven Development) with alternating RED (failing tests) and GREEN (fix) commits:

Commit Type Description
01 RED Tests for defaulted_keys population in from_url()
02 GREEN Populate defaulted_keys in from_url() and make_package_info()
03 RED Tests for URL-derived metadata and channel patch preservation
04 GREEN Use defaulted_keys in write_repodata_record() and constructor.cpp
05 RED Test for cache healing
06 GREEN Detect corrupted cache in has_valid_extracted_dir()
07 RED Tests for consistent field presence and checksums
08 GREEN Ensure consistent field presence and compute missing checksums

Commits 07-08 add consistency improvements: ensuring depends/constrains are always present as arrays, omitting track_features when empty (matching conda behavior), and computing missing checksums from the tarball.

Type of Change

  • [x] Bugfix

Checklist

  • [x] My code follows the general style and conventions of the codebase, ensuring consistency
  • [x] I have performed a self-review of my code
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [x] My changes generate no new warnings
  • [x] I have run pre-commit run --all locally in the source folder and confirmed that there are no linter errors.
  • [x] I have added tests that prove my fix is effective or that my feature works
  • [ ] New and existing tests pass locally with my changes

maresb avatar Nov 29 '25 21:11 maresb