fix: correct repodata_record.json metadata for explicit installs (#4095)

Open maresb opened this issue 1 month ago • 2 comments

⚠️ Important Disclosure

I do not know C++. This PR was extremely heavily LLM-assisted (Claude in Cursor). As a result, please review this PR skeptically and critically. I have spent significantly more time than typical attempting to verify correctness to the extent I'm able.

Description

This PR fixes #4095: incorrect repodata_record.json precedence for explicit installs.

Background

Since v2.1.1, libmamba switched to "repodata-first" when constructing info/repodata_record.json (intended to honor channel repodata patches over info/index.json). However, for explicit installs (URL-based), the "repodata" object is a URL-derived stub containing placeholder/zero values—not real channel repodata. Making that stub authoritative caused repodata_record.json to be written with incomplete/zeroed fields.

Affected versions:

v2.1.1–v2.3.2: Full corruption (all stub fields written)
v2.3.3–v2.4.0: Partial fix via #4071 (depends/constrains fixed, but license, timestamp, build_number, track_features still corrupted)

The v2.3.3 partial fix (#4071) removed empty depends/constrains arrays so they could be backfilled from index.json. However, this approach has two problems:

Channel patches undone: Patches that intentionally set depends/constrains to [] are silently overwritten by index.json values
Other fields still wrong: build_number, license, timestamp, and track_features remained corrupted

Solution

This PR leverages the existing defaulted_keys field (originally introduced in PR #1120 for signature verification, but no longer populated since the 2023 libsolv wrapper refactor) to distinguish URL-derived stub values from authoritative solver/channel values:

URL-derived packages (from_url()): defaulted_keys lists stub field names (e.g., ["_initialized", "license", "timestamp", ...]). These fields are erased before merging with index.json, allowing correct values to be filled in.
Solver-derived packages (make_package_info()): defaulted_keys = ["_initialized"] only. The sentinel signals "trust ALL fields"—nothing is erased, preserving channel patches including intentionally empty arrays.

The _initialized sentinel enables fail-hard verification: write_repodata_record() throws std::logic_error if it's missing, catching any code path that fails to properly initialize defaulted_keys.

Additionally, this PR adds cache healing to detect and fix corrupted caches from v2.1.1–v2.4.0 using a corruption signature (timestamp == 0 AND license == "").

Review Recommendation

Please review commit-by-commit. Each commit has an extended commit message with:

Detailed explanation of the changes
Test status (assertions passed/failed)
Rationale for design decisions

The commits follow TDD (Test-Driven Development) with alternating RED (failing tests) and GREEN (fix) commits:

Commit	Type	Description
01	RED	Tests for `defaulted_keys` population in `from_url()`
02	GREEN	Populate `defaulted_keys` in `from_url()` and `make_package_info()`
03	RED	Tests for URL-derived metadata and channel patch preservation
04	GREEN	Use `defaulted_keys` in `write_repodata_record()` and `constructor.cpp`
05	RED	Test for cache healing
06	GREEN	Detect corrupted cache in `has_valid_extracted_dir()`
07	RED	Tests for consistent field presence and checksums
08	GREEN	Ensure consistent field presence and compute missing checksums

Commits 07-08 add consistency improvements: ensuring depends/constrains are always present as arrays, omitting track_features when empty (matching conda behavior), and computing missing checksums from the tarball.

Type of Change

[x] Bugfix

Checklist

[x] My code follows the general style and conventions of the codebase, ensuring consistency
[x] I have performed a self-review of my code
[x] I have commented my code, particularly in hard-to-understand areas
[x] My changes generate no new warnings
[x] I have run pre-commit run --all locally in the source folder and confirmed that there are no linter errors.
[x] I have added tests that prove my fix is effective or that my feature works
[ ] New and existing tests pass locally with my changes

Nov 29 '25 21:11 maresb