Description

This PR implements Evaluation EVE, a system that automatically evaluates EVE-OS across multiple partitions (IMGA/IMGB/IMGC) on hardware under test. It provides core infrastructure for sequential partition testing, hardware inventory collection, and onboarding control.

Purpose

Automatically select the best kernel/firmware combination based on:

Primary criterion: Partition boots successfully
Secondary criterion: Hardware inventory completeness (devices detected)
Tiebreaker: Least advanced partition (IMGA < IMGB < IMGC)
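
The three criteria above can be read as an ordered comparison. The following is only a minimal sketch of that ordering, not the code in this PR: the `PartitionResult` type, the `selectBest` function, and the use of a plain device count as the measure of inventory completeness are all assumptions made for illustration.

```go
package main

import "fmt"

// PartitionResult is a hypothetical summary of one evaluated partition.
type PartitionResult struct {
	Name        string // "IMGA", "IMGB" or "IMGC"
	Booted      bool   // primary criterion: the partition booted successfully
	DeviceCount int    // secondary criterion (assumed metric): devices detected
}

// partitionOrder encodes IMGA < IMGB < IMGC; a lower value is less advanced.
var partitionOrder = map[string]int{"IMGA": 0, "IMGB": 1, "IMGC": 2}

// selectBest returns the preferred partition name, or "" if none booted.
func selectBest(results []PartitionResult) string {
	best := -1
	for i, r := range results {
		if !r.Booted {
			continue // failed boots are never candidates
		}
		if best < 0 {
			best = i
			continue
		}
		b := results[best]
		moreDevices := r.DeviceCount > b.DeviceCount
		tieLessAdvanced := r.DeviceCount == b.DeviceCount &&
			partitionOrder[r.Name] < partitionOrder[b.Name]
		if moreDevices || tieLessAdvanced {
			best = i
		}
	}
	if best < 0 {
		return ""
	}
	return results[best].Name
}

func main() {
	results := []PartitionResult{
		{Name: "IMGA", Booted: true, DeviceCount: 40},
		{Name: "IMGB", Booted: true, DeviceCount: 42},
		{Name: "IMGC", Booted: true, DeviceCount: 42},
	}
	fmt.Println(selectBest(results)) // IMGB: ties with IMGC on devices, but is less advanced
}
```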

What This PR Includes

Automatic Partition Testing

Sequential testing of all partitions (IMGA → IMGB → IMGC)
Configurable stability validation (default: 5 minutes per slot)
Automatic detection and skipping of failed boots
Partition state reconciliation after crashes/watchdog reboots

Hardware Inventory Collection

Collects hardware data per partition: PCI devices, USB devices, kernel parameters, IOMMU groups
Persisted to /persist/eval/<partition>-YYYY-MM-DD-HH:MM/
Automatic cleanup (30-day retention)
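
To make the directory naming and the 30-day retention concrete, here is a rough sketch of how a `<partition>-YYYY-MM-DD-HH:MM` name could be built, parsed, and checked for expiry. The helper names (`inventoryDirName`, `parseInventoryDir`, `isExpired`) are hypothetical, and treating the timestamp as UTC is an assumption; the actual evalmgr code may differ.

```go
package main

import (
	"fmt"
	"time"
)

// stampLayout is the Go reference layout for YYYY-MM-DD-HH:MM.
const stampLayout = "2006-01-02-15:04"

// retention mirrors the 30-day retention stated in the description.
const retention = 30 * 24 * time.Hour

// inventoryDirName builds "<partition>-YYYY-MM-DD-HH:MM".
func inventoryDirName(partition string, t time.Time) string {
	return fmt.Sprintf("%s-%s", partition, t.Format(stampLayout))
}

// parseInventoryDir splits a directory name into partition and timestamp.
// The timestamp is the fixed-width suffix; it is parsed as UTC (assumption).
func parseInventoryDir(name string) (string, time.Time, error) {
	cut := len(name) - len(stampLayout) - 1
	if cut < 1 || name[cut] != '-' {
		return "", time.Time{}, fmt.Errorf("unexpected inventory directory name %q", name)
	}
	stamp, err := time.Parse(stampLayout, name[cut+1:])
	if err != nil {
		return "", time.Time{}, err
	}
	return name[:cut], stamp, nil
}

// isExpired reports whether the directory is older than the retention period.
func isExpired(name string, now time.Time) (bool, error) {
	_, stamp, err := parseInventoryDir(name)
	if err != nil {
		return false, err
	}
	return now.Sub(stamp) > retention, nil
}

func main() {
	collected := time.Date(2025, time.November, 5, 14, 11, 0, 0, time.UTC)
	name := inventoryDirName("IMGB", collected)
	fmt.Println(name)
	for _, days := range []int{29, 31} {
		now := collected.Add(time.Duration(days) * 24 * time.Hour)
		expired, err := isExpired(name, now)
		fmt.Println(days, expired, err)
	}
}
```

Names that do not match the expected pattern return an error rather than being treated as expired, so a cleanup pass built on this sketch would leave unrelated files under /persist/eval alone.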
Status tracking via PubSub

Onboarding Control

Blocks device onboarding during evaluation
Only the final partition, booted after all partitions have been tested, connects to the controller
PubSub-based coordination between evalmgr and client agents
Real-time progress updates
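
The gating idea can be sketched without the real pillar PubSub API. In the example below a Go channel stands in for the PubSub subscription, and the `EvalStatus` type with its `AllowOnboarding` field is a hypothetical stand-in for whatever evalmgr actually publishes; it only illustrates that the client side stays blocked until a status permitting onboarding arrives.

```go
package main

import "fmt"

// EvalStatus is a hypothetical status message published by the evaluation agent.
type EvalStatus struct {
	CurrentSlot     string // partition currently running
	AllowOnboarding bool   // true only once evaluation has finished
}

// waitForOnboarding consumes status updates until one allows onboarding.
// It returns false if the stream ends before onboarding is permitted.
func waitForOnboarding(updates <-chan EvalStatus) (EvalStatus, bool) {
	for st := range updates {
		if st.AllowOnboarding {
			return st, true
		}
		// Evaluation still in progress: keep the device off the controller.
	}
	return EvalStatus{}, false
}

func main() {
	// The buffered channel stands in for a PubSub subscription (assumption).
	updates := make(chan EvalStatus, 3)
	updates <- EvalStatus{CurrentSlot: "IMGA"}
	updates <- EvalStatus{CurrentSlot: "IMGB"}
	updates <- EvalStatus{CurrentSlot: "IMGB", AllowOnboarding: true}
	close(updates)

	st, ok := waitForOnboarding(updates)
	fmt.Println(st.CurrentSlot, ok)
}
```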

Robust State Management

Persistent state survives reboots (/persist/eval/state.json)
Per-partition metadata (boot count, last boot time, failures)
Integration with zboot partition states
Scheduler state machine: Idle → StabilityWait → Scheduled → Finalized
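
As a reading aid, the scheduler states can be modeled as a simple enumerated type. This sketch assumes the transitions are strictly linear, exactly as the arrow chain above suggests; the real scheduler may allow other edges (for example after a failed boot), and the names `SchedulerState` and `nextState` are illustrative only.

```go
package main

import "fmt"

// SchedulerState enumerates the states named in the description.
type SchedulerState int

const (
	Idle SchedulerState = iota
	StabilityWait
	Scheduled
	Finalized
)

var stateNames = [...]string{"Idle", "StabilityWait", "Scheduled", "Finalized"}

// String makes the states print by name.
func (s SchedulerState) String() string {
	if s < Idle || s > Finalized {
		return fmt.Sprintf("SchedulerState(%d)", int(s))
	}
	return stateNames[s]
}

// nextState advances one step along Idle → StabilityWait → Scheduled → Finalized.
// Finalized is terminal, so asking for its successor is an error.
func nextState(s SchedulerState) (SchedulerState, error) {
	if s < Idle || s >= Finalized {
		return s, fmt.Errorf("no transition out of %s", s)
	}
	return s + 1, nil
}

func main() {
	s := Idle
	for {
		fmt.Println(s)
		n, err := nextState(s)
		if err != nil {
			break // reached the terminal state
		}
		s = n
	}
}
```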

Architecture

New evalmgr Agent (pkg/pillar/cmd/evalmgr/, ~2,100 lines)

Platform detection (/etc/eve-platform contains "evaluation")
Partition state reconciliation on boot
Stability validation with configurable timers
Hardware inventory collection and status tracking
Automatic scheduling of next partition
Status publishing via PubSub
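
The platform check in the first item can be illustrated as follows. This is a sketch under assumptions, not the evalmgr implementation: the function names are hypothetical, and treating an unreadable file as "not an evaluation platform" is a design choice made here for the example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// containsEvaluation reports whether the platform string mentions "evaluation".
func containsEvaluation(content string) bool {
	return strings.Contains(content, "evaluation")
}

// isEvaluationPlatform reads the platform file (for example /etc/eve-platform)
// and applies the substring check. A missing or unreadable file yields false.
func isEvaluationPlatform(path string) bool {
	data, err := os.ReadFile(path)
	if err != nil {
		return false
	}
	return containsEvaluation(string(data))
}

func main() {
	// Use a temporary file so the example runs on any machine.
	dir, err := os.MkdirTemp("", "eve-platform-demo")
	if err != nil {
		fmt.Println("cannot create temp dir:", err)
		return
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "eve-platform")
	if err := os.WriteFile(path, []byte("evaluation\n"), 0o644); err != nil {
		fmt.Println("cannot write file:", err)
		return
	}
	fmt.Println(isEvaluationPlatform(path))                          // true
	fmt.Println(isEvaluationPlatform(filepath.Join(dir, "missing"))) // false
}
```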

Integration Points

client agent: Gates onboarding until evaluation completes
diag tool: Displays evaluation status and inventory collection progress
zboot: IMGC partition support for evaluation platforms
device-steps.sh: Starts evalmgr before client agent
mkimage-raw-efi: Initializes evaluation partitions with correct priorities

Testing

Comprehensive test suite (1,670 lines):

Multi-boot evaluation flow simulation
Failure recovery scenarios
Inventory collection verification with event tracking
GRUB boot selection validation
All 13 tests passing

Commit Structure

Types & Interfaces - Core data structures (298 lines)
GPT Access Layer - Partition management abstraction
System Reset - Reboot handling component
Persistent State - State management across reboots
Evaluation Agent - Main orchestration logic with platform detection
Test Infrastructure - Complete test suite with inventory event verification
Diagnostic Display - Status visibility
Partition Initialization - EFI partition setup
Hardware Inventory - Collection and persistence
Inventory Status - PubSub integration

PR dependencies

https://github.com/lf-edge/eve/pull/5348 - MERGED

How to test and validate this PR

Build the evaluation installer: make PLATFORM=evaluation installer-raw
Install EVE and check that the diag output reports the evaluation status.
After evalmgr is done, check that /persist/eval contains a status.json with a status entry for each of the IMGA, IMGB, and IMGC partitions.

Changelog notes

Automatic Partition Testing and Onboarding Control for Evaluation EVE

PR Backports

- 14.5-stable: No, as the feature is not available there.
- 13.4-stable: No, as the feature is not available there.

Also, to the PRs that should be backported into any stable branch, please add a label stable.

Checklist

[x] I've provided a proper description
[ ] I've added the proper documentation
[x] I've tested my PR on amd64 device
[ ] I've tested my PR on arm64 device
[x] I've written the test verification instructions
[x] I've set the proper labels to this PR

And the last but not least:

[x] I've checked the boxes above, or I've provided a good reason why I didn't check them.

Please, check the boxes above after submitting the PR in interactive mode.

Nov 05 '25 14:11 rucoder

Codecov Report

All modified and coverable lines are covered by tests. Project coverage is 20.39%. Comparing base (2281599) to head (81c7bfa). Warning: report is 61 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #5351      +/-   ##
==========================================
+ Coverage   19.52%   20.39%   +0.86%     
==========================================
  Files          19       19              
  Lines        3021     2314     -707     
==========================================
- Hits          590      472     -118     
+ Misses       2310     1721     -589     
  Partials      121      121

Nov 07 '25 10:11 codecov[bot]

Configurable stability validation (default: 5 minutes per slot)

The default watchdog timer for the touch files is 300 seconds, but since it takes a while for a watchdog to trigger, we actually wait twice that long before declaring an EVE update successful. So the safe thing would be to wait longer here as well, unless we can quantify how long it actually takes for the watchdog to fire.

FWIW you can manually run /opt/zededa/bin/faultinjection -H to cause a touch file watchdog.

Nov 08 '25 02:11 eriknordmark

Evaluation EVE: Automatic Partition Testing and Onboarding Control
