
Add Evaluation Platform Support with Multi-RootFS Build System

Open rucoder opened this issue 6 months ago • 4 comments

Description

NOTE: please read the detailed description of the changes in the commit descriptions.

NOTE 2: this PR does not switch between partitions automatically, and booting into IMGC will trigger the watchdog after some time because IMGC is an unknown partition label. A follow-up PR will introduce the necessary functionality.

This PR introduces support for building EVE images for evaluation platforms with a revamped multi-rootfs architecture. Key changes include:

Core Features:

  1. Multi-rootfs image support

    • Extended build triplet to <HV>-<PLATFORM>-<FLAVOR>
    • Added pattern rules for rootfs images (rootfs-<platform>-<flavor>.img)
    • Evaluation images skip size limits (non-updatable)
  2. New partition layout

    • Added IMGC partition support in make-raw
    • Optimized partition sizing for evaluation platforms
    • Unified partition handling across Docker/installer environments
  3. Platform identification

    • Introduced /etc/eve-platform file for run-time platform detection
    • Simplified script logic by eliminating runtime platform passing
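
To make the naming scheme concrete, here is a minimal sketch of how a `<HV>-<PLATFORM>-<FLAVOR>` triplet could map to a `rootfs-<platform>-<flavor>.img` name. The function name `rootfs_image_name`, the example flavor `lts`, and the assumption that no field contains a `-` are illustrative only and are not taken from the Makefile in this PR.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): split "<HV>-<PLATFORM>-<FLAVOR>"
# and print the matching rootfs image name.
# Assumption: none of the three fields contains a '-'.
rootfs_image_name() (
    triplet=$1

    # Require exactly two dashes, i.e. exactly three fields.
    case "$triplet" in
        *-*-*-*) exit 1 ;;   # too many fields
        *-*-*)   ;;          # looks like a triplet
        *)       exit 1 ;;   # too few fields
    esac

    hv=${triplet%%-*}        # text before the first '-'
    rest=${triplet#*-}       # "<PLATFORM>-<FLAVOR>"
    platform=${rest%%-*}
    flavor=${rest#*-}

    # Reject empty fields such as "kvm--lts".
    [ -n "$hv" ] && [ -n "$platform" ] && [ -n "$flavor" ] || exit 1

    printf 'rootfs-%s-%s.img\n' "$platform" "$flavor"
)

# Example with a hypothetical flavor name:
rootfs_image_name kvm-evaluation-lts
```

The hypervisor field is validated but not used in the file name, matching the `rootfs-<platform>-<flavor>.img` pattern listed above.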

Build System Improvements:

  • Makefile enhancements:
    • Dynamic rootfs selection by platform
    • Automatic rootfs renaming (rootfs-b, rootfs-c, rootfs)
    • Removed obsolete build-tools target (reduces rebuilds)
  • Kernel/FW handling:
    • Fixed platform detection in kernel-versions.mk (filter → findstring)
    • Added evaluation-specific FW package logic (build-evaluation-generic.yml)
    • Per-flavor kernel/FW modifiers
  • Dockerfile cleanup:
    • Fixed deprecated syntax (ENV/ARG formatting)

Script Updates:

  • make-raw:
    • Supports IMGC partition creation
    • Uses /etc/eve-platform when platform isn't explicitly passed
    • Shell syntax fixes and dead code removal
  • runme.sh:
    • Simplified platform checking using eve_platform file
    • Exports PLATFORM for make-raw compatibility
  • prepare-platform.sh:
    • No longer fails on unknown platforms
  • parse-pkgs.sh:
    • Added special handling for FW packages by flavor
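
The platform lookup described for make-raw and runme.sh can be sketched roughly as follows. This is not the code from the PR: the function name `resolve_platform`, the `EVE_PLATFORM_FILE` override (added only so the sketch can be tried without touching `/etc`), and the fallback to `generic` are assumptions.

```shell
#!/bin/sh
# Sketch only: prefer an explicitly passed PLATFORM, otherwise read the
# platform file, otherwise fall back to "generic" (assumed default).
# EVE_PLATFORM_FILE is a hypothetical override used for experimenting.
resolve_platform() (
    platform_file=${EVE_PLATFORM_FILE:-/etc/eve-platform}

    if [ -n "${PLATFORM:-}" ]; then
        # An explicitly passed platform always wins.
        printf '%s\n' "$PLATFORM"
    elif [ -r "$platform_file" ]; then
        # Use the first line of the file with all whitespace removed.
        value=$(head -n 1 "$platform_file" | tr -d '[:space:]')
        printf '%s\n' "${value:-generic}"
    else
        printf 'generic\n'
    fi
)

echo "platform: $(resolve_platform)"
```

A caller such as runme.sh could then do something like `PLATFORM=$(resolve_platform); export PLATFORM` before invoking make-raw.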

Commits:

| Commit | Changes |
| --- | --- |
| eee1b57 | Core multi-rootfs support, Makefile patterns, platform file |
| 30abc58 | Evaluation modifiers, installer image bundling |
| 440b597 | IMGC partition support, make-raw enhancements |
| 73d7cb7 | make-raw shell syntax fixes |
| 0dc6ddd | Introduced /etc/eve-platform |
| a7d773c | Removed build-tools target |
| fd2645f | Dockerfile syntax fixes |

How to test and validate this PR

  1. Build a regular image with make installer_raw; make sure it still works and the platform is set to generic.
  2. Build an eve container with make eve and make sure all images generated from this container work as expected.
  3. Build with make PLATFORM=evaluation installer_raw, then run make run-installer raw && make run-target; make sure you have all 3 IMG[A-C] partitions and that you can switch between them using zboot.
  4. Build all sane combinations of HV= and PLATFORM= and make sure they work as expected.
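
For step 3, a small helper like the one below could check that all three rootfs partition labels are present. It is a sketch, not part of the PR: the function name `missing_rootfs_labels` is made up, and it only inspects a list of labels passed on stdin.

```shell
#!/bin/sh
# Sketch: read partition labels (one per line) on stdin, print the
# IMG[A-C] labels that are missing, and fail if any is missing.
missing_rootfs_labels() (
    # Keep the first word of every non-empty line.
    labels=$(awk 'NF { print $1 }')
    missing=

    for want in IMGA IMGB IMGC; do
        # -x: whole-line match, -F: fixed string, -q: quiet
        printf '%s\n' "$labels" | grep -qxF "$want" ||
            missing="${missing}${missing:+ }$want"
    done

    [ -z "$missing" ] && exit 0
    printf '%s\n' "$missing"
    exit 1
)

# Possible use on the target (the device name is an assumption):
#   lsblk -n -o PARTLABEL /dev/sda | missing_rootfs_labels
```

The function prints nothing and returns success when IMGA, IMGB and IMGC are all found; otherwise it prints the missing labels separated by spaces.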

Changelog notes

Add Evaluation Platform Support with Multi-RootFS Build System

Checklist

  • [x] I've provided a proper description
  • [ ] I've added the proper documentation (when applicable)
  • [ ] I've tested my PR on amd64 device(s)
  • [ ] I've tested my PR on arm64 device(s)
  • [x] I've written the test verification instructions
  • [x] I've set the proper labels to this PR

rucoder avatar Jun 11 '25 17:06 rucoder

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 18.56%. Comparing base (fd35355) to head (47d505a). Report is 26 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4960      +/-   ##
==========================================
- Coverage   24.97%   18.56%   -6.42%     
==========================================
  Files           8       18      +10     
  Lines        1185     2613    +1428     
==========================================
+ Hits          296      485     +189     
- Misses        820     2046    +1226     
- Partials       69       82      +13     


codecov[bot] avatar Jun 23 '25 17:06 codecov[bot]

The commit messages and PR description do a great job clearly explaining how the changes are implemented and what exactly was done. However, the high-level purpose behind the introduction of the IMGC partition is still unclear to me. Could you clarify a few points to help with this PR review, as well as future reviews?

Specifically:

  • Why exactly did we choose to have exactly three rootfs partitions? Why not two or four?
  • What precisely are we intending to put on these partitions (IMGA, IMGB, IMGC)? Is there a clear definition or policy for what goes into each partition?
  • Are these images intended to be switched manually by the user, or is there some planned automated logic for switching between partitions?
  • Could you provide an example of a concrete use-case scenario that illustrates how we expect these partitions to be used by end-users or developers?

I suggest documenting this high-level goal clearly now, as it will significantly simplify reviewing this and subsequent PRs.

Currently, my best guess is that IMGC is intended to provide an additional evaluation or experimental rootfs image, such as an alternative kernel or OS flavour, to allow users to test and compare them easily on a single device. But which exactly flavour and as an alternative, to what? And how to switch?

OhmSpectator avatar Jun 25 '25 10:06 OhmSpectator

The commit messages and PR description do a great job clearly explaining how the changes are implemented and what exactly was done. However, the high-level purpose behind the introduction of the IMGC partition is still unclear to me. Could you clarify a few points to help with this PR review, as well as future reviews?

Specifically:

* Why exactly did we choose to have exactly three rootfs partitions? Why not two or four?

Design decision:

  • IMGA - standard EVE
  • IMGB - standard EVE kernel version + Ubuntu-based config + full FW
  • IMGC - LTS kernel + Ubuntu-based config + full FW

Basically, instead of asking the user to install Ubuntu and identify the missing drivers, the system does it itself.
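
The partition policy above can be written down as a small lookup, which may be handy in scripts or documentation. This is only an illustration of the mapping stated in this comment; the function name `describe_rootfs_partition` is hypothetical.

```shell
#!/bin/sh
# Illustrative lookup of the rootfs partition policy described above.
describe_rootfs_partition() {
    case "$1" in
        IMGA) echo "standard EVE" ;;
        IMGB) echo "standard EVE kernel version + Ubuntu-based config + full FW" ;;
        IMGC) echo "LTS kernel + Ubuntu-based config + full FW" ;;
        *)
            echo "unknown partition label: $1" >&2
            return 1
            ;;
    esac
}

for label in IMGA IMGB IMGC; do
    printf '%s: %s\n' "$label" "$(describe_rootfs_partition "$label")"
done
```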

* What precisely are we intending to put on these partitions (IMGA, IMGB, IMGC)? Is there a clear definition or policy for what goes into each partition?

see above

* Are these images intended to be switched manually by the user, or is there some planned automated logic for switching between partitions?
  • Auto switch, to collect a device model that includes as many devices as possible. See NOTE 2. This is the first phase of this feature. Later, automatic model publishing will be implemented so you can onboard the device without providing a model. For now this is just an improvement of our build system so we can support more than one rootfs image.
* Could you provide an example of a concrete use-case scenario that illustrates how we expect these partitions to be used by end-users or developers?
  • Install evaluation EVE; the switching stops at the best partition, so users no longer need to worry about missing device drivers.

I suggest documenting this high-level goal clearly now, as it will significantly simplify reviewing this and subsequent PRs.

Currently, my best guess is that IMGC is intended to provide an additional evaluation or experimental rootfs image, such as an alternative kernel or OS flavour, to allow users to test and compare them easily on a single device. But which exactly flavour and as an alternative, to what? And how to switch?

rucoder avatar Jun 26 '25 06:06 rucoder

Thanks! Now it's kind of clear.

OhmSpectator avatar Jun 26 '25 09:06 OhmSpectator

There is a yetus issue (yetus: kernel-version.mk#L17 codespell: varialbe ==> variable) and a commitlint issue to fix.

Plus some documentation would be helpful as @OhmSpectator has pointed out.

Kicking off tests.

@eriknordmark yes, this is not the final version. We are still discussing how kernel tags should be named. I'll submit the final version ASAP.

rucoder avatar Jun 30 '25 21:06 rucoder

The EVE update tests are red after 20+ attempts. All others are green. Please take a look. Artifacts from one of the failures: https://github.com/lf-edge/eve/actions/runs/15979353967/artifacts/3441397501

OhmSpectator avatar Jul 01 '25 17:07 OhmSpectator

It looks like the artefact does not contain collect-info (((( I will try to rerun the test once more... Maybe the artefact is "cut" when we cancel the test...

OhmSpectator avatar Jul 01 '25 18:07 OhmSpectator

Attaching a debug archive: eden-report-eve-upgrade.tests.txt-tpm-true-ext4.zip

OhmSpectator avatar Jul 01 '25 19:07 OhmSpectator

3. Platform identification

* Introduced `/etc/eve-platform` file for build-time platform detection
* Simplified script logic by eliminating runtime platform passing

I'm confused. I see /etc/eve-platform being created and used by the runtime (the installer). But /etc/eve-platform isn't and can't be used at build time - you appear to have an eve-platform argument for that.

eriknordmark avatar Jul 02 '25 23:07 eriknordmark

Rebase on top of master

OhmSpectator avatar Jul 08 '25 15:07 OhmSpectator

@eriknordmark ah, true. Run-time, of course. I have the correct description in the commit but not in the PR. Fixed.

rucoder avatar Jul 13 '25 10:07 rucoder

@uncleDecart @OhmSpectator @eriknordmark I looked at the failed tests and it seems to me this is a communication issue between eclient and eden, or between eve and adam. I do not quite understand it, but I see that the status that the test (eve-upgrade-ext4) expects

test eden.lim.test -test.v -timewait 30m -test.run TestInfo -out InfoContent.dinfo.SwList[0].ShortVersion 'InfoContent.dinfo.SwList[0].PartitionState:inprogress'

is being sent back; however, there was an error. This is not the only place, but it would be nice if someone else could look at it.

2025-07-13 13:27:35.005 {"severity":"err","source":"dhcpcd","iid":"2839","content":"control_free: No such file or directory","msgid":3013,"timestamp":{"seconds":1752413255,"nanos":5473537}}
2025-07-13 13:27:35.048 /pillar/cmd/zedagent/reportinfo.go:1277: ctx.subBaseOsMgrStatus.Get failed: Get(baseosmgr/BaseOSMgrStatus) unknown key global
2025-07-13 13:27:35.055 /pillar/cmd/zedagent/reportinfo.go:760: Device swInfo: activated:true partitionLabel:"IMGB" partitionDevice:"/dev/sda3" partitionState:"inprogress" status:INSTALLED shortVersion:"12.1.0-kvm-amd64" downloadProgress:100
2025-07-13 13:27:35.055 /pillar/cmd/zedagent/reportinfo.go:760: Device swInfo: partitionLabel:"IMGA" partitionDevice:"/dev/sda2" partitionState:"active" status:INSTALLED shortVersion:"0.0.0-pr4960-c73f4c77-kvm-amd64" downloadProgress:100
2025-07-13 13:27:35.095 /pillar/netmonitor/linux.go:310: GetDhcpInfo(eth0) subnet_cidr 24
2025-07-13 13:27:35.095 /pillar/netmonitor/linux.go:300: GetDhcpInfo(eth0) network_number 192.168.0.0
2025-07-13 13:27:35.147 /pillar/netmonitor/linux.go:276: Calling dhcpcd -U -4 eth1
2025-07-13 13:27:35.213 /pillar/zedcloud/send.go:1117: SendOnIntf to https://mydomain.adam:3333/api/v2/edgedevice/id/57b432bd-5e1b-49b2-9fee-0c7dac6836a6/config reqlen 104 statuscode 403 Forbidden body:
00000000  46 6f 72 62 69 64 64 65  6e                       |Forbidden|
2025-07-13 13:27:35.213 /pillar/zedcloud/send.go:1118: Got payload for status Forbidden: Forbidden
2025-07-13 13:27:35.213 /pillar/cmd/zedagent/handleconfig.go:642: getLatestConfig : Device integrity token mismatch
2025-07-13 13:27:35.214 /pillar/cmd/zedagent/handleconfig.go:952: readSavedProtoMessageConfig stat /persist/checkpoint/lastconfig: no such file or directory
2025-07-13 13:27:35.218 /pillar/cmd/zedagent/handleconfig.go:695: getconfig: stat /persist/checkpoint/lastconfig: no such file or directory
2025-07-13 13:27:35.221 /pillar/cmd/zedagent/zedagent.go:1995: Triggered PublishDeviceInfo
2025-07-13 13:27:35.223 /pillar/cmd/zedagent/zedagent.go:2002: Failed to send on PublishDeviceInfo
2025-07-13 13:27:35.226 /pillar/cmd/zedagent/localinfo.go:568: Triggering POST for /api/v1/devinfo to local server
2025-07-13 13:27:35.237 /pillar/types/zedagenttypes.go:606: Zedagent status modify
2025-07-13 13:27:35.241 /pillar/pubsub/checkmaxtime.go:84: RegisterFileWatchdog(zedagentconfig)
2025-07-13 13:27:35.244 /pillar/cmd/zedagent/localinfo.go:583: localDevInfoPOSTTask: waiting for localDevInfoPOSTTicker done
2025-07-13 13:27:35.244 /pillar/cmd/zedagent/localinfo.go:568: Triggering POST for /api/v1/devinfo to local server
2025-07-13 13:27:35.248 /pillar/pubsub/checkmaxtime.go:84: RegisterFileWatchdog(zedagent-localdevinfo)
2025-07-13 13:27:35.276 /pillar/cmd/zedagent/zedagent.go:527: Creating localProfileTimerTask at goroutine:
github.com/lf-edge/eve/pkg/pillar/cmd/zedagent.Run()
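
To compare what the device reported with the `PartitionState:inprogress` value the test waits for, the `Device swInfo:` lines can be pulled out of a log like the one above. This is a rough sketch: the function name `sw_partition_states` is made up, and the pattern assumes the fields appear in the same order as in these log lines.

```shell
#!/bin/sh
# Sketch: read a pillar log on stdin and print "LABEL STATE VERSION"
# for every "Device swInfo:" line. Assumes partitionLabel, partitionState
# and shortVersion appear in that order, as in the log above.
sw_partition_states() {
    sed -n 's/.*Device swInfo:.*partitionLabel:"\([^"]*\)".*partitionState:"\([^"]*\)".*shortVersion:"\([^"]*\)".*/\1 \2 \3/p'
}

# Example (the log file name is hypothetical):
#   sw_partition_states < pillar.log
```

For the two swInfo lines in the log above, this prints `IMGB inprogress 12.1.0-kvm-amd64` and `IMGA active 0.0.0-pr4960-c73f4c77-kvm-amd64`, so the expected state is present in the report even though the later config request returned 403 Forbidden.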

rucoder avatar Jul 13 '25 15:07 rucoder

@OhmSpectator please merge it ASAP. I do not see any point in waiting for flaky tests to pass.

rucoder avatar Jul 13 '25 16:07 rucoder