V54: BIOS settings are randomly reset

Open byteboltsec opened this issue 8 months ago • 6 comments

Component

Dasharo firmware

Device

NovaCustom V54 14th Gen

Dasharo version

v0.9.0

Dasharo Tools Suite version

No response

Test case ID

No response

Brief summary

With a new V54x_6x_TU laptop, we have noticed several times that important BIOS settings such as Secure Boot, camera disable, and Intel ME had been reset. We can't currently confirm whether every setting was reset. On Ubuntu 24.04, the security audit settings showed that Secure Boot was suddenly disabled after we had been working with the machine. Because of this unstable state, we can no longer trust the BIOS settings. There is definitely no long startup, which would indicate memory re-training. Has anyone observed something like this?
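For anyone who wants to catch a reset from the OS side, here is a minimal sketch of reading the Secure Boot state on Linux through efivarfs. It assumes efivarfs is mounted at the usual `/sys/firmware/efi/efivars` location; the function names are made up for illustration. An efivarfs file starts with a 4-byte attributes field, followed by the variable data, and `SecureBoot` holds a single byte (1 = enabled, 0 = disabled).

```python
from pathlib import Path
from typing import Optional

# SecureBoot lives under the EFI global variable GUID from the UEFI spec.
SECURE_BOOT_VAR = Path(
    "/sys/firmware/efi/efivars/"
    "SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c"
)


def parse_secure_boot(raw: bytes) -> bool:
    """Decode the raw efivarfs content of the SecureBoot variable."""
    # Bytes 0..3 are the variable attributes; the payload starts at byte 4.
    if len(raw) < 5:
        raise ValueError("SecureBoot variable is shorter than expected")
    return raw[4] == 1


def secure_boot_enabled(path: Path = SECURE_BOOT_VAR) -> Optional[bool]:
    """Return True/False, or None when the variable is not present
    (for example on a legacy boot or when efivarfs is not mounted)."""
    try:
        return parse_secure_boot(path.read_bytes())
    except FileNotFoundError:
        return None
```

Calling `secure_boot_enabled()` from a login script or a periodic timer and logging the result with a timestamp would at least narrow down when the state flipped. It only covers Secure Boot; the other settings mentioned here are not covered by this sketch.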

How reproducible

Not reproducible / happens randomly

How to reproduce

We can't reproduce it currently.

Expected behavior

BIOS settings should be persistent.

Actual behavior

BIOS settings like secure boot or Intel ME are reset.

Screenshots

No response

Additional context

No response

Solutions you've tried

  • CMOS battery connector is stable and fit correctly
  • CMOS battery has stable voltage, checked with voltmeter
  • Reset the CMOS battery following NovaCustom's instructions; afterwards, the laptop needed more time to re-train the memory, as expected

byteboltsec avatar Apr 04 '25 14:04 byteboltsec

@byteboltsec Could you please let us know what exact firmware settings were set differently when compared to the default settings? Were Early boot DMA Protection and Keep IOMMU enabled when transfer control to OS set as well?

@macpijan Any idea what could have caused the issue?

wessel-novacustom avatar Apr 07 '25 09:04 wessel-novacustom

I have seen something like this on the MTL laptops happening randomly. It looked like the whole region with variables and settings was deemed invalid and thus reinitialized with defaults. But last time I have seen it was in the first early versions of the firmware (exactly around v0.9.0 for iGPU variants). Haven't seen it on the most recent versions (including developer builds) for a couple of months.

@mkopec I believe you have seen it as well at the beginning of the MTL firmware development.

miczyg1 avatar Apr 07 '25 10:04 miczyg1

Hi @wessel-novacustom , I think that a "Reset to Defaults" is triggered randomly. We changed e.g.:

  • Enable Camera -> false
  • Intel ME mode -> Disabled (GAP)
  • Battery Start Charge Threshold -> 78
  • Battery Stop Charge Threshold -> 80

But I can't really confirm or recall whether Early boot DMA Protection and Keep IOMMU enabled when transfer control to OS were changed.

byteboltsec avatar Apr 07 '25 10:04 byteboltsec

@miczyg1 @mkopec @macpijan Can we send test binaries of the coreboot + EDK II rebase or is that a bad idea?

wessel-novacustom avatar Apr 07 '25 12:04 wessel-novacustom

@miczyg1 @mkopec @macpijan Can we send test binaries of the coreboot + EDK II rebase or is that a bad idea?

Binaries are available all the time on CI: https://github.com/Dasharo/coreboot/actions

It is as simple as "click one workflow and download an artifact". But it is always a bad idea to experiment on dev binaries without a recovery method.

miczyg1 avatar Apr 08 '25 09:04 miczyg1

@miczyg1 @mkopec @macpijan Can we send test binaries of the coreboot + EDK II rebase or is that a bad idea?

Binaries are available all the time on CI: https://github.com/Dasharo/coreboot/actions

It is as simple as "click one workflow and download an artifact". But it is always a bad idea to experiment on dev binaries without a recovery method.

Thanks. I can see that, I have promised to apply for recovery free of charge under warranty in case of a brick.

wessel-novacustom avatar Apr 08 '25 09:04 wessel-novacustom

This issue has been reported twice now. Both customers had 2× 32 GB of internal memory. I'm not sure if this is coincidence or not, but I thought it was worth mentioning it.

wessel-novacustom avatar Jun 30 '25 06:06 wessel-novacustom

In fact, there is a known issue with the SMI/SMM storage, causing this sporadic reset. The reproducibility is low, making the issue hard to debug. Here are some related issues:

https://github.com/Dasharo/dasharo-issues/issues/1364 https://github.com/Dasharo/dasharo-issues/issues/1349 https://github.com/Dasharo/dasharo-issues/issues/1338

It took a while to understand what was going on.

The underlying cause of this issue and of the issues mentioned above should have been fixed with this PR: https://github.com/Dasharo/dasharo-issues/issues/1338#issuecomment-3102785948

wessel-novacustom avatar Jul 31 '25 11:07 wessel-novacustom

The issue is still occurring very rarely on v1.0.0-rc4.

wessel-novacustom avatar Aug 04 '25 11:08 wessel-novacustom

The reproducibility is low, making the issue hard to debug

The reproducibility is extremely low: during a whole week of testing rc4 on the V540TU, it didn't happen even once.

wiktormowinski avatar Aug 11 '25 06:08 wiktormowinski

The issue is still occurring very rarely on v1.0.0-rc4.

@wessel-novacustom

That is unfortunate to hear. What do you mean by very rarely? Did it happen more than once during this period? Do you have even a rough scenario to reproduce it, since it seems you were able to do so within just 3 days of RC4 being published?

macpijan avatar Aug 12 '25 08:08 macpijan

Possibly relevant report in the matrix channel: https://matrix.to/#/!HqyQrXVyqEGAXgjatF:matrix.org/$i6F_r5YEeq37mCZ_IeQW1E4aDt9pN5QoIdsxiNHw_FM

macpijan avatar Aug 12 '25 08:08 macpijan

Just to pitch in: I've also had this issue occasionally on a V56.

Darwinkel avatar Aug 27 '25 15:08 Darwinkel

As always, binaries will be in: https://github.com/Dasharo/coreboot/actions/runs/17325122509 if someone is willing to test it on their own.

macpijan avatar Aug 29 '25 13:08 macpijan

I ran a short stress test and did not see the configuration reset in 10 reboots. Had a reset happened, serial redirection would have defaulted to disabled and the connection would have been lost.

serial_stability_log.html

philipanda avatar Sep 03 '25 07:09 philipanda

V560TU bricked after changing settings, during NBA001.001 Network Boot

  • Test enters setup menu, enables network boot
  • After reboot the no battery prompt popped up in the top left corner, not in the middle of the screen
  • Then setup menu likewise
  • Test didn't find iPXE in setup menu (despite supposedly having enabled network boot) and did a power cycle

From that moment on the laptop has been bricked: the EC works, but the laptop doesn't boot and there is no screen backlight. It probably halts somewhere before setting the fan curve, since the fans are a bit louder than usual.

Happened once so far

filipleple avatar Sep 04 '25 09:09 filipleple

Reproduced again, with the same test. Binary dump:

[brick.zip](https://github.com/user-attachments/files/22136301/brick.zip)

Tried to analyze with romscope:

λ ./romscope compare ../brick_copy ../dcu/working_read.rom
===== Preparation =====
Extracting file /home/flewinski/workspace/rc6testing/brick-analyze/brick_copy
Extracting file /home/flewinski/workspace/rc6testing/brick-analyze/dcu/working_read.rom
===== Comparison =====
IBG keys match.
Generating report for regions/fmap/SI_ME.bin
Generating report for regions/fmap/SMMSTORE.bin
Vblock regions/fmap/GBB.bin matches.
Vblock regions/fmap/VBLOCK_A.bin matches.
===== Conclusions =====
Not all files match. Check report for detailed information.
Report placed in folder: 'report'
λ ls report
regions-fmap-SI_ME.bin.html  regions-fmap-SMMSTORE.bin.html

I'm still able to operate on the not-working binary with dcu/cbfstool, so it looks like it's not entirely garbled
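Since the report above only says that SI_ME and SMMSTORE differ, a byte-level diff of the extracted region files can show where they differ. This is a minimal sketch, not part of romscope; it assumes both regions have already been extracted to separate files of equal size, and the file names in the usage note are hypothetical.

```python
from typing import List, Tuple


def diff_ranges(a: bytes, b: bytes) -> List[Tuple[int, int]]:
    """Return half-open (start, end) offsets of contiguous differing runs."""
    if len(a) != len(b):
        raise ValueError("regions must have the same size")
    ranges: List[Tuple[int, int]] = []
    start = None
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            if start is None:
                start = offset  # a new differing run begins here
        elif start is not None:
            ranges.append((start, offset))  # run ended at previous byte
            start = None
    if start is not None:
        ranges.append((start, len(a)))  # run extends to the end
    return ranges


def format_ranges(ranges: List[Tuple[int, int]]) -> str:
    """Render the ranges as hex offsets with their lengths, one per line."""
    return "\n".join(
        f"0x{start:06x}-0x{end:06x} ({end - start} bytes)"
        for start, end in ranges
    )
```

Usage would look like `print(format_ranges(diff_ranges(open("bricked_SMMSTORE.bin", "rb").read(), open("working_SMMSTORE.bin", "rb").read())))`. A few short runs would point at individual variable writes, while one long run would suggest a larger part of the store was rewritten or erased.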

filipleple avatar Sep 04 '25 10:09 filipleple

SMMSTORE looks adequate at first glance, but since the only things different are ME region and SMMSTORE, something must be wrong with the latter (assuming ME is fine which is probably a valid assumption). Now that we write to SMMSTORE in coreboot (https://github.com/Dasharo/coreboot/pull/760), that's another suspect.

By the way, I just realized there is no FaultTolerantWrite in coreboot, although that's not the issue here (otherwise SMMSTORE would be effectively empty and the settings would have been reset).

SergiiDmytruk avatar Sep 04 '25 13:09 SergiiDmytruk

This issue may be debuggable. After writing bricked SMMSTORE into a QEMU image and trying to run it, the boot stops after:

[...]
[Bds] Entry...
[BdsDxe] Locate Variable Policy protocol - Success
Quiet Boot disabled.
Fast Boot disabled.
Console on demand disabled.
Variable Driver Auto Update PlatformLang, PlatformLang:en, Lang:eng Status: Success

This may be the first time EDK tries to write a variable and it apparently hangs while trying.

SergiiDmytruk avatar Sep 04 '25 14:09 SergiiDmytruk

I've just managed to reproduce this manually, by enabling the non-functional DMA protection option prior to toggling Network Boot back and forth. This seems to be what the automatic full regression does in OSFV, with the DMA test right before Network Boot.

We're disabling this option for the release, so maybe that will cut off some faulty path.

filipleple avatar Sep 10 '25 13:09 filipleple

I've just managed to reproduce this manually, by enabling the non-functional DMA protection option prior to toggling Network Boot back and forth

A NUC Box user reported the same today.

Maybe good to re-enable it with a hotfix release within a few months from now? We should have a release before the Qubes summit.

wessel-novacustom avatar Sep 10 '25 13:09 wessel-novacustom

IIRC, DMA protection is currently off-limits since the problem is within the Intel FSP for Meteor Lake, and we're not legally able to redistribute modified FSP. So we must wait for Intel to fix it in a newer version.

filipleple avatar Sep 10 '25 13:09 filipleple

Ok, that's fine. Then let's integrate it again once Intel has fixed it.

wessel-novacustom avatar Sep 10 '25 13:09 wessel-novacustom

and we're not legally able to redistribute modified FSP.

Correction, we are allowed, but we do not have the latest sources available to us.

mkopec avatar Sep 10 '25 14:09 mkopec