dasharo-issues

Booting DTS v2.0.0 through iPXE has no internet

Open WiktorG351 opened this issue 1 year ago • 28 comments

Component

Dasharo Tools Suite

Device

NovaCustom V54 14th Gen, NovaCustom V56 14th Gen

Dasharo version

v0.9.0-rc7

Dasharo Tools Suite version

v2.0.0

Test case ID

DTS007.001

Brief summary

When booting DTS v2.0.0 through iPXE on a NovaCustom V54/V56 laptop, there is no internet access. We assume it is some sort of driver issue; here are the logs from dmesg.

How reproducible

always

How to reproduce

Boot DTS v2.0.0 through iPXE on a NovaCustom V54/V56 with Dasharo v0.9.0-rc7.

Expected behavior

We have internet access through the Ethernet port.

Actual behavior

We don't have internet access.

Screenshots

No response

Additional context

No response

Solutions you've tried

No response

WiktorG351 avatar Nov 18 '24 10:11 WiktorG351

maybe the GbE region got wiped? That would result in ethernet not working

mkopec avatar Jan 09 '25 11:01 mkopec

Well I'm not exactly sure about the GbE region on that particular platform. But when booting DTS v2.0.0 through USB, the internet works just fine. It's when we boot through iPXE that we get the issue. Surely this can't be caused by a firmware issue?

WiktorG351 avatar Jan 09 '25 13:01 WiktorG351

@mkopec

Surely this can't be caused by a firmware issue?

Do you have an educated guess here?

But when booting DTS v2.0.0 through USB, the internet works just fine. It's when we boot through iPXE that we get the issue.

Have we tried a more recent version as well? We have another report that the problem is present in the latest DTS version as well.

macpijan avatar Jan 20 '25 12:01 macpijan

Do you have an educated guess here?

Could be something related to the broken reset issue in i219, maybe Linux is having trouble taking over the adapter after iPXE? From the linked dmesg:

[   20.015245] e1000e 0000:00:1f.6 eno0: Reset adapter unexpectedly
[   23.831647] e1000e 0000:00:1f.6 eno0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[   25.776717] e1000e 0000:00:1f.6 eno0: Detected Hardware Unit Hang:

looks pretty suspicious
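
A minimal sketch for pulling just these two failure signatures out of a kernel log, so that USB and iPXE boots can be compared quickly. The function name `e1000e_failures` is made up for illustration; the message strings are the ones quoted above.

```shell
#!/bin/sh
# Print only the e1000e kernel messages that point at a stuck adapter.
# Reads kernel log text on stdin, so it works on saved logs as well as
# on live `dmesg` output.
e1000e_failures() {
    grep -E 'e1000e .*(Reset adapter unexpectedly|Detected Hardware Unit Hang)'
}
```

Typical use would be `dmesg | e1000e_failures`. No output (and a non-zero exit status from grep) means neither message was logged.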

mkopec avatar Jan 20 '25 12:01 mkopec

The issue also occurs on DTS v2.1.3. Could be a driver issue. What network driver are we using?

PLangowski avatar Jan 29 '25 13:01 PLangowski

@PLangowski We're using the e1000e driver

mkopec avatar Feb 13 '25 11:02 mkopec

I will look into this issue next week. Here are some initial ideas on how to diagnose the problem:

  • Check if there is a newer driver version that addresses this issue
  • Check for similar problems online
  • Might be related to ASPM

PLangowski avatar Feb 13 '25 12:02 PLangowski

I tried the following solutions, which I found online. None of them solved the issue:

  • kernel config: aspm default policy: performance
  • kernel parameter: pcie_aspm=off
  • ethtool -K eno0 tso off gso off gro off (disable segmentation and receive offloads)
  • ethtool -K eth0 tx off rx off
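
When testing a kernel parameter such as pcie_aspm=off over a network boot chain, it may be worth ruling out that the parameter never reached the kernel. A small sketch, assuming the usual /proc/cmdline interface; the helper name `has_kernel_param` is hypothetical:

```shell
#!/bin/sh
# Succeed if the given parameter (e.g. pcie_aspm=off) appears as a whole
# word in a kernel command line. The command line can be passed as the
# second argument; by default it is read from /proc/cmdline.
has_kernel_param() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `has_kernel_param pcie_aspm=off && echo "ASPM disabled on the command line"`.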

I'm now trying to bump the kernel version to 6.12 (without bumping poky revision for now).

PLangowski avatar Feb 18 '25 16:02 PLangowski

Unfortunately, bumping the kernel version did not solve the problem. I'm running out of ideas.

PLangowski avatar Feb 19 '25 14:02 PLangowski

FWIW I do network boot on V54/V56 and have working network. But my boot chain is very different: PXE -> grub2.efi -> Xen -> Linux. And the Xen / Linux builds are taken from Qubes OS (kernel-latest).

marmarek avatar Feb 19 '25 14:02 marmarek

This issue is still happening, it looks like it's a DTS bug.

CC: @macpijan

wessel-novacustom avatar Feb 20 '25 08:02 wessel-novacustom

FYI: Internet is working fine when connecting a USB Ethernet device.

wessel-novacustom avatar Feb 20 '25 08:02 wessel-novacustom

I checked the difference in some files/logs between USB and iPXE boot. There are some interesting finds from ethtool:

ethtool eno0:

usb: PHYAD: 1

ipxe: PHYAD: 2

ethtool -d eno0 (diff usb ipxe):

37,38c37,38
< 0x02810: RDH   (Receive desc head)       0x00000018
< 0x02818: RDT   (Receive desc tail)       0x00000010
---
> 0x02810: RDH   (Receive desc head)       0x00000000
> 0x02818: RDT   (Receive desc tail)       0x000000F0
46,47c46,47
< 0x03810: TDH   (Transmit desc head)      0x0000004E
< 0x03818: TDT   (Transmit desc tail)      0x0000004E
---
> 0x03810: TDH   (Transmit desc head)      0x00000000
> 0x03818: TDT   (Transmit desc tail)      0x00000001

This could potentially be the root of the problem.
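
One way to read the diff: in the USB boot the transmit head and tail are equal, while in the iPXE boot the head is still at 0 with the tail already advanced, which would fit the adapter not consuming transmit descriptors. A rough sketch for extracting these registers from an `ethtool -d` dump; the function names are hypothetical, and a single sample is only a hint, since head and tail can legitimately differ for a moment under load.

```shell
#!/bin/sh
# Print the RX/TX descriptor head and tail registers from `ethtool -d`
# output supplied on stdin, one "NAME VALUE" pair per line.
ring_pointers() {
    awk '$2 ~ /^(RDH|RDT|TDH|TDT)$/ { print $2, $NF }'
}

# Succeed when both TDH and TDT were found and they differ, i.e.
# descriptors were queued but the adapter has not consumed them yet.
tx_pending() {
    ring_pointers | awk '
        $1 == "TDH" { head = $2 }
        $1 == "TDT" { tail = $2 }
        END {
            # Append "" to force a string comparison of the hex values.
            if (head != "" && tail != "" && (head "") != (tail ""))
                exit 0
            exit 1
        }'
}
```

Usage would look like `ethtool -d eno0 | ring_pointers`, or `ethtool -d eno0 | tx_pending && echo "TX ring not drained"`.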

Tested on V540TU.

I'm attaching full logs, diffs and script for getting the logs.

logs-ipxe.tar.gz

logs-usb.tar.gz

diffs.tar.gz

collect-logs.txt

PLangowski avatar Feb 20 '25 09:02 PLangowski

I tried booting another system through netboot and found the following issue:

In the initial iPXE shell, when I run dhcp, the interface is configured correctly. Then, when I chain netboot, it loads and attempts to configure the interface again, but this time it fails:

Configuring (...) No configuration methods succeeded

Afterwards, dhcp in iPXE fails and the platform needs to be rebooted in order to get it working.

It looks like the network interface becomes unavailable right after ipxe boots an image.

PLangowski avatar Feb 20 '25 11:02 PLangowski

I managed to boot tinycore via ipxe directly (http://tinycorelinux.net/14.x/x86_64/release/distribution_files/). The problem still occurs there (detected hardware unit hang). I think the issue is related to iPXE and not DTS.

PLangowski avatar Feb 20 '25 11:02 PLangowski

FWIW I do network boot on V54/V56 and have working network. But my boot chain is very different: PXE -> grub2.efi -> Xen -> Linux. And the Xen / Linux builds are taken from Qubes OS (kernel-latest).

With this setup, ethtool -d eno0 reports:

MAC Registers
-------------
0x00000: CTRL (Device control register)  0x58180240
      Endian mode (buffers):             little
      Link reset:                        normal
      Set link up:                       1
      Invert Loss-Of-Signal:             no
      Receive flow control:              enabled
      Transmit flow control:             enabled
      VLAN mode:                         enabled
      Auto speed detect:                 disabled
      Speed select:                      1000Mb/s
      Force speed:                       no
      Force duplex:                      no
0x00008: STATUS (Device status register) 0x40080083
      Duplex:                            full
      Link up:                           link config
      TBI mode:                          disabled
      Link speed:                        1000Mb/s
      Bus type:                          PCI
      Bus speed:                         33MHz
      Bus width:                         32-bit
0x00100: RCTL (Receive control register) 0x04008002
      Receiver:                          enabled
      Store bad packets:                 disabled
      Unicast promiscuous:               disabled
      Multicast promiscuous:             disabled
      Long packet:                       disabled
      Descriptor minimum threshold size: 1/2
      Broadcast accept mode:             accept
      VLAN filter:                       disabled
      Canonical form indicator:          disabled
      Discard pause frames:              filtered
      Pass MAC control frames:           don't pass
      Receive buffer size:               2048
0x02808: RDLEN (Receive desc length)     0x00001000
0x02810: RDH   (Receive desc head)       0x00000077
0x02818: RDT   (Receive desc tail)       0x00000070
0x02820: RDTR  (Receive delay timer)     0x00000000
0x00400: TCTL (Transmit ctrl register)   0x3103F0FA
      Transmitter:                       enabled
      Pad short packets:                 enabled
      Software XOFF Transmission:        disabled
      Re-transmit on late collision:     enabled
0x03808: TDLEN (Transmit desc length)    0x00001000
0x03810: TDH   (Transmit desc head)      0x000000B4
0x03818: TDT   (Transmit desc tail)      0x000000B4
0x03820: TIDV  (Transmit delay timer)    0x00000008
PHY type:                                unknown

And without Xen (PXE -> grub2.efi -> Linux), where the network also works:

MAC Registers
-------------
0x00000: CTRL (Device control register)  0x58180240
      Endian mode (buffers):             little
      Link reset:                        normal
      Set link up:                       1
      Invert Loss-Of-Signal:             no
      Receive flow control:              enabled
      Transmit flow control:             enabled
      VLAN mode:                         enabled
      Auto speed detect:                 disabled
      Speed select:                      1000Mb/s
      Force speed:                       no
      Force duplex:                      no
0x00008: STATUS (Device status register) 0x40080083
      Duplex:                            full
      Link up:                           link config
      TBI mode:                          disabled
      Link speed:                        1000Mb/s
      Bus type:                          PCI
      Bus speed:                         33MHz
      Bus width:                         32-bit
0x00100: RCTL (Receive control register) 0x04008002
      Receiver:                          enabled
      Store bad packets:                 disabled
      Unicast promiscuous:               disabled
      Multicast promiscuous:             disabled
      Long packet:                       disabled
      Descriptor minimum threshold size: 1/2
      Broadcast accept mode:             accept
      VLAN filter:                       disabled
      Canonical form indicator:          disabled
      Discard pause frames:              filtered
      Pass MAC control frames:           don't pass
      Receive buffer size:               2048
0x02808: RDLEN (Receive desc length)     0x00001000
0x02810: RDH   (Receive desc head)       0x00000017
0x02818: RDT   (Receive desc tail)       0x00000010
0x02820: RDTR  (Receive delay timer)     0x00000000
0x00400: TCTL (Transmit ctrl register)   0x3103F0FA
      Transmitter:                       enabled
      Pad short packets:                 enabled
      Software XOFF Transmission:        disabled
      Re-transmit on late collision:     enabled
0x03808: TDLEN (Transmit desc length)    0x00001000
0x03810: TDH   (Transmit desc head)      0x000000E4
0x03818: TDT   (Transmit desc tail)      0x000000E4
0x03820: TIDV  (Transmit delay timer)    0x00000008
PHY type:                                unknown

I tried the same without Grub, but it fails the same way as DTS. I guess grub is doing some fixup?

marmarek avatar Feb 20 '25 12:02 marmarek

Could it maybe be related to broken FLR on this card? We have a workaround for this in Linux: https://github.com/QubesOS/qubes-linux-kernel/blob/main/0001-PCI-add-a-reset-quirk-for-Intel-I219LM-ethernet-adap.patch, maybe Grub is doing something similar?
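
To see whether a given kernel would still use FLR for the adapter, the sysfs `reset_method` attribute can be inspected on kernels that expose it (an assumption here; older kernels do not have this file). A small sketch with hypothetical helper names, using the 0000:00:1f.6 address from the dmesg output earlier in the thread:

```shell
#!/bin/sh
# Print the reset methods the running kernel lists for a PCI device.
# The sysfs root can be overridden with a second argument for testing.
pci_reset_methods() {
    bdf=$1
    root=${2-/sys/bus/pci/devices}
    file=$root/$bdf/reset_method
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "unknown"
    fi
}

# Succeed if a function level reset ("flr" or "af_flr") is among them.
uses_flr() {
    for method in $(pci_reset_methods "$@"); do
        case $method in
            flr|af_flr) return 0 ;;
        esac
    done
    return 1
}
```

For example, `uses_flr 0000:00:1f.6 && echo "kernel may reset this device via FLR"`. With a quirk like the linked patch applied, FLR would be expected to be absent from the list.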

marmarek avatar Feb 20 '25 12:02 marmarek

I'm currently seeing this issue as well on my NovaCustom V56 14th Gen

jshirkey avatar Feb 21 '25 05:02 jshirkey

I tried reverting the iPXE revision to 63ed3e352f2e65b26b15a847961440cb4dad0318 and adding PXE_ROM_ID to the mainboard Kconfig, but neither worked: there is still no network interface visible in the iPXE shell.

matmacieje avatar Feb 26 '25 10:02 matmacieje

Fixed as of https://github.com/Dasharo/coreboot/pull/624#issuecomment-2684733985 - closing.

mkopec avatar Feb 26 '25 15:02 mkopec

Just please keep in mind this was not a problem in DTS, but with firmware. So it will need a new firmware release to work correctly.

macpijan avatar Feb 27 '25 09:02 macpijan

Just please keep in mind this was not a problem in DTS, but with firmware. So it will need a new firmware release to work correctly.

That's quite sad, honestly. How would people be able to update their firmware through iPXE? I think we would need a fix or workaround in DTS.

wessel-novacustom avatar Feb 27 '25 10:02 wessel-novacustom

That's quite sad, honestly. How would people be able to update their firmware through iPXE? I think we would need a fix or workaround in DTS.

We'll try to find a workaround in DTS. I'm going to try to find some way to reset the GbE adapter correctly from the state that iPXE currently leaves it in.

mkopec avatar Mar 04 '25 15:03 mkopec

Reopening this issue until then.

mkopec avatar Mar 04 '25 15:03 mkopec

Here is serial console output caught on NovaCustom V540TU, iPXE network device initialization at the end.

ipxe.log

matmacieje avatar Mar 21 '25 14:03 matmacieje

I was not able to find a suitable workaround in DTS for this issue. I've added a disclaimer to the docs informing that the next update must be installed manually: https://github.com/Dasharo/docs/pull/1041/files

@wessel-novacustom

mkopec avatar Apr 04 '25 09:04 mkopec

I was not able to find a suitable workaround in DTS for this issue. I've added a disclaimer to the docs informing that the next update must be installed manually: https://github.com/Dasharo/docs/pull/1041/files

@wessel-novacustom

Thank you for letting me know.

This is quite inconvenient, especially for users with a lower computer knowledge level.

Will an fwupd update work when updating from V560TU v0.9.0?

Otherwise, do you confirm that we can assist our customers with sending them a USB pen drive with DTS in order to still be able to proceed with an update?

Guided firmware update should still be possible when booted via USB, right?

User experience is important. CC: @pietrushnic

wessel-novacustom avatar Apr 04 '25 11:04 wessel-novacustom

@wessel-novacustom let me try to assign correct priority internally.

pietrushnic avatar Apr 07 '25 07:04 pietrushnic

@marmarek could you please share your working config with chainloading grub2? From what I understand, grub has no http support by default.

168-b07-1f6-176e87133 avatar Jul 18 '25 06:07 168-b07-1f6-176e87133

I understand, grub has no http support by default.

Mine has :) I built my grub2.efi with all modules included:

modules=$(ls grub-core | sed -n 's/\.mod$//p')
./grub-mkimage -d ./grub-core --format=x86_64-efi -o grub2.efi -p '(tftp)/grub2-efi' $modules

The (tftp)/grub2-efi argument is where it will look for grub.cfg; you can also embed the config via memdisk.

And then the config looks like this:

set net_default_server=192.168.1.2
multiboot2 (http)/qinstall/iso/images/pxeboot/xen.gz placeholder smt=off
module2 (http)/qinstall/iso/images/pxeboot/vmlinuz-latest inst.repo=http://192.168.1.2/qinstall/iso plymouth.ignore-serial-consoles inst.sshd inst.ks=http://192.168.1.2/qinstall/ks.cfg
module2 --nounzip (http)/qinstall/iso/images/pxeboot/initrd-latest.img

(this one is for starting qubes installer)

As for chainloading grub2.efi from iPXE, it's basically this:

#!ipxe

kernel http://192.168.1.2/grub2-efi/grub2.efi
boot

marmarek avatar Jul 18 '25 09:07 marmarek