Consider replacing Auroraboot for netbooting

Open jimmykarily opened this issue 1 year ago • 5 comments

The golang library we use in Auroraboot to implement netbooting is no longer maintained: https://github.com/danderson/netboot

It already doesn't work with some devices (e.g. ASUS PN64). We should consider some better maintained alternatives.

netboot.xyz seems like a good candidate. Other projects are already in the list: https://netboot.xyz/docs/faq#what-operating-systems-are-currently-available-on-netbootxyz

By default this needs internet access, but there is a way to self-host it too: https://netboot.xyz/docs/selfhosting#deploying-with-docker

We should give this a spin and see if it's a viable option to replace Auroraboot for the netbooting part. If it is, there are fewer things Auroraboot needs to implement, which will help us consolidate into fewer tools in the future (instead of all of osbuilder, auroraboot, enki etc).

Even if it doesn't work locally, we should still consider adding Kairos to the supported OSes.

jimmykarily avatar Oct 09 '24 07:10 jimmykarily

Turns out netboot.xyz only implements a better UX for the client but not the server side. There are other candidates like this one: https://github.com/insomniacslk/dhcp

I tried to use it and it almost works, but on my ASUS it still fails with "sending block 0: code=8, error: User aborted the transfer", which suggests the issue may be with this specific hardware.
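For context on that log line: TFTP error code 8 is defined in RFC 2347 for terminating a transfer because of option negotiation, and some PXE firmware is known to abort its first request this way after probing the file size, then request the file again. Whether that is what happens on the ASUS PN64 has not been verified here. The sketch below decodes a raw TFTP ERROR packet using only the standard library; the function and type names are hypothetical and are not part of pin/tftp.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// tftpError holds the fields of a TFTP ERROR packet (opcode 5).
type tftpError struct {
	Code    uint16
	Message string
}

// codeNames lists the error codes from RFC 1350 plus code 8 from RFC 2347.
var codeNames = map[uint16]string{
	0: "not defined",
	1: "file not found",
	2: "access violation",
	3: "disk full or allocation exceeded",
	4: "illegal TFTP operation",
	5: "unknown transfer ID",
	6: "file already exists",
	7: "no such user",
	8: "option negotiation failure",
}

// parseTFTPError decodes a raw ERROR packet laid out as:
// 2-byte opcode (5) | 2-byte error code | message | 0 byte.
func parseTFTPError(pkt []byte) (tftpError, error) {
	if len(pkt) < 5 {
		return tftpError{}, errors.New("packet too short")
	}
	if op := binary.BigEndian.Uint16(pkt[0:2]); op != 5 {
		return tftpError{}, fmt.Errorf("not an ERROR packet: opcode %d", op)
	}
	msg := pkt[4:]
	if i := bytes.IndexByte(msg, 0); i >= 0 {
		msg = msg[:i] // drop the terminator and anything after it
	}
	return tftpError{Code: binary.BigEndian.Uint16(pkt[2:4]), Message: string(msg)}, nil
}

func main() {
	// Sample packet matching the log line above.
	pkt := append([]byte{0, 5, 0, 8}, append([]byte("User aborted the transfer"), 0)...)
	e, err := parseTFTPError(pkt)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("code=%d (%s), message=%q\n", e.Code, codeNames[e.Code], e.Message)
}
```

If the client retries the same file right after such an abort, the first error is probably harmless; if it never retries, the cause is more likely elsewhere.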

For reference this is the code I used:

package main

import (
	"fmt"
	"io"
	"log"
	"net"
	"os"

	"github.com/insomniacslk/dhcp/dhcpv4"
	"github.com/insomniacslk/dhcp/dhcpv4/server4"
	"github.com/pin/tftp"
)

func main() {
	// Run the TFTP server in a separate goroutine
	go startTFTPServer()

	// Start the DHCP server
	startDHCPServer()
}

// startDHCPServer initializes and starts the DHCP server
func startDHCPServer() {
	handler := func(conn net.PacketConn, peer net.Addr, pkt *dhcpv4.DHCPv4) {
		fmt.Printf("pkt = %+v\n", *pkt)

		// Check if it's a DHCP Discover or Request
		if pkt.MessageType() != dhcpv4.MessageTypeDiscover && pkt.MessageType() != dhcpv4.MessageTypeRequest {
			return
		}

		// Create a reply based on the request packet
		resp, err := dhcpv4.NewReplyFromRequest(pkt)
		if err != nil {
			log.Printf("failed to create DHCP reply: %v", err)
			return
		}

		// Define the IP address of the TFTP server
		tftpServerIP := net.IP{192, 168, 1, 36}
		//tftpServerIP := net.IP{192, 168, 122, 1}

		// Set DHCP options for netboot
		resp.Options.Update(dhcpv4.OptBootFileName("kairos.ipxe"))
		resp.Options.Update(dhcpv4.OptServerIdentifier(tftpServerIP))
		resp.Options.Update(dhcpv4.OptTFTPServerName(tftpServerIP.String()))
		//resp.Options.Update(dhcpv4.Option{Code: dhcpv4.OptionDHCPMessageType, Value: dhcpv4.MessageTypeOffer})

		// Optionally set additional options
		//resp.Options.Update(dhcpv4.OptIPAddressLeaseTime(3600 * time.Second)) // Lease time of 1 hour

		fmt.Printf("resp.Options = %+v\n", resp.Options)
		//fmt.Printf("resp = %+v\n", resp)

		// Send the response back to the client
		if _, err := conn.WriteTo(resp.ToBytes(), peer); err != nil {
			log.Printf("failed to send DHCP response: %v", err)
		}
	}

	//iface := "enp121s0" // Replace with the actual network interface name
	iface := "" // Empty name: do not bind to a specific interface
	srv, err := server4.NewServer(iface, nil, handler)
	if err != nil {
		log.Fatalf("failed to create DHCP server: %v", err)
	}

	log.Printf("Starting DHCP server on interface %s...", iface)
	if err := srv.Serve(); err != nil {
		log.Fatalf("failed to serve DHCP: %v", err)
	}
}

// startTFTPServer initializes and starts the TFTP server
func startTFTPServer() {
	// Define the TFTP server
	srv := tftp.NewServer(readHandler, nil)

	// Start the TFTP server on port 69
	go func() {
		if err := srv.ListenAndServe(":69"); err != nil {
			log.Fatalf("failed to start TFTP server: %v", err)
		}
	}()

	log.Println("TFTP server started on port 69...")
}

// readHandler serves files requested by TFTP clients
func readHandler(filename string, rf io.ReaderFrom) error {
	fmt.Printf("Reading file %s\n", filename)
	// Path to the directory where TFTP boot files are stored
	filePath := fmt.Sprintf("./%s", filename)

	// Open the requested file
	file, err := os.Open(filePath)
	if err != nil {
		log.Printf("failed to open file %s: %v", filePath, err)
		return err
	}
	defer file.Close()

	// Use the io.ReaderFrom interface to transfer the file
	log.Printf("Serving file %s", filename)
	if _, err := rf.ReadFrom(file); err != nil {
		log.Printf("failed to serve file %s: %v", filename, err)
		return err
	}

	return nil
}

with kairos.ipxe being this file: https://github.com/kairos-io/kairos/releases/download/v3.2.1/kairos-alpine-3.19-core-amd64-generic-v3.2.1.ipxe
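A note on the handler above: the line that would set the DHCP message type is commented out. In a standard exchange (RFC 2131) a server answers DISCOVER with OFFER and REQUEST with ACK (or NAK when it refuses), and the reply carries that type in option 53. The following is a minimal, stdlib-only sketch of that mapping. The constant and function names are hypothetical, and it is an assumption that the missing message type played any role in the failure described here.

```go
package main

import "fmt"

// DHCP message type values carried in option 53 (RFC 2132).
const (
	msgDiscover byte = 1
	msgOffer    byte = 2
	msgRequest  byte = 3
	msgAck      byte = 5
)

// replyTypeFor returns the message type a server would normally use
// to answer the given request type. The boolean is false for message
// types this sketch does not answer.
func replyTypeFor(req byte) (byte, bool) {
	switch req {
	case msgDiscover:
		return msgOffer, true
	case msgRequest:
		return msgAck, true
	default:
		return 0, false
	}
}

func main() {
	for _, req := range []byte{msgDiscover, msgRequest, 7} {
		if reply, ok := replyTypeFor(req); ok {
			fmt.Printf("request type %d -> reply type %d\n", req, reply)
		} else {
			fmt.Printf("request type %d -> no reply\n", req)
		}
	}
}
```

With insomniacslk/dhcp, the chosen type could presumably be applied to the reply through the library's message type option before the packet is sent.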

I'm not sure it's worth digging into this more. I would rather not maintain a PXE boot server if we can find something that works out of the box by simply providing an iPXE script.

jimmykarily avatar Oct 09 '24 10:10 jimmykarily

netboot provides a very specific functionality at the moment, which is to work alongside an already-existing DHCP server on the same network. That is hard to replace. Maybe we can contact the maintainer and see if there is some way for the community to keep it up to date?

Maybe we can just fade it out, keep it "as is", and leverage things like UEFI HTTP boot. However, the same functionality in terms of UX (specify a container image and 'boot') is hard to replicate.
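To make the two modes mentioned above concrete: a server that works next to an existing DHCP server (proxyDHCP) only answers netboot clients and leaves address assignment to the main server, so it first has to recognise those clients. PXE firmware announces itself with a vendor class identifier (DHCP option 60) starting with "PXEClient", and UEFI HTTP boot clients use "HTTPClient". The sketch below shows that detection with the standard library only. The function names are hypothetical, it operates on the options area of a packet (after the fixed header and magic cookie), and it is not how danderson/netboot is implemented.

```go
package main

import (
	"fmt"
	"strings"
)

// findOption walks a DHCP options area (code, length, value triples)
// and returns the value of the first option with the given code.
func findOption(opts []byte, code byte) ([]byte, bool) {
	for i := 0; i < len(opts); {
		c := opts[i]
		switch c {
		case 0: // pad option: single byte, no length
			i++
			continue
		case 255: // end option
			return nil, false
		}
		if i+1 >= len(opts) {
			return nil, false // truncated: no length byte
		}
		n := int(opts[i+1])
		start := i + 2
		if start+n > len(opts) {
			return nil, false // truncated value
		}
		if c == code {
			return opts[start : start+n], true
		}
		i = start + n
	}
	return nil, false
}

// classifyClient reports whether the options look like they come from
// a PXE client, a UEFI HTTP boot client, or something else.
func classifyClient(opts []byte) string {
	vendor, ok := findOption(opts, 60) // option 60: vendor class identifier
	switch {
	case ok && strings.HasPrefix(string(vendor), "PXEClient"):
		return "pxe"
	case ok && strings.HasPrefix(string(vendor), "HTTPClient"):
		return "http"
	default:
		return "other"
	}
}

func main() {
	// Options: message type DISCOVER, then a vendor class identifier, then end.
	opts := []byte{53, 1, 1, 60, 9}
	opts = append(opts, "PXEClient"...)
	opts = append(opts, 255)
	fmt.Println(classifyClient(opts))
}
```

A proxyDHCP-style server would ignore anything classified as "other", which is what lets it coexist with the network's regular DHCP server.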

mudler avatar Oct 09 '24 14:10 mudler

Let's first decide what Auroraboot is responsible for here: https://github.com/kairos-io/kairos/issues/1633 and then we can discuss this again.

jimmykarily avatar Oct 14 '24 09:10 jimmykarily

@jimmykarily did you test with a recent Auroraboot version? With the latest changes it might work now!

Itxaka avatar Jan 14 '25 11:01 Itxaka

> @jimmykarily did you test with a recent Auroraboot version? With the latest changes it might work now!

The problem was in the danderson/netboot library iirc and that didn't receive any changes recently. Is there any other change that we expect to have fixed that? I might give it a go again when we start working on this ticket.

jimmykarily avatar Jan 20 '25 10:01 jimmykarily

Not sure if this is still valid. We took over the netboot lib, updated the iPXE deps, added more functionality, bumped deps and so on.

So I'm wondering if we should just close it. Thoughts, @mudler?

Itxaka avatar Jul 17 '25 15:07 Itxaka

Indeed, it's fine for me to close this one. We have taken on what was required to keep it on par. Even if we don't expand its support, that is something that can be considered case by case.

mudler avatar Jul 17 '25 16:07 mudler