Do a better job of detecting if a tftp server is running and configured correctly

Open mithro opened this issue 7 years ago • 5 comments

@cr1901 hit this problem when getting set up.....

mithro avatar Jan 11 '18 06:01 mithro

The first failure was indeed due to inetd. This got lost in my scrollback:

+ case $QEMU_NETWORK in
+ '[' '!' -e /dev/net/tap0 ']'
+ echo 'Need to create and bring up a tun device, needing sudo...'
Need to create and bring up a tun device, needing sudo...
+ sudo true
+ sudo mknod /dev/net/tap0 c 10 200
+ sudo which openvpn
+ sudo openvpn --mktun --dev tap0
Thu Jan 11 00:12:53 2018 TUN/TAP device tap0 opened
Thu Jan 11 00:12:53 2018 Persist state set to: ON
++ whoami
+ sudo chown william /dev/net/tap0
+ sudo ifconfig tap0 192.168.100.100 up
+ make tftpd_start
mkdir -p build/tftpd/
sudo true
Starting aftpd
+ EXTRA_ARGS+=("-net nic -net tap,ifname=tap0,script=no,downscript=no")
+ make tftp
mkdir -p build/arty_net_or1k/
time python -u ./make.py --platform=arty --target=net --cpu-type=or1k --iprange=192.168.100 -Ob toolchain_path //opt/Xilinx/    --no-compile-gateware \
        2>&1 | tee -a /home/william/Projects/litex-buildenv/build/arty_net_or1k//output.20180111-001254.log; (exit ${PIPESTATUS[0]})
make[1]: warning: jobserver unavailable: using -j1.  Add '+' to parent make rule.
make[1]: Entering directory '/home/william/Projects/litex-buildenv/build/arty_net_or1k/software/libcompiler_rt'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/william/Projects/litex-buildenv/build/arty_net_or1k/software/libcompiler_rt'
make[1]: warning: jobserver unavailable: using -j1.  Add '+' to parent make rule.
make[1]: Entering directory '/home/william/Projects/litex-buildenv/build/arty_net_or1k/software/libbase'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/william/Projects/litex-buildenv/build/arty_net_or1k/software/libbase'
make[1]: warning: jobserver unavailable: using -j1.  Add '+' to parent make rule.
make[1]: Entering directory '/home/william/Projects/litex-buildenv/build/arty_net_or1k/software/libnet'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/william/Projects/litex-buildenv/build/arty_net_or1k/software/libnet'
make[1]: warning: jobserver unavailable: using -j1.  Add '+' to parent make rule.
make[1]: Entering directory '/home/william/Projects/litex-buildenv/build/arty_net_or1k/software/bios'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/william/Projects/litex-buildenv/build/arty_net_or1k/software/bios'
make[1]: warning: jobserver unavailable: using -j1.  Add '+' to parent make rule.
Jan 11 00:12:54 xubuntu-dtrain atftpd[1445.140634394527488]: Advanced Trivial FTP server started (0.7)
Jan 11 00:12:54 xubuntu-dtrain atftpd[1445.140634394527488]: atftpd: can't bind port 192.168.100.100:69/udp
make[1]: Entering directory '/home/william/Projects/litex-buildenv/build/arty_net_or1k/software/uip'
make[1]: Nothing to be done for 'all'.

All subsequent failures were due to the fact that the tftpd_start target is only run once, and that's only if tap0 doesn't already exist. Perhaps if tap0 is known to exist, we should check for an existing atftpd and then attempt to invoke it if a daemon doesn't exist?
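
A minimal sketch of that check, assuming `pgrep` (from procps) is installed and that the daemon's process name is exactly `atftpd`. The function name `ensure_tftpd` is hypothetical and not part of litex-buildenv. An inetd-managed atftpd may not appear as a running process until a request arrives, so this only covers the standalone-daemon case.

```shell
# Hypothetical helper: run the existing `make tftpd_start` target only when
# no atftpd process is found. Assumes pgrep is available.
ensure_tftpd() {
    if pgrep -x atftpd >/dev/null 2>&1; then
        echo "atftpd already running, not starting another"
    else
        echo "atftpd not running, running 'make tftpd_start'"
        make tftpd_start
    fi
}
```

Calling `ensure_tftpd` whether or not tap0 already exists would decouple starting the daemon from the tap0 check.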

cr1901 avatar Jan 11 '18 07:01 cr1901

I think we should probably do something like trying to tftp fetch the binary and see if that works?

mithro avatar Jan 11 '18 09:01 mithro

I also hit this issue during my preparation build. The TFTP server didn't start for some reason on the first run (I had atftpd in /etc/inetd.conf from something earlier, but thought I'd disabled it early enough in the setup process). On subsequent runs, scripts/build-qemu.sh skipped the part where make tftpd_start is run, because the tap0 interface had already been configured. Once I figured that out, running make tftpd_start by hand got everything working.

IMHO it'd be better if the call to make tftpd_start depended on either:

  • whether atftpd is already running with the right TFTP directory (e.g. by parsing ps axu output for a process running with the right parameters); or

  • whether something is listening on UDP/69, perhaps on the relevant IP (which would help catch "you have a TFTP server, but it's not the right one")

(plus, of course, on the tap0 interface configuration having worked). I.e., making the script a bit more idempotent.
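
Both checks could be sketched roughly as follows. The function names are hypothetical, the TFTP directory `build/tftpd` is assumed from the `mkdir -p build/tftpd/` line in the trace above, and `ss` (from iproute2) is assumed to be available. Each helper reads command output on stdin, which keeps the parsing separate from the commands that produce it.

```shell
# Hypothetical helpers; both read command output on stdin.

# Succeeds if a `ps axu` listing contains an atftpd process whose command
# line mentions the given TFTP directory.
atftpd_serving_dir() {
    grep -F -- "atftpd" | grep -q -F -- "$1"
}

# Succeeds if `ss -uln` output shows a UDP socket bound to port 69 on the
# given IPv4 address or on a wildcard address.
udp69_listener() {
    awk -v ip="$1" '
        {
            for (i = 1; i <= NF; i++)
                if ($i == (ip ":69") || $i == "0.0.0.0:69" || $i == "*:69")
                    found = 1
        }
        END { exit (found ? 0 : 1) }'
}

# Usage (assumed directory and address):
#   ps axu  | atftpd_serving_dir "$PWD/build/tftpd" || make tftpd_start
#   ss -uln | udp69_listener 192.168.100.100        || make tftpd_start
```

A wildcard listener only shows that some TFTP server holds the port, not that it is the right one, so the process check and the port check complement each other.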

At minimum, given the volume of output, it'd be worth printing a hint such as "if TFTP fails, check that atftpd server is running" immediately before booting qemu, so it's more easily found in the console output, given that TFTP has basically no error checking.

Ewen

PS: The trouble with "try to fetch" as a test for "is it running" is that the only error reporting that TFTP has is basically a timeout, or a "not found" message; and the BIOS boot already shows the timeout behaviour...
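
For what it's worth, a host-side fetch can at least tell those two outcomes apart. A rough sketch, assuming a `curl` build with TFTP support and using curl's documented exit statuses 28 (operation timeout) and 68 (TFTP file not found); the function names and the `boot.bin` file name are assumptions for illustration.

```shell
# Hypothetical helper: map a curl exit status from a TFTP fetch to a diagnosis.
classify_tftp_probe() {
    case "$1" in
        0)  echo "ok: file fetched" ;;
        28) echo "timeout: no TFTP server answered" ;;
        68) echo "not-found: a server answered, but the file is missing" ;;
        *)  echo "error: curl exit status $1" ;;
    esac
}

# Try to fetch file $2 from TFTP host $1, giving up after 5 seconds.
probe_tftp() {
    status=0
    curl --silent --max-time 5 --output /dev/null "tftp://$1/$2" || status=$?
    classify_tftp_probe "$status"
}

# Usage (assumed address and file name):
#   probe_tftp 192.168.100.100 boot.bin
```

A "not-found" result still means something answered on UDP/69, which points at the wrong server or the wrong TFTP directory rather than at no server at all.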

ewenmcneill avatar Jan 14 '18 00:01 ewenmcneill

Another thing to think about is the tun/tap interface setup; some people have needed things like:

sudo ip tuntap add dev tap0 mode tap user mwheeler group mwheeler

mithro avatar Jan 14 '18 02:01 mithro

Also, if you follow the instructions, run scripts/build-qemu.sh, switch over to real hardware, and follow the instructions to delete the IP from tap0, you will be left with a tap0 interface with no IP on it. Once you unplug the hardware board again (e.g. to reset it), you lose the hardware interface that had 192.168.100.100 on it. At that point make tftpd_start fails because there's no interface with that IP on it, and scripts/build-qemu.sh fails to configure an IP on tap0 because it assumes "tap0 exists" is the same as "everything configured". Sigh.
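
One way to separate "tap0 exists" from "tap0 is configured" is to test for the address itself. A minimal sketch, assuming `ip` (from iproute2) is available and that the tap0 device has already been created; the function names are hypothetical, and the `ifconfig` call mirrors the one in the trace above.

```shell
# Succeeds if `ip -4 -o addr show dev <iface>` output (on stdin) lists the
# given IPv4 address.
iface_has_ip() {
    awk -v ip="$1" '
        {
            for (i = 1; i < NF; i++)
                if ($i == "inet") {
                    split($(i + 1), a, "/")
                    if (a[1] == ip) found = 1
                }
        }
        END { exit (found ? 0 : 1) }'
}

# Hypothetical helper: configure tap0 unless it already carries the address.
ensure_tap0_ip() {
    if ip -4 -o addr show dev tap0 2>/dev/null | iface_has_ip "$1"; then
        echo "tap0 already has $1"
    else
        echo "tap0 missing $1, configuring it"
        sudo ifconfig tap0 "$1" up
    fi
}

# Usage: ensure_tap0_ip 192.168.100.100
```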

ewenmcneill avatar Jan 18 '18 02:01 ewenmcneill