After berryboot upgrade - iscsi no longer finds partitions

Open h4de5 opened this issue 4 years ago • 43 comments

On an rpi3b, I upgraded berryboot from berryboot-20190612-pi0-pi1-pi2-pi3.zip to berryboot-20201103-pi4.zip. I extracted the shared.img into mnt/shared

but now it says:

iSCSI target does not have any partitions

If I click OK, it starts searching, and after 10+ seconds it does find the existing partitions and I can boot from them.

So the problem is that I need to connect to those headless installations over VNC and dismiss the message after each restart.

h4de5 avatar Dec 05 '20 13:12 h4de5

this is still an issue - I can't find a reason why this happens, but it shows up on a berryboot setup on rpi3 and on a rpi4.

h4de5 avatar Jan 26 '21 08:01 h4de5

I'm having exactly the same issue. Rpi4 connecting through a Cisco 2960L switch to a Synology iscsi. Portfast enabled on switch. Any similarities with your setup?

dobber81 avatar Jan 31 '21 20:01 dobber81

That seems to be a bug introduced in https://github.com/maxnet/berryboot/commit/9e129fd97e582db48b8a3af7aca56af21a4596f7 - see how in startISCSI, it tries to mount the system partition before starting the network? I believe that's wrong if your system partition is on the network :-)

HinTak avatar Jan 31 '21 21:01 HinTak

Great find! Is there an easy fix? Or do we have to wait for the grand coder to fix it? Not much activity here recently.

dobber81 avatar Jan 31 '21 21:01 dobber81

I have been making new builds at https://github.com/HinTak/berryboot (see under the releases part), and the process/notes are at

https://github.com/HinTak/RaspberryPi-Dev/blob/master/Customizing-Berryboot.md

I suppose I'd be happy to roll some relevant changes in the next time I make a build. Before that, it would be nice for somebody to look at the code and tell me what the change before/after is supposed to do. It is obviously much more than just "basic pi4 support". I have no experience with iscsi - and I don't really use berryboot myself. I just have a vague understanding that iscsi is some kind of booting from the network, and Berryboot is essentially a customized linux kernel using the rest of another OS (after it is set up).

HinTak avatar Jan 31 '21 22:01 HinTak

If you do find the time to make some tests I would be happy to try it out. The pi I'm using is not one of my production boards.

I found that when iscsi is used, berryboot creates an iscsi.sh file with one line: /sbin/iscsistart -i "PiTest" -g 1 -t "pi" -a "10.0.0.10". Just for fun I added sleep 10 above that line, and that resulted in the error message being delayed by 10 seconds. Any chance of starting the network by using some nice one-liner in that file? I tried ifup eth0 and ifconfig eth0 up, but without success.

dobber81 avatar Jan 31 '21 22:01 dobber81

Telling me about that line is definitely useful - I looked up what iscsistart does, and I am afraid that's the minimum. "ifup eth0" (or equivalent) happens before and separate from iscsistart. As in any netboot systems, the tools/commands you use at the earliest stage of booting are more limited and different from a fully-up-and-running OS.

HinTak avatar Jan 31 '21 22:01 HinTak

I looked at the code, and couldn't quite see why moving the mountSystemPartition call causes a delay though, except perhaps that it tests for isPxeBoot, for which I need to find the code...

HinTak avatar Feb 01 '21 00:02 HinTak

@HinTak - great to see you in action again :)

I must say I have no idea about iscsi, that code or how to compile it - but this line in particular:

https://github.com/maxnet/berryboot/commit/9e129fd97e582db48b8a3af7aca56af21a4596f7#diff-5143ae91cacdea797f34c4aa62e2fafcf5495c9bf6ed5111a871b358f0317df9R468

they moved the mount system partition before loadModule("iscsi_tcp"); - again I don't know what any of this does, but from the error I got, this seems to be a problem. It can't mount any partitions on the first run, but on the next run the module is loaded and mountSystemPartition works right away.

Next question: as I can't compile that myself - is there a way I can activate that module from the outside, like through the config.txt? Or is this entirely the wrong idea? 🥟

h4de5 avatar Feb 01 '21 07:02 h4de5

I can't quite see how moving that mountSystemPartition line earlier causes a problem (moving it later possibly could), but it is relatively simple just to move it back to the older position and see. It takes about 3.5 hours on my hardware to build, but that is just waiting and doesn't need much attention from me.

I think some of the scripts in the /boot directory (where config.txt is) are run, but I don't know if it is early enough to matter.

HinTak avatar Feb 01 '21 11:02 HinTak

Haven't had time to look into the iSCSI/USB drive stuff.

But I wouldn't worry about things like this:

they moved the mount system partition before loadModule("iscsi_tcp");

system partition = the FAT partition of the SD card.

The module it needs to load lives on that FAT partition. So yes, you do want to mount it before you can load the module that is needed to make an iSCSI connection.

It is only not necessary if the module is already built into the kernel. Which may or may not have been the case in the past.

maxnet avatar Feb 01 '21 16:02 maxnet

The move shouldn't matter to iscsi, since the two are somewhat independent, especially since it is moving earlier rather than later. I wonder if it is a timing issue - obviously it takes time to mount, so maybe the extra time spent mounting later (in the previous case), after starting the network, somehow lets the network config "settle" in some obscure way so that iscsi works.

HinTak avatar Feb 01 '21 17:02 HinTak

On my system, this appears to be a timing problem - after starting iscsi, it takes some time for the device files to be created. If I add the following lines to the end of /boot/iscsi.sh:

RV=$?
/bin/sleep 5
exit $RV

It gives enough time for the device files to be created and then the data partition is found. This is not meant to be a 'fix' but a workaround during my testing. The real fix should be incorporated into bootmenudialog.cpp (BootMenuDialog::startISCSI) I believe.
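
A minimal sketch of the same idea as a reusable helper, in case it makes the workaround easier to read: run a command, pause so the device files have a chance to appear, then hand back the command's original exit status. The function name run_then_settle is hypothetical and not part of berryboot, and the iscsistart arguments in the comment are placeholders.

```shell
# run_then_settle DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND, sleeps DELAY_SECONDS, then returns
# COMMAND's exit status (not the exit status of sleep).
run_then_settle() {
    rts_delay="$1"
    shift
    rts_rv=0
    "$@" || rts_rv=$?
    sleep "$rts_delay"
    return "$rts_rv"
}

# Possible use at the end of /boot/iscsi.sh (placeholders, adjust to your setup):
# run_then_settle 5 /sbin/iscsistart -i "<initiator>" -g 1 -t "<target>" -a "<portal-ip>"
# exit $?
```

The fixed delay is still a guess about how long device creation takes, which matches the observation that this is a workaround and not a fix.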

owl770 avatar Feb 05 '21 22:02 owl770

And it appears that this does not work 100% reliably. May take a few reboots.

owl770 avatar Feb 05 '21 23:02 owl770

That sounds plausible - if iscsi involves creation of dev files, mounting system partitions likely has the side effect of sync'ing and flushing those dev files.

HinTak avatar Feb 06 '21 00:02 HinTak

Further testing has shown that it seems to take the iscsi daemon a while to start. My (redacted) /boot/iscsi.sh now looks like:

/bin/date >>/iscsi.out
PRE=`/bin/ls -1 /dev/sd? | /usr/bin/wc -l`
/sbin/iscsistart -i "iqn.xxxx-xx.com.berryboot:myPi" -g 1 -t "iqn.xxxx-xx.com.synology:nas.Target-1.xxxxxxxxx" -a "192.168.x.x" >>/iscsi.out 2>&1
RV=$?
if [ $RV -eq 0 ]
then
	RV=1
	COUNT=1
	while [ $COUNT -le 10 ];
	do
		POST=`/bin/ls -1 /dev/sd? | /usr/bin/wc -l`
		echo "Attempt #$COUNT: There were $PRE sd devices and there are now $POST" >>/iscsi.out
		if [ $POST -gt $PRE ];
		then
			echo "Found the iscsi device on attempt #$COUNT" >>/iscsi.out
			RV=0
			break
		fi
		/bin/sleep 1
		COUNT=`/usr/bin/expr $COUNT + 1`
	done
fi
exit $RV

and after booting, /iscsi.out contains:

Thu Jan  1 00:00:11 UTC 1970
iscsistart: can not connect to iSCSI daemon (111)!
iscsistart: version 2.0-873
iscsistart: conn 0 login rejected: initiator error (02/06)
iscsistart: Connection1:0 to [target: iqn.xxxx-xx.com.synology:nas.Target-1.xxxxxxxxxx, portal: 192.168.x.x,3260] through [iface: default] is shutdown.
iscsistart: initiator reported error (19 - encountered non-retryable iSCSI login failure)
iscsistart: Logging into iqn.xxxx-xx.com.synology:nas.Target-1.xxxxxxxxxx 192.168.x.x:3260,1
Thu Jan  1 00:00:20 UTC 1970
iscsistart: can not connect to iSCSI daemon (111)!
iscsistart: version 2.0-873
iscsistart: Connection2:0 to [target: iqn.xxxx-xx.com.synology:nas.Target-1.xxxxxxxxxx, portal: 192.168.x.x,3260] through [iface: default] is operational now
iscsistart: Logging into iqn.xxxx-xx.com.synology:nas.Target-1.xxxxxxxxxx 192.168.x.x:3260,1
Attempt #1: There were 1 sd devices and there are now 1
Attempt #2: There were 1 sd devices and there are now 2
Found the iscsi device on attempt #2

The first time iscsistart runs, it fails and the status is "shutdown". The second time it runs, it works OK but it does take a while for the /dev/sd?1 file to be created.

This now seems to work consistently, although it does take two goes at running iscsistart for it to succeed.

Once again, I am not suggesting this is a fix but it is a workaround for me (for now) and hopefully it gives @HinTak something to work on.
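
Since the first iscsistart attempt fails and the second succeeds, the "two goes" could also be made explicit with a small retry wrapper. This is only a sketch: retry_cmd is a hypothetical name, the attempt count and delay are arbitrary, and it assumes the wrapped command returns non-zero when the login is rejected (which is what the log above suggests, but I have not verified it for every failure mode).

```shell
# retry_cmd MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# Returns 0 on the first success, 1 if every attempt failed.
retry_cmd() {
    retry_max="$1"
    retry_delay="$2"
    shift 2
    retry_n=1
    while [ "$retry_n" -le "$retry_max" ]; do
        if "$@"; then
            return 0
        fi
        retry_n=$((retry_n + 1))
        # Only pause if another attempt is coming.
        if [ "$retry_n" -le "$retry_max" ]; then
            sleep "$retry_delay"
        fi
    done
    return 1
}

# Possible use (placeholders, adjust to your setup):
# retry_cmd 3 2 /sbin/iscsistart -i "<initiator>" -g 1 -t "<target>" -a "<portal-ip>"
```

A retry only covers the failed login; the wait for the /dev/sd? files after a successful login is still needed.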

owl770 avatar Feb 06 '21 03:02 owl770

I can confirm that workaround does work on a rpi4 with berryboot+iscsi+libreelec and a rpi3 berryboot+iscsi+volumio.

h4de5 avatar Feb 06 '21 15:02 h4de5

New build - https://github.com/HinTak/berryboot/tree/berryboot-20210206-pi64%2Brespeaker - this moves the mountSystemPartition() call in BootMenuDialog::startISCSI() back to where it was. If this works better, then we can try to figure out why :-).

Upstream raspbian released a 5.10-based kernel package last week for the first time (for the bulk of last year it was 5.4, and before that, 4.19), so I updated to that too. That turned out to be less straightforward than I thought, as aufs and a few other things needed to be updated too. Anyway, if any of you find it useful, please click donate at https://hintak.github.io .

HinTak avatar Feb 07 '21 12:02 HinTak

I am currently compiling this version to test...will take a day or two...will report back.

owl770 avatar Feb 11 '21 02:02 owl770

I have completed this round of testing. I compiled the https://github.com/HinTak/berryboot/tree/berryboot-20210206-pi64%2Brespeaker source, copied the output files to an SD card and booted.

  • The "Welcome" screen was displayed
  • I clicked OK
  • The "Disk selection" screen was displayed
  • I selected "Networked storage (iSCSI SAN)" and "use existing files"
  • I clicked Continue
  • The "iSCSI" screen was displayed
  • I entered the iqn and server IP address and clicked OK
  • The "Connecting to iSCSI server" screen was displayed
  • The "Error - No existing Berryboot installation found on this drive" screen was displayed
  • I clicked OK
  • The "Disk selection" screen was displayed but this time the drive sdb: iSCSI storage was in the list. So this means that we have attached the iSCSI LUN and the /dev/sdb? device files were created (reinforcing the suspected timing issue)
  • I then stopped. I copied my looping version of iscsi.sh to the SD card and edited cmdline.txt to include datadev=iscsi
  • I then booted off that SD card and it worked OK.

From what I can see, we have the issue whereby /sbin/iscsistart returns and Berryboot checks for the associated device files before they have been created by the kernel. I could be wrong but I don't think the issue is related to the placement of mountSystemPartition().

owl770 avatar Feb 12 '21 06:02 owl770

Hang on - during set up you can see sdb - but why is it not immediately visible in regular usage? Is there reconnection / time attempts during setup?

HinTak avatar Feb 12 '21 08:02 HinTak

During setup (in iscsidialog.cpp) iSCSIDialog::accept() does a call to iscsistart (which works) and then in diskdialog.cpp it calls hasExistingBerryboot but this fails (I assume the device files are not yet created). This forces the "No existing Berryboot installation found on this drive" message. By the time you click OK and it calls populateDriveList again, the device files exist and are listed.

Maybe a possible solution would be: each time /sbin/iscsistart is called, berryboot then waits until the /dev/iscsi device exists, before continuing?

Just did more research and perhaps /dev/iscsi is actually created by berryboot (it's a symbolic link to /dev/sdb?)...not the right file to wait for. Maybe the symlink we should be waiting for is /dev/disk/by-label/berryboot???
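
A rough sketch of that "wait until the device exists" idea, written as shell for illustration only (the real change would live in the C++ code). wait_for_path is a hypothetical helper, and whether /dev/disk/by-label/berryboot is actually populated inside Berryboot's boot environment is an assumption I have not confirmed.

```shell
# wait_for_path PATH TIMEOUT_SECONDS
# Hypothetical helper: polls once per second until PATH exists.
# Returns 0 as soon as PATH exists, 1 if TIMEOUT_SECONDS pass first.
# Note: -e follows symlinks, so a dangling symlink counts as "not there yet".
wait_for_path() {
    wfp_path="$1"
    wfp_timeout="$2"
    wfp_elapsed=0
    while [ ! -e "$wfp_path" ]; do
        if [ "$wfp_elapsed" -ge "$wfp_timeout" ]; then
            return 1
        fi
        sleep 1
        wfp_elapsed=$((wfp_elapsed + 1))
    done
    return 0
}

# Possible use after iscsistart returns (path is an assumption, see above):
# wait_for_path /dev/disk/by-label/berryboot 20 || echo "device did not appear"
```

Waiting on a specific path avoids counting devices, but it only helps if the chosen path is created by the kernel or udev and not by berryboot itself.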

owl770 avatar Feb 12 '21 09:02 owl770

The code already does a number of re-tries... Maybe it needs to re-try smarter.

HinTak avatar Feb 12 '21 09:02 HinTak

Hi @maxnet and @HinTak - I have done a fair bit of debugging on the timing issue with iSCSI devices and provide the following code upgrades for your review, adjustment, rejection and/or acceptance. I did not want to put them in a pull request as my knowledge of berryboot is limited and I would not be surprised if the suggested changes unintentionally break something.

  1. bootmenudialog.cpp - bootmenudialog.cpp.patch.txt - these changes wait until _i->iscsiDevice() is not EMPTY before proceeding with detection of the data partition.
  2. iscsidialog.cpp - iscsidialog.cpp.patch.txt - these changes, after iscsistart has run, wait until more block devices exist (more files in /sys/class/block) before proceeding. It also times out after 20 secs.
  3. iscsidialog.h - iscsidialog.h.patch.txt - function definition for countBlockDevices().
  4. diskdialog.cpp - diskdialog.cpp.patch.txt - these changes fix an issue during a fresh install, whereby if you selected "Networked storage" and "use existing files" (so that you didn't clobber your existing iSCSI filesystems), the install would fail with "No existing berryboot installation found on this drive". Please note that this is a separate issue from the "block device creation timing" issue.
  5. diskdialog.h - diskdialog.h.patch.txt - function definition for getDeviceByLabel.

You may not like my coding style (I am no expert) or feel there are better ways to address these issues but at least it's a start.
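
For anyone who wants to see the logic of item 2 without reading the patch, here is a shell approximation of it (the patch itself is C++ and may differ in detail). It counts the entries in a directory such as /sys/class/block and waits until there are more than before, giving up after a timeout. Both function names are hypothetical; the 20 second limit in the usage comment is the one mentioned above.

```shell
# count_entries DIR
# Prints the number of entries in DIR (for /sys/class/block, one per block
# device or partition).
count_entries() {
    ce_count=0
    for ce_entry in "$1"/*; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        if [ -e "$ce_entry" ] || [ -L "$ce_entry" ]; then
            ce_count=$((ce_count + 1))
        fi
    done
    echo "$ce_count"
}

# wait_for_more_entries DIR BASELINE TIMEOUT_SECONDS
# Returns 0 once DIR holds more than BASELINE entries, 1 on timeout.
wait_for_more_entries() {
    wme_elapsed=0
    while [ "$(count_entries "$1")" -le "$2" ]; do
        if [ "$wme_elapsed" -ge "$3" ]; then
            return 1
        fi
        sleep 1
        wme_elapsed=$((wme_elapsed + 1))
    done
    return 0
}

# Possible use around iscsistart:
# before=$(count_entries /sys/class/block)
# ... run iscsistart here ...
# wait_for_more_entries /sys/class/block "$before" 20
```

Counting entries cannot tell which new device appeared, so it would be fooled by, say, a USB stick plugged in at the same moment.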

owl770 avatar Feb 16 '21 05:02 owl770

Heya owl770,

I would love to test your patches, but it seems there is something wrong with them. Point 5, "diskdialog.h", contains iscsidialog.h.patch. Is this right? iscsidialog.h.patch.txt appears twice, while diskdialog.h.patch.txt is missing...?

andreaz70 avatar Feb 17 '21 20:02 andreaz70

Thanks @andreaz70 - I have updated the post to include the correct patch file!!! :)

owl770 avatar Feb 17 '21 21:02 owl770

I'll try it out today... :) Thanks!

andreaz70 avatar Feb 18 '21 11:02 andreaz70

@owl770 I put all your diffs under https://github.com/HinTak/berryboot . Totally untested (I haven't even tried compiling) and unlikely to look until I next do a build.

Btw, it is much easier if you commit locally then export the patch. Basically you do as much change as you want, then

git commit -m "misc changes from me" -a
git format-patch -1 HEAD

Then send / attach the single "*.patch" file that the above generates. This is how people used git 10-15 years ago, before GitHub existed, and it is still how Linux kernel people do things: you commit locally to the repo on your own hard disk, then export the patch from it and send it off by e-mail for review and/or incorporation (if the guy on the other end trusts you enough to take it on directly, with git am < this-patch-file).

HinTak avatar Feb 18 '21 18:02 HinTak

Okay. Compiled everything and it looks better than before... I was able to install one of those images to an iscsi drive. It will boot, but not completely. I still run into the issue where RPi_OS_2020.11_Lite_[64-bit].img192 is waiting for the boot device (by label). This fails and I end up in an emergency console... :(

andreaz70 avatar Feb 19 '21 16:02 andreaz70

I think you will have better luck with images from here: https://berryboot.alexgoldcheidt.com/images/

dobber81 avatar Feb 19 '21 16:02 dobber81

Hi @andreaz70 - I had the same problem with that image. I'm not 100% sure but believe it is an issue with the image rather than berryboot. As @dobber81 says, the images from alexgoldcheidt seem to be more successful. Thanks for testing the patches and glad they worked for you.

owl770 avatar Feb 19 '21 19:02 owl770

Fwiw, I don't think the other images are relevant - the problem is at the stage where the berryboot core tries to get at images (via iscsi), so any differences in the server images are irrelevant. Or that's how I understand the nature of the problem. It is good to know of another source of server images though.

HinTak avatar Feb 19 '21 23:02 HinTak

Hello HinTak and owl770.

Okay, I modified the image that you can download via berryboot (the buster lite one): I copied it over, modified it, built a new one and copied it back, with just one change in fstab.

I replaced

label=boot /boot vfat defaults 0 0

with

/dev/mmcblk0p1 /boot vfat defaults 0 0

and voilà, the image boots perfectly from iscsi. It seems to me that the image isn't able to see a drive named "boot". Even if you rename the SD card to "boot", it makes no difference.
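
For anyone repeating this on an unpacked image, the fstab change could be scripted roughly as follows. This is a sketch: rewrite_boot_entry is a hypothetical name, the match is case-insensitive on the "label" keyword only, and /dev/mmcblk0p1 assumes the boot files sit on the first partition of the SD card as in my setup.

```shell
# rewrite_boot_entry FSTAB_FILE
# Hypothetical helper: replaces a leading "label=boot" (any case of "label")
# with /dev/mmcblk0p1, leaving the rest of the line and all other lines as-is.
rewrite_boot_entry() {
    rbe_tmp="$1.tmp"
    sed 's|^[Ll][Aa][Bb][Ee][Ll]=boot\([[:space:]]\)|/dev/mmcblk0p1\1|' "$1" > "$rbe_tmp" &&
        mv "$rbe_tmp" "$1"
}

# Possible use on the unpacked image (path is a placeholder):
# rewrite_boot_entry /path/to/unpacked-image/etc/fstab
```

It writes to a temporary file and moves it into place, so it does not depend on sed supporting in-place editing.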

Thanks to dobber81, for pointing me to those images!

andreaz70 avatar Feb 20 '21 20:02 andreaz70

That's an interesting find. I'll see what the label= does when I can find some time.

HinTak avatar Feb 21 '21 14:02 HinTak

Hi @andreaz70 and @HinTak - This is a little off topic but...given that mmcblk0p1 is a vfat filesystem, the label is probably in uppercase (i.e. BOOT rather than boot). It would be interesting to change the line in /etc/fstab from:

label=boot /boot vfat defaults 0 0

to:

label=BOOT /boot vfat defaults 0 0

and see if it can then find the device. I might give it a go and report back.

owl770 avatar Feb 22 '21 07:02 owl770

I can confirm that changing LABEL=boot to LABEL=BOOT in /etc/fstab DOES NOT fix this issue. Stick with the change to /dev/mmcblk0p1

owl770 avatar Feb 22 '21 08:02 owl770

I just remembered something: does your disk partition have a disk label at all??? I ask because the stated procedure for installing berryboot seems to be just formatting to FAT by whatever means, then copying all the files from the zip.

HinTak avatar Feb 22 '21 14:02 HinTak

Berryboot labels the partition boot when it does the /sbin/mkfs.fat -n boot /dev/????????

owl770 avatar Feb 22 '21 21:02 owl770

If I add the following lines to the end of /boot/iscsi.sh:

RV=$?
/bin/sleep 5
exit $RV

I hit this issue today with a custom image on a Pi 4 and above workaround fixed it for me.

MeFri avatar Feb 23 '21 12:02 MeFri

Hi @maxnet - have you had a chance to look at these patches and do they require any further work from me? Thanks.

owl770 avatar Mar 09 '21 06:03 owl770

Any update on this ?

AliceDiNunno avatar Aug 11 '21 18:08 AliceDiNunno

Hello, today I had to update the EEPROM of my Pi 4 because it would not boot after a software update. After that update, the Pi boots without this iSCSI error message. I don't know why, but this is good.

Goodbye

dieterferdinand avatar Sep 11 '21 21:09 dieterferdinand

If I add the following lines to the end of /boot/iscsi.sh:

RV=$?
/bin/sleep 5
exit $RV

I hit this issue today with a custom image on a Pi 4 and above workaround fixed it for me.

@MeFri Thanks a lot, it works for me on Pi 2B.

atesacek avatar Aug 29 '22 11:08 atesacek