bugs icon indicating copy to clipboard operation
bugs copied to clipboard

first boot after coreos-install fails on veritysetup

Open redbaron opened this issue 6 years ago • 5 comments

Issue Report

Bug

Container Linux Version

1911.3.0

Environment

Baremetal

Actual Behavior

veritysetup fails during boot, inspecting emergency shell shows that e2size /dev/<device> returns "Success" string, not numerical value. It seems that e2size in initrd is NOT https://github.com/coreos/seismograph/blob/master/src/e2size/e2size.c

Reproduction Steps

  1. ...
  2. ...

Other Information

Same deployment procedure works fine in VMWare environment

redbaron avatar Dec 03 '18 12:12 redbaron

Thanks for the report. To the best of my knowledge we didn't change anything in this regard recently. What's the exit-code of e2size <device>? How are you installing and booting this VM? Do you have a known-good version of the same flow?

lucab avatar Dec 03 '18 13:12 lucab

we provision using iPXE, where network boot runs coreos-install and then reboots. Second boot was exactly what was failing.

I rebooted over iPXE again and it now passes verity-setup. How strange! When it was failing, I was using emergency shell to get some intel, it it was an error from verity-setup's

/sbin/veritysetup create usr --hash-offset="$(e2size /dev/disk/by-partuuid/...) ..."

when I ran e2size /dev/didk/by-partuuid/.... I got result Success

redbaron avatar Dec 03 '18 13:12 redbaron

Hm, colleague of mine suggested that this line is a culprit

https://github.com/coreos/seismograph/blob/ba4a441b18d7581514c72f53c9dcf4e90866ddbc/src/e2size/e2size.c#L31

err check was triggered, but errno wasn't set and strerror returns Success

redbaron avatar Dec 03 '18 13:12 redbaron

Yes, that was also my suspicion. An exit-code of 1 would confirm that. I'm still unsure about the root-cause of this one-off failure though. Was the same disk used for a previous Linux installation without any zero-ing in-between?

lucab avatar Dec 03 '18 13:12 lucab

An exit-code of 1 would confirm that.

because it is in a shell substitution, that exit code was ignored and it tried to run verity-setup anyway

Was the same disk used for a previous Linux installation without any zero-ing in-between?

doesn't coreos-install script erase whole disk anyway?

redbaron avatar Dec 03 '18 14:12 redbaron