op-build
Updating the PNOR partition layout
The PNOR partition layout hasn't changed significantly since the exciting days of Palmetto. However, on POWER9 platforms we've dropped "golden side" and have 64MB - 128MB of flash; in fact, as far as I know, all P9 platforms have 128MB flash chips but only use the defaultPnorLayout_64.xml layout - which itself only uses about 42MB of space!
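To make the "allocated versus available" numbers easy to check, here is a minimal Python sketch that sums the regions in a layout file. The element names (`metadata/imageSize`, `section`, `eyeCatch`, `physicalOffset`, `physicalRegionSize`) are assumed to match the style of the defaultPnorLayout XML files, and the embedded sample with its offsets and sizes is made up for illustration; it is not the real 64MB layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down layout. Element names are an assumption about
# the layout XML format; the offsets and sizes are illustrative only.
SAMPLE_LAYOUT = """
<pnor>
  <metadata>
    <imageSize>0x4000000</imageSize>
  </metadata>
  <section>
    <eyeCatch>HBB</eyeCatch>
    <physicalOffset>0x8000</physicalOffset>
    <physicalRegionSize>0x100000</physicalRegionSize>
  </section>
  <section>
    <eyeCatch>BOOTKERNEL</eyeCatch>
    <physicalOffset>0x108000</physicalOffset>
    <physicalRegionSize>0x1000000</physicalRegionSize>
  </section>
</pnor>
"""


def summarize_layout(xml_text):
    """Return image size, total allocated bytes, end of last region, and free tail."""
    root = ET.fromstring(xml_text.strip())
    image_size = int(root.findtext("metadata/imageSize"), 16)

    sections = []
    for sec in root.findall("section"):
        name = sec.findtext("eyeCatch")
        offset = int(sec.findtext("physicalOffset"), 16)
        size = int(sec.findtext("physicalRegionSize"), 16)
        sections.append((name, offset, size))

    allocated = sum(size for _, _, size in sections)
    end = max(offset + size for _, offset, size in sections)
    return {
        "image_size": image_size,
        "allocated": allocated,
        "end": end,
        "free": image_size - end,
    }


if __name__ == "__main__":
    for key, value in summarize_layout(SAMPLE_LAYOUT).items():
        print(f"{key}: {value / (1024 * 1024):.2f} MiB")
```

Pointing `summarize_layout` at the contents of a real layout file should give the equivalent of the "about 42MB" figure, provided the element names above hold for that file.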
BOOTKERNEL is definitely reaching the limits of its 16MB - a number of parties are asking for new items to be included and right now the answer is "No, it doesn't fit". The requests include RAID utilities, SED support, and a particular pain point - extra font files to support non-English languages on VGA. I'm sure other projects are running into similar issues, but I'm obviously biased :)
So I'd like to get the discussion rolling on
- updating the PNOR layout to take advantage of the full space available
- considering adding a new partition to hold larger things such as vendor utilities, pb-plugins etc.
We have/had a couple of reasons for sticking to 64M despite the 128M flash chips. One of them was concern about somewhat painful changes to the pnor/sfc code to handle the larger size. I don't remember the details, but I think it was related to complications in specifying a different chip. Anyway, that concern is now gone since we've got the virtual pnor interface. We just need the BMC to support the full size and we all get it for free. :-)
I have also heard some comments in the past that we can't force bigger than 64M on our partners. Maybe Norm can weigh in (can't find his id for some reason). Adding @sannerd too.
I don't think we want to require a large flash size, but just have the option where if you have a larger one you can include certain features.
So we probably have to keep both in mind. I also think we could look through the current layout for what we've allocated versus what we use and probably cut a fair chunk out of that 42MB we currently "use" according to Sam :)
Went over the limit today just trying to add e2fsck, so this has my attention again!
Before we get into chopping and changing the partition sizes, on @dcrowell77's points:
- What does OpenBMC need to change to support these differences? Do they have any gross hard-coded sizes/addresses? Pinging.. @shenki / @geissonator / etc?
- VPNOR only exists on witherspoon - would Boston/Romulus need changes? Boston for example could be kept on 64MB if "not forcing larger" applies to them.
If the BMC doesn't hide it from us then supporting >64MB is a non-trivial hit to Hostboot. Accessing the 2nd physical chip requires using some new facilities in the AST's SFC layer.
To @stewart-ibm 's point about not requiring things, we all know how this will go down. We'll get bigger and fatter and eventually go over the 64MB limit in order to use some marketed feature and then some partner will smack us around. ;-) If we do go over 64M I think we may want to be very explicit with new optional partitions instead of expanding the existing ones. That would make it a little more obvious where the dividing lines are.
On a similar note, we (HB especially) probably need to be better about defining the required space based on features, e.g. the WOFDATA partition can be shrunk/grown based on the number of unique processor sorts you want to support in a single image, or the HBD can be shrunk if you only have a 2-chip box with a couple of dimms vs an 8-chip box with a million times, etc.
Right, sticking to rearranging the 64MB layout sounds like a safer first step.
> To @stewart-ibm 's point about not requiring things, we all know how this will go down. We'll get bigger and fatter and eventually go over the 64MB limit in order to use some marketed feature and then some partner will smack us around. ;-)
Oh definitely, I was choosing to ignore that particular real-world problem for the moment :D
FYI - All of the current BMC implementations (OpenBMC, SuperMicro, AMI) are using the VPNOR.
Do we mean the same thing by VPNOR? AFAIK only P9 Witherspoons use VPNOR; Romulus OpenBMC still uses regular PNOR, as does SMC's Boston (e.g. pflash works :) ). AMI's P8 machines are definitely regular PNOR.
Perhaps not, I'm referring to the fact that the Host does not have direct access to the PNOR flash but instead goes through the mbox regs to do writes/erases. That means that we don't need to know any of the flash specifics.
Ah right, yes, all the P9 BMCs are using mbox for flash access. I was referring to OpenBMC's "Virtual-PNOR" abstraction, which holds everything in memory. That in itself isn't an issue, but the current implementation has a few shortcomings which might be relevant; hopefully the OpenBMC people can weigh in on that.
> What does OpenBMC need to change to support these differences? Do they have any gross hard-coded sizes/addresses? Pinging.. @shenki / @geissonator / etc?
There's probably no changes needed on the OpenBMC side. It currently saves each partition binary as a 'file', thus removing any 'empty' space in between partitions specified by the layout. This shrinks the current witherspoon pnor image from 64MB to ~22MB. Adding more partitions would just add more to the image size but would still be less than the flash chip.
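As a rough illustration of why storing each partition as a file shrinks the image, the Python sketch below compares a flat image (which has to cover every region up to the end of the last one) with the sum of the partition binaries. It assumes each file is only as large as the binary it holds; the partition names and sizes are hypothetical and are not taken from the witherspoon layout.

```python
MiB = 1024 * 1024

# Hypothetical layout: name -> (offset, region size, actual binary size).
# These numbers are made up purely to show the arithmetic.
LAYOUT = {
    "PART_A": (0 * MiB, 4 * MiB, 1 * MiB),
    "PART_B": (8 * MiB, 16 * MiB, 10 * MiB),
}


def flat_image_size(layout):
    """Size of a flat image: everything up to the end of the last region."""
    return max(offset + region for offset, region, _ in layout.values())


def per_file_size(layout):
    """Size when each partition binary is stored as its own file."""
    return sum(binary for _, _, binary in layout.values())


if __name__ == "__main__":
    print("flat image:", flat_image_size(LAYOUT) // MiB, "MiB")
    print("per-file  :", per_file_size(LAYOUT) // MiB, "MiB")
```

Under that assumption, adding a new partition only costs the size of its binary, not the size of the region reserved for it in the layout.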
> I was referring to OpenBMC's "Virtual-PNOR" abstraction which holds everything in memory, which itself isn't an issue but the current implementation has a few shortcomings which might be relevant, hopefully the OpenBMC people can weigh in on that.
Which relevant shortcomings are you thinking about?
> Which relevant shortcomings are you thinking about?
Turns out I'm not really sure - it's been a while since I've properly looked at the VPNOR code, and it looks a lot more robust now. In fact I just built images for Romulus and Witherspoon with a rearranged PNOR layout (BOOTKERNEL moved to the end and resized, everything else shifted up), and it appears to boot successfully on both platforms. Potentially we're looking at no changes required anywhere then, which is great news.
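The rearrangement described above (one partition moved to the end and resized, everything else shifted up to close the gap) can be sketched in Python as follows. This is only a model of the offset arithmetic, not the tooling used to build the images: the function name `move_to_end`, the partition sizes, and the assumption that sizes are already block-aligned are all hypothetical.

```python
MiB = 1024 * 1024


def move_to_end(partitions, name, new_size, flash_size, base=0):
    """Repack partitions contiguously with `name` placed last and resized.

    partitions: list of (name, size) tuples in flash order.
    Returns a list of (name, offset, size) tuples.
    Assumes all sizes are already aligned to the flash block size.
    """
    others = [(n, s) for n, s in partitions if n != name]
    if len(others) == len(partitions):
        raise ValueError(f"no partition named {name!r}")

    result = []
    offset = base
    for n, s in others:
        # Everything else keeps its size and shifts up to fill the gap.
        result.append((n, offset, s))
        offset += s

    if offset + new_size > flash_size:
        raise ValueError("resized partition does not fit in flash")
    result.append((name, offset, new_size))
    return result


if __name__ == "__main__":
    # Made-up sizes for illustration only.
    old = [("HBB", 1 * MiB), ("BOOTKERNEL", 16 * MiB), ("HBD", 2 * MiB)]
    for entry in move_to_end(old, "BOOTKERNEL", 32 * MiB, 64 * MiB):
        print(entry)
```

Keeping the grown partition last means later size changes to it do not move any other partition's offset.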
@sammj want to give it a go, splitting out BOOTKERNEL and initramfs? We could then look at the skiboot patches from Matt to decompress the initramfs during skiboot which should shave at least 2 seconds off boot time.
@dcrowell77 it's also possible that xz compression could be applied to WOFDATA and dramatically decrease the space used. Mind you, this is probably at the appropriate level far down on the TODO list :)
Yep, we are aware of the idea to compress WOFDATA, but that drives work on our side when we consume it so it hasn't made our priority list. We might need to do it for one of the systems but so far we're barely fitting...
I could always work on my old patch to do xz compressed HBRT partition and rework it to add some generic infrastructure in hostboot to consume xz compressed partitions. (https://github.com/open-power/hostboot/pull/66 ) - but I'm no fan of my code there, it's entirely made out of hacks
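For a rough feel of what xz-compressing a partition payload buys, here is a small Python sketch using the standard `lzma` module. The input is synthetic, highly repetitive data standing in for table-like content; it is not real WOFDATA, so the ratio it produces says nothing definite about the real partition, and the helper names are hypothetical.

```python
import lzma


def xz_ratio(data: bytes) -> float:
    """Return compressed size divided by original size for an .xz container."""
    compressed = lzma.compress(data, format=lzma.FORMAT_XZ)
    return len(compressed) / len(data)


def xz_roundtrip_ok(data: bytes) -> bool:
    """Check that decompressing the compressed payload restores the input."""
    return lzma.decompress(lzma.compress(data, format=lzma.FORMAT_XZ)) == data


if __name__ == "__main__":
    # Synthetic stand-in for repetitive table data (an assumption, not WOFDATA).
    payload = bytes(range(256)) * 4096  # 1 MiB
    print(f"ratio: {xz_ratio(payload):.4f}")
    print("roundtrip ok:", xz_roundtrip_ok(payload))
```

The consuming side still has to decompress before use, which is the extra work mentioned above for hostboot.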
Had a play with splitting kernel and initramfs - haven't done any timing yet but it does require this fixup to skiboot: http://patchwork.ozlabs.org/patch/887586/