hostboot RFC: XZ compress HBRT image

The HBRT image is 3.4MB, which takes around 2 seconds to load from PNOR (when using the optimized read in skiboot). Hostboot loads the HBRT image in istep 21.1 (along with loading the payload and getting the OCC going). If we XZ-compress the HBRT image, it shrinks to 921KB, which loads (via the same fast skiboot code) in ~0.6 seconds.
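As a rough sanity check on those figures, the sketch below assumes load time scales linearly with image size, ignoring decompression cost and any fixed overhead. That linearity is an assumption, not something measured here, and the variable names are illustrative only.

```python
# Figures quoted above; decimal units (1 MB = 1000 KB) are assumed.
UNCOMPRESSED_MB = 3.4
UNCOMPRESSED_LOAD_S = 2.0
COMPRESSED_MB = 0.921

# Implied PNOR read throughput from the uncompressed load.
throughput_mb_s = UNCOMPRESSED_MB / UNCOMPRESSED_LOAD_S

# Estimated read time for the compressed image at the same throughput.
estimated_load_s = COMPRESSED_MB / throughput_mb_s

# Fraction of the original size left after compression.
ratio = COMPRESSED_MB / UNCOMPRESSED_MB

print(f"throughput ~{throughput_mb_s:.2f} MB/s")
print(f"estimated compressed load ~{estimated_load_s:.2f} s")
print(f"compressed size is ~{ratio:.0%} of the original")
```

The estimate lands a little under the ~0.6 seconds quoted, which would be consistent with some additional time going to decompression.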

Question: does Hostboot need to load the HBRT image? Is there anything stopping us loading it in skiboot? Is there any initial setup done?

before (run 1): 172.00640|ISTEP 21. 1 200.40760|htmgt|OCCs are now running in ACTIVE state (total: 28.4s)

before (run 2): 168.26122|ISTEP 21. 1 195.30265|htmgt|OCCs are now running in ACTIVE state (total: 27.0s)

after (run 1): 169.37556|ISTEP 21. 1 195.06400|htmgt|OCCs are now running in ACTIVE state (total: 25.68s)

after (run 2): 169.90019|ISTEP 21. 1 193.87845|htmgt|OCCs are now running in ACTIVE state (total: 23.98s)

Saving: 1.3 to 4.4 seconds off boot (average of 2.87s), and that number seems in the correct ballpark considering the time pflash takes to read it.
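The savings can be recomputed from the raw timestamps in the four log lines. This is a minimal sketch; the helper name `durations` is made up for illustration. Because it uses the unrounded timestamps, the results land close to, but not exactly on, the rounded figures quoted above.

```python
from statistics import mean

# (ISTEP 21.1 start, OCCs ACTIVE) timestamps in seconds, copied from the logs.
runs = {
    "before": [(172.00640, 200.40760), (168.26122, 195.30265)],
    "after": [(169.37556, 195.06400), (169.90019, 193.87845)],
}


def durations(pairs):
    """Elapsed time from istep 21.1 start to OCCs active, per run."""
    return [end - start for start, end in pairs]


before = durations(runs["before"])
after = durations(runs["after"])

# Smallest saving: fastest "before" run against slowest "after" run.
worst_case = min(before) - max(after)
# Largest saving: slowest "before" run against fastest "after" run.
best_case = max(before) - min(after)
# Difference of the two averages.
average = mean(before) - mean(after)

print(f"saving: {worst_case:.2f}s to {best_case:.2f}s, average {average:.2f}s")
```

With only two runs on each side, the spread is wide, so the average is best read as indicative.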

More significantly, it frees up 2.4MB of PNOR.


Sep 15 '16 09:09 ghost

Question: does Hostboot need to load the HBRT image? Is there anything stopping us loading it in skiboot? Is there any initial setup done?

We load it and point at it in the devtree because that was what we were told to do originally by the OPAL team. Remember that we do this for both OpenPOWER and the enterprise systems. In the latter case, OPAL needs to request the data from the FSP as a lid since you don't have access to the PNOR. It would not be difficult to have divergent behavior on our side if OPAL can handle it though.

However, one more wrinkle in this story is Secureboot. Hostboot has to load HBRT during IPL because we are the only ones who can verify the signature. I think that forces us to keep the current design where HBRT is pushed into mainstore as part of the Hostboot IPL.

Sep 15 '16 15:09 dcrowell77

@e-liner - Can you take a look and pull this into gerrit if it looks okay?

Sep 15 '16 15:09 dcrowell77

@bofferdn, @e-liner - The secureboot code added an extra layer to the PNOR VFS. If we add additional sections to be XZ compressed, I wonder if we should try to integrate that feature directly into that new secureboot VFS layer. With this proposed patch from Stewart we end up with effectively duplicated code.

On the performance aspect, I thought we were going to attempt to overlap PAYLOAD and HBRT loading with OCC start. Did that ever happen?

Sep 15 '16 17:09 williamspatrick

No performance tweaks have been done to load pnor partitions in the background.

Sep 15 '16 17:09 dcrowell77

RE: Pushing decompression code into the VFS/PNOR code - I'm in total agreement with that idea as a better design. The question is always about resources and schedule. Putting this in as-is produces an immediate benefit (a few seconds) for very little work, while finding time to put the right solution in could take a while.

Sep 15 '16 17:09 dcrowell77

This looks okay, but I'm going to want to test a few images on it. Plus we'll need an additional op-build change to compress the image if we're going this route. I'm running short on time this week, but I can try to look at it tomorrow.

Sep 15 '16 17:09 e-liner

Patrick Williams writes:

@bofferdn, @e-liner - The secureboot code added an extra layer to the PNOR VFS. If we add additional sections to be XZ compressed, I wonder if we should try to integrate that feature directly into that new secureboot VFS layer. With this proposed patch from Stewart we end up with effectively duplicated code.

Hence the RFC, because that's totally what should happen rather than have inconsistent XZ calls scattered throughout hostboot.

This patch should totally not go in as-is, even though the error handling (which I totally removed) is eerily similar to the error handling already loading the HBRT image.

Ultimately, it looks like we don't even have to load the HBRT image until opal_prd starts (there's no way I can think of how to do this on P8 without breaking ABI though, but we could go that way for P9).

All of the partitions read in one go could be XZ compressed pretty easily and save a bunch of boot time.

On the performance aspect, I thought we were going to attempt to overlap PAYLOAD and HBRT loading with OCC start. Did that ever happen?

Tackling OCC start was the next thing to look at. ISTEP 21.1 was really constrained by PNOR access and OCC start time when I delved into it, with absolutely nothing else going on there except that, so everything there (except loading the PAYLOAD itself) is a candidate for the PAYLOAD to do in parallel with other tasks.
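The overlap being discussed can be pictured with a small sketch. Everything here is a stand-in: the function names are hypothetical, the sleeps replace real work, and Python threads only illustrate the scheduling idea, not how Hostboot tasks work. It also assumes PNOR reads share one flash device and so run back to back, while OCC start proceeds alongside them.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_from_pnor(names, seconds_each=0.05):
    """Stand-in for PNOR reads, done one after another (assumed shared device)."""
    loaded = []
    for name in names:
        time.sleep(seconds_each)  # placeholder for the actual flash read
        loaded.append(name)
    return loaded


def start_occ(seconds=0.1):
    """Stand-in for OCC start, which does not need PNOR in this sketch."""
    time.sleep(seconds)  # placeholder for the actual OCC bring-up
    return "ACTIVE"


def istep_overlapped():
    """Run the PNOR loads and the OCC start side by side, then wait for both."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        pnor = pool.submit(load_from_pnor, ["PAYLOAD", "HBRT"])
        occ = pool.submit(start_occ)
        return pnor.result(), occ.result()


loaded, occ_state = istep_overlapped()
print(loaded, occ_state)
```

In this arrangement the step takes roughly as long as the slower of the two activities instead of their sum.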

Stewart Smith OPAL Architect, IBM.

Sep 15 '16 23:09 ghost

Dan writes:

RE: Pushing decompression code into the VFS/PNOR code - I'm in total agreement with that idea as a better design. The question is always about resources and schedule. Putting this in as-is produces an immediate benefit (a few seconds) for very little work, while finding time to put the right solution in could take a while.

I'm happy to do that work, may need some assistance with a few pointers here and there.

After previous experiments, it seems like the best time to load the PAYLOAD is while bringing up DRAM, as that's easily the longest time spent doing anything (especially with P8 and DDR4). I know there's memory constraints on larger systems (Brazos), but we probably have a bit more leeway with the OpenPOWER kind of size... even if we only loaded a subset of the PAYLOAD that early.

I know that if I implemented that it'd make all the IPL flow documents out of date, but technical documentation that's up to date and current just seems.... not quite right :)

Stewart Smith OPAL Architect, IBM.

Sep 15 '16 23:09 ghost

e-liner writes:

This looks okay, but I'm going to want to test a few images on it. Plus we'll need an additional op-build change to compress the image if we're going this route. I'm running short on time this week, but I can try to look at it tomorrow.

I think there's a few more partitions we could compress.

WINK and OCC come to mind, and they compress fairly well.

Stewart Smith OPAL Architect, IBM.

Sep 15 '16 23:09 ghost

Dan writes:

Question: does Hostboot need to load the HBRT image? Is there anything stopping us loading it in skiboot? Is there any initial setup done?

We load it and point at it in the devtree because that was what we were told to do originally by the OPAL team. Remember that we do this for both OpenPOWER and the enterprise systems. In the latter case, OPAL needs to request the data from the FSP as a lid since you don't have access to the PNOR. It would not be difficult to have divergent behavior on our side if OPAL can handle it though.

It was certainly the easy way to get things going, no problem to revisit though :)

However, one more wrinkle in this story is Secureboot. Hostboot has to load HBRT during IPL because we are the only ones who can verify the signature. I think that forces us to keep the current design where HBRT is pushed into mainstore as part of the Hostboot IPL.

We have to verify signatures in skiboot, so this shouldn't be too much of an issue.

Stewart Smith OPAL Architect, IBM.

Sep 16 '16 00:09 ghost

The "other payload" requires Hostboot to load HBRT. Their secureboot support will be part of the HBRT image, so they end up with a chicken-and-egg problem otherwise.

Sep 16 '16 02:09 williamspatrick

Patrick Williams writes:

The "other payload" requires Hostboot to load HBRT. Their secureboot support will be part of the HBRT image, so they end up with a chicken-and-egg problem otherwise.

Okay, I thought that was a possibility.

Seeing as there's existing logic based on the payload type, I guess that could be used.

Architecturally though, it does seem neater to have less knowledge of the payload implementation and fewer if (payload=blah)....

Stewart Smith OPAL Architect, IBM.

Sep 16 '16 02:09 ghost

I know there's memory constraints on larger systems (Brazos), but we probably have a bit more leeway with the OpenPOWER kind of size

Actually, we are significantly more memory constrained in the OP systems because we have a lot more code loaded. We are very close to the edge during the memory initialization steps; that is typically where we have seen OOM issues crop up. If you make any changes, be sure to test on a Garrison box with full memory, as that has been our worst-case scenario in the past.

Sep 16 '16 03:09 dcrowell77

The secureboot code added an extra layer to the PNOR VFS. If we add additional sections to be XZ compressed, I wonder if we should try to integrate that feature directly into that new secureboot VFS layer. With this proposed patch from Stewart we end up with effectively duplicated code.

The secure provider already verifies the payload, which is compressed; it just doesn't do the decompression after the fact. I think what you are perhaps saying is: after the verification, take the additional step of doing the decompression and expose the final, uncompressed image to the caller. In that case getSectionInfo would have to be smart enough to give the uncompressed final size vs. the normal size. Also, right now the VA spaces for PNOR, temp, and secure all mirror each other, but the secure space would have to be bigger to accommodate the larger section, so perhaps we'd have to migrate to a unique VA space per partition to allow for the expanded size. That could probably be done with a new shim VA layer in between temp + secure, so something like:

compressed image in PNOR
temp space reads in the protected payload and does verification if secure boot is enabled
if XZ compressed, decompress into the matching uncompressed VA space and free up all temp space pages
user access to the secure space VA range pulls in the corresponding uncompressed page and frees it from the uncompressed space, so only the secure space holds that verified, uncompressed, pinned page
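The verify-then-decompress ordering in those steps can be sketched as follows. This is not Hostboot code: `load_section` and its parameters are hypothetical, and a plain SHA-512 comparison stands in for the real signature verification. The point shown is only that verification runs on the bytes as stored (still compressed) and decompression happens afterwards.

```python
import hashlib
import lzma

# Magic bytes at the start of every .xz stream.
XZ_MAGIC = b"\xfd7zXZ\x00"


def load_section(raw: bytes, expected_sha512: bytes, secure: bool = True) -> bytes:
    """Hypothetical loader: verify the stored bytes, then decompress if XZ."""
    # Verification covers the payload exactly as it sits in flash.
    # (Assumption: a hash check stands in for signature verification.)
    if secure and hashlib.sha512(raw).digest() != expected_sha512:
        raise ValueError("section failed verification")
    # Only after verification is the image expanded for the caller.
    if raw.startswith(XZ_MAGIC):
        return lzma.decompress(raw)
    return raw


image = b"HBRT" * 1000                # stand-in for an uncompressed image
stored = lzma.compress(image)         # what would sit in the partition
digest = hashlib.sha512(stored).digest()

section = load_section(stored, digest)
# A size query would need to report len(section), not len(stored).
print(len(stored), len(section))
```

The last comment is the getSectionInfo concern in miniature: the caller sees the expanded size, which differs from the size stored in flash.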

Would work great if the partition is purely a read-only, protected payload. If we need to support unprotected payloads or hash-page-table verification of anything, that introduces more complexity. We're already memory constrained loading OCC early in the boot, and we are probably going to have to switch to hash page table verification for that, assuming we can't otherwise drive the memory footprint down (and it is pretty inefficient how it's being done).

There are also other modifications like making sure getSectionInfo for secure space is reporting the fully uncompressed size, etc.

And as mentioned, the plan was to load HBRT for P9 from Hostboot due to the chicken-and-egg issue of PHyp not having the verification code.

Sep 16 '16 04:09 bofferdn

Dan writes:

I know there's memory constraints on larger systems (Brazos), but we probably have a bit more leeway with the OpenPOWER kind of size

Actually, we are significantly more memory constrained in the OP systems because we have a lot more code loaded. We are very close to the edge during the memory initialization steps; that is typically where we have seen OOM issues crop up. If you make any changes, be sure to test on a Garrison box with full memory, as that has been our worst-case scenario in the past.

Fascinating.

I think the Garrisons in our lab max out at 512GB, so I'll see how I go and ask for testing help/access if needed.

Stewart Smith OPAL Architect, IBM.

Sep 16 '16 04:09 ghost

Would work great if partition is purely a read-only, protected payload.

Even for this case, it only works to hide the decompression in the VMM if the partition is a 'load once and done' sort of thing, or we only access it after we have mainstore up. We likely don't have the cache memory available to keep everything resident, and we can't swap out single pages without a way to decompress an arbitrary piece out of the middle.
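The random-access limitation can be illustrated with Python's `lzma` module. The sketch assumes a plain sequential .xz stream, as produced by `lzma.compress`, and a 4KB page size chosen for illustration; `read_page` is a made-up helper. To hand back one page from the middle, it has to decompress everything in front of that page and throw it away.

```python
import lzma

PAGE_SIZE = 4096  # assumed page size, for illustration only


def read_page(compressed: bytes, page_index: int, page_size: int = PAGE_SIZE):
    """Return (page, bytes_decompressed) for one page of the uncompressed image."""
    decomp = lzma.LZMADecompressor()
    pending = compressed
    start = page_index * page_size
    end = start + page_size
    produced = 0  # uncompressed bytes produced so far
    page = bytearray()
    while produced < end and not decomp.eof:
        chunk = decomp.decompress(pending, max_length=min(page_size, end - produced))
        pending = b""  # the decompressor buffers unused input internally
        if not chunk:
            break  # no more output available (end of stream or truncated input)
        # Keep only the bytes that fall inside the requested page.
        page += chunk[max(start - produced, 0):]
        produced += len(chunk)
    return bytes(page), produced


# Four pages, each filled with its own page number.
image = b"".join(bytes([i]) * PAGE_SIZE for i in range(4))
compressed = lzma.compress(image)

page, work = read_page(compressed, 2)
print(len(page), work)
```

Fetching page 2 means producing three pages' worth of output, so paging a single page back in costs a decompression pass from the start of the stream.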

Sep 16 '16 13:09 dcrowell77

Dan writes:

Would work great if partition is purely a read-only, protected payload.

Even for this case, it only works to hide the decompression in the VMM if the partition is a 'load once and done' sort of thing, or we only access it after we have mainstore up. We likely don't have the cache memory available to keep everything resident, and we can't swap out single pages without a way to decompress an arbitrary piece out of the middle.

Is there a HOWTO anywhere on looking at memory usage/maps while in cache?

Stewart Smith OPAL Architect, IBM.

Sep 19 '16 01:09 ghost

We don't have a huge amount of tooling, but what we do have uses the standard Hostboot debug framework; see src/build/debug/. The specific tools of interest are Hostboot/MemStats.pm and Hostboot/PageMgr.pm. Is that what you wanted, or are you just curious about Hostboot memory usage in general?

Sep 19 '16 15:09 dcrowell77
