op-build
Petitboot in 60 seconds
As an initial goal in reducing boot time, let's aim to get from power-on to the petitboot UI in under 60 seconds.
The target is a two-socket system without too much memory.
The informal agreement has been to split this time budget roughly equally between hostboot and OPAL.
Last I heard it was taking many seconds after the poweron command for the SBE to even get poked by the BMC. Whose quota is that taken from? ;-)
Dan [email protected] writes:
> Last I heard it was taking many seconds after the poweron command for the SBE to even get poked by the BMC. Whose quota is that taken from? ;-)
Only Hostboot and OPAL get measurable time, everyone else has to be close enough to 0 seconds :)
Or, we could have a boot time trading scheme, where we sell seconds of boot time to the highest bidder :)
-- Stewart Smith OPAL Architect, IBM.
Giving this a gentle bump. We're better than before but still quite some ways away from 60 seconds from the power button being hit (with the new ColdFire FSI driver, I'm going to recommend we include the BMC-side platform twiddling in the 60 seconds).
A quick eyeball look over the IPL process shows a few isteps that are candidates for optimization. Many of the early routines (mss_config etc.) take significant time, while others (centaur configuration) should be able to be bypassed entirely on at least the direct attach platforms.
Goals are great but they don't really get us anywhere... Some specific thoughts:
- The centaur steps you mention are essentially skipped; there is no centaur logic running other than a loop through an empty list.
- Remember that until istep15 we are constrained to only 8MB of memory. This causes us to do a considerable amount of code swapping, which is bad because...
- PNOR access was the biggest piece of time the last time I heard. This actually got a little worse (I think) with the hiomap changes. There are some very interesting proposals to optimize some of the PNOR usage patterns, but none of them are simple.
- Before anyone suggests it: running istep logic in parallel can help some, but in our experience we very quickly hit a wall where we actually get slower when we try to do too much parallelization, due to memory thrashing.
- The boot time is very configuration dependent. Can you post the console output from your box that shows the long isteps? I didn't think mss_eff_config was one of the longer ones.
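To make the request for console output easier to act on, here is a minimal Python sketch that ranks isteps by elapsed time. It assumes each istep start appears on the console as `<seconds>|ISTEP <major>.<minor> - <name>`; that line shape, the step names, and the timestamps in `SAMPLE` are illustrative assumptions, not output captured from a real box. Adjust the regular expression to whatever your console actually prints.

```python
import re
from typing import Iterable, List, Tuple

# ASSUMPTION: istep start lines look like "  10.00000|ISTEP  6. 5 - some_name".
ISTEP_RE = re.compile(r"^\s*(\d+\.\d+)\|ISTEP\s+(\d+)\.\s*(\d+)\s*-\s*(\S+)")


def istep_durations(lines: Iterable[str]) -> List[Tuple[str, float]]:
    """Return (istep, seconds) pairs sorted from longest to shortest.

    An istep's duration is taken as the gap until the next istep starts,
    so the final istep in the log has no duration and is left out.
    """
    starts = []
    for line in lines:
        match = ISTEP_RE.match(line)
        if match:
            timestamp = float(match.group(1))
            label = f"{int(match.group(2))}.{int(match.group(3))} {match.group(4)}"
            starts.append((timestamp, label))

    durations = [
        (label, round(next_ts - ts, 3))
        for (ts, label), (next_ts, _) in zip(starts, starts[1:])
    ]
    return sorted(durations, key=lambda item: item[1], reverse=True)


# Hypothetical sample data: the names and times are made up for illustration.
SAMPLE = [
    "  10.00000|ISTEP  6. 5 - example_step_a",
    "  10.50000|ISTEP  6. 6 - example_step_b",
    "  12.00000|some unrelated console message",
    "  14.50000|ISTEP  6. 7 - example_step_c",
    "  16.00000|ISTEP  6. 8 - example_step_d",
]

if __name__ == "__main__":
    for label, seconds in istep_durations(SAMPLE):
        print(f"{seconds:8.3f}s  {label}")
```

Because the duration is measured between consecutive istep start lines, any time spent in non-istep work between two steps is attributed to the earlier step.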
Also this presentation from @stewart-ibm from linux.conf.au 2019
Dan [email protected] writes:
> Goals are great but they don't really get us anywhere... Some specific thoughts:
> - The centaur steps you mention are essentially skipped; there is no centaur logic running other than a loop through an empty list.
> - Remember that until istep15 we are constrained to only 8MB of memory. This causes us to do a considerable amount of code swapping, which is bad because...
> - PNOR access was the biggest piece of time the last time I heard. This actually got a little worse (I think) with the hiomap changes. There are some very interesting proposals to optimize some of the PNOR usage patterns, but none of them are simple.
Yeah, it's pretty bad.
https://github.com/stewart-ibm/hostboot/commit/40214e00d98fc4ba45d5067a45f68652ad4fbf68 will address part of it and gain around 6 seconds. I still need to write a follow-up patch that, instead of reading whole 4k blocks from PNOR, reads only the size of the compressed page. That should save us a bunch more.
-- Stewart Smith OPAL Architect, IBM.
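As a rough illustration of why reading only the compressed page size could help, the sketch below compares the bytes transferred when every page costs a full 4k read against reading just the compressed length. It is a back-of-the-envelope model, not the hostboot patch: the function names and the page sizes are hypothetical, and it assumes each compressed page fits in a single 4k block.

```python
from typing import Sequence

BLOCK_SIZE = 4096  # the "whole 4k block" read size mentioned in the thread


def bytes_read_whole_blocks(compressed_sizes: Sequence[int]) -> int:
    """Bytes transferred when each page is fetched as a full 4k block."""
    return len(compressed_sizes) * BLOCK_SIZE


def bytes_read_exact(compressed_sizes: Sequence[int]) -> int:
    """Bytes transferred when only the compressed page length is fetched."""
    for size in compressed_sizes:
        # ASSUMPTION: a compressed page never exceeds one block.
        if not 0 < size <= BLOCK_SIZE:
            raise ValueError(f"compressed size {size} does not fit in one block")
    return sum(compressed_sizes)


# Hypothetical compressed sizes in bytes; real values depend on the image.
EXAMPLE_SIZES = [1200, 2048, 3100, 4096]

if __name__ == "__main__":
    whole = bytes_read_whole_blocks(EXAMPLE_SIZES)
    exact = bytes_read_exact(EXAMPLE_SIZES)
    print(f"whole blocks: {whole} bytes, exact: {exact} bytes, saved: {whole - exact}")
```

The saving scales with how well the pages compress; a page that does not compress at all costs the same either way.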