skiboot icon indicating copy to clipboard operation
skiboot copied to clipboard

OPAL_FLASH_READ can take 1003ms

Open ghost opened this issue 6 years ago • 13 comments

Spotted on a simple boot-to-OS test on Habanero, a OPAL_FLASH_READ call took 1003ms to complete.

My guess is this is opal-prd starting up and doing a bunch of flash things.

ghost avatar Dec 12 '18 05:12 ghost

Moving this to -> https://github.ibm.com/open-power/opal-project-mgmt/issues/61

I think this one can be closed and we can track under the opal-project-mgmt issue.

debmc avatar Apr 24 '19 03:04 debmc

Deb McLemore [email protected] writes:

Moving this to -> https://github.ibm.com/open-power/opal-project-mgmt/issues/61

I think this one can be closed and we can track under #61.

Tracking here is useful as it's public while our internal project management issues aren't.

-- Stewart Smith OPAL Architect, IBM.

ghost avatar Apr 29 '19 06:04 ghost

Tracking here

Ok, please assign to me (it would not allow me).

debmc avatar Apr 29 '19 12:04 debmc

Subject: libflash API: Add 'Prepare' call to allow asynchronous behaviour

We need to be able to perform asynchronous flash operations across the various backends.

Suggestions to look at: 1 - https://github.com/stewart-ibm/skiboot/commits/async-flash 2 - posix_fadvise(POSIX_FADV_WILLNEED)

My plan to approach this task:

1 - Learn the existing libflash and related code 2 - Understand the limitations of each backend 3 - Understand the async-flash patch for applicability 4 - Design/Code a prototype 5 - Deliver patch for review and comment

debmc avatar Apr 29 '19 12:04 debmc

@debmc any updates on this?

oohal avatar Jul 15 '19 05:07 oohal

updates

Working async tree -> https://github.com/debmc/skiboot/tree/async

Prototype testing is looking good, need to circle back and review with current prototype implementation if the goals are achieved across the backends.

debmc avatar Jul 15 '19 17:07 debmc

 hiomap: Enable Async IPMI messaging

To provide higher layer async operations to access
the target flash, enable hiomap to perform async
ipmi messaging (post boot).

Special considerations and constraints are to prevent
recursive locking and/or polling inside OPAL calls.

Some summary op-test Unit of Work Comparisons

Good Operations Faster
Bad Operations (test only) Penalty on waiting for timeout

START runTestReadEraseWriteNVRAM
STOP runTestReadEraseWriteNVRAM
  8.3 secs SYNC
  6.6 secs ASYNC

START runTestReadWritePAYLOAD
STOP runTestReadWritePAYLOAD
  1.87 secs SYNC
  - differences are timeouts on bad ops
  2.75 secs ASYNC

START runTestWriteTOC
STOP runTestWriteTOC
  1.55 secs SYNC
  - differences are timeouts on bad ops
  2.52 secs ASYNC

debmc avatar Jul 15 '19 17:07 debmc

Refreshed the async tree -> https://github.com/debmc/skiboot/tree/async

Tests look good, may be worth some review if anyone has some time.

Outstanding work is to circle back on error path testing.

debmc avatar Aug 02 '19 18:08 debmc

I had a quick look. There's a bunch of formatting and logging changes in the diff which bloats out the patch, I suggest separating this out as much as possible. Additionally a sizeable chunk of callback handling has been implemented alongside what already exists, and then there are various conditional blocks to ensure one path or another is taken. Maybe I just need to take the time to read the patch properly, but I guess I was missing the "why" on first skim. I think putting a summary of the approach in the commit message would be useful.

I was also a little concerned by the addition of "moving" to the window states - the window state is meant to reflect those described by the protocol, whereas this moving state is an implementation detail, so my gut instinct is we need to think about this a bit more. A similar concern was the introduction of state to track sequence ids of specific in-flight commands; the protocol only supports one active window and all operations against that window should to be serialised, so there should not currently be a case where multiple messages are in-flight. I suspect that needs more thought as well, or a convincing story to be written in comments or the commit message.

amboar avatar Aug 06 '19 04:08 amboar

Refreshed the async tree -> https://github.com/debmc/skiboot/tree/async

Tests look good, may be worth some review if anyone has some time.

Can you please post patches to mailing list? It becomes easy to review in mailing list.

-Vasant

hegdevasant avatar Aug 06 '19 05:08 hegdevasant

the protocol only supports one active window and all operations against that window should to be serialised, so there should not currently be a case where multiple messages are in-flight

I'll add some comments to the code as suggested, some overview here:

The tracking of sequence id's is for debug of ipmi messages in OPAL, so when tracing is turned on with log-level-memory and the BT Queue we can identify any queue handling or timing issues to map hiomap ipmi messages to other OPAL ipmi non-sync messages, etc. All operations on the window are serialized.

The "moving" is an internal state, however its needed to help control the state transition since we are now waiting on the move ('ll add some comments to the code as well).

The callback handling is needing to be kicked sometimes by sync messages (during boot) and post boot by async messages, therefore we need to conditionally identify when each method type would be called at the specific point in time.

debmc avatar Aug 06 '19 12:08 debmc

Refreshed the async tree -> https://github.com/debmc/skiboot/tree/async

Still need to do some more testing and regression testing.

debmc avatar Sep 07 '19 03:09 debmc

Sent patch series to skiboot mailing list:

[Skiboot] [PATCH 00/12] ipmi-hiomap: Enablement for Async opal_flash_op's https://lists.ozlabs.org/pipermail/skiboot/2019-October/015624.html

debmc avatar Oct 17 '19 22:10 debmc