
coherent: WIP: poison the uncache alias when object is locked.

Open lgirdwood opened this issue 2 years ago • 5 comments

Have a Kconfig option that will poison uncache alias and lock cache lines of a coherent object.

The intention is to

  1. crash any deref of the uncache alias when it's locked
  2. detect any changes made through the uncache alias while it's locked.

We cannot yet check any reuse of local cache alias after release as it means keeping lines locked.
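
The two intentions above can be illustrated with a small host-side model. This is only a sketch, not the code in this PR: the names `coherent_model`, `model_acquire`, `model_release`, `OBJ_SIZE` and `POISON_BYTE` are hypothetical, and two plain buffers stand in for the uncache alias and the locked cache lines of one object.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OBJ_SIZE    64   /* hypothetical: object occupies one cache line */
#define POISON_BYTE 0xA5 /* hypothetical poison pattern */

/* Host-side model: two buffers stand in for the two aliases of one object. */
struct coherent_model {
	uint8_t uncached[OBJ_SIZE]; /* backing memory, seen via the uncache alias */
	uint8_t cached[OBJ_SIZE];   /* contents of the locked cache lines */
	bool locked;
};

/* Acquire: fill the (locked) cache lines, then poison the backing memory. */
static void model_acquire(struct coherent_model *m)
{
	memcpy(m->cached, m->uncached, OBJ_SIZE);
	memset(m->uncached, POISON_BYTE, OBJ_SIZE);
	m->locked = true;
}

/*
 * Release: report whether the poison survived, then write the cached
 * contents back. A false return means something wrote through the
 * uncache alias while the object was locked.
 */
static bool model_release(struct coherent_model *m)
{
	bool intact = true;

	for (size_t i = 0; i < OBJ_SIZE; i++) {
		if (m->uncached[i] != POISON_BYTE)
			intact = false;
	}

	memcpy(m->uncached, m->cached, OBJ_SIZE);
	m->locked = false;
	return intact;
}
```

In this model a read through the uncache alias while locked returns only the poison pattern, so any pointer loaded from it would be invalid; whether that faults on a real target depends on the pattern and the memory map.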

Signed-off-by: Liam Girdwood [email protected]

lgirdwood avatar Sep 12 '22 15:09 lgirdwood

@kv2019i @lyakh @mwasko @nashif @cujomalainey - all, thinking we can have some basic cache poisoning option on our platforms. This is a high-level, application-level implementation (and would need a Kconfig), but it would also be good to have a low-level checker too.

This PR is WIP and it's missing some cache locking APIs for Zephyr support. @lyakh - heads up, I may need you to help finish this or to prove it does not work (as this is based on my understanding of the xtensa manual).

lgirdwood avatar Sep 12 '22 16:09 lgirdwood

> Looks like it could be done in principle. This would only work on objects, actually shared between cores, marked as shared, of course. BTW, thinking about this, maybe another way to debug the coherent API would be to mark all such objects as shared to force the use of locking and cache synchronisation for each object on each access.

Locking has a perf impact, as we can't evict cache lines for other users. I suspect this Kconfig option will also slow down the FW to the point where we may not be able to reach high MCPS.

lgirdwood avatar Sep 13 '22 12:09 lgirdwood

@marcinszkudlinski fyi

mwasko avatar Sep 13 '22 15:09 mwasko

> > Looks like it could be done in principle. This would only work on objects, actually shared between cores, marked as shared, of course. BTW, thinking about this, maybe another way to debug the coherent API would be to mark all such objects as shared to force the use of locking and cache synchronisation for each object on each access.
>
> Locking has a perf impact, as we can't evict cache lines for other users. I suspect this Kconfig option will also slow down the FW to the point where we may not be able to reach high MCPS.

@lgirdwood oh, of course, sorry, I should've mentioned - only as a debug option. As a debug option you'd be able to mark all coherent API objects as shared to check if that breaks anything.

lyakh avatar Sep 14 '22 06:09 lyakh

> @lgirdwood oh, of course, sorry, I should've mentioned - only as a debug option. As a debug option you'd be able to mark all coherent API objects as shared to check if that breaks anything.

Absolutely - that could be an extra Kconfig debug. This would then blow up (and give us an assert()) if anyone attempted to use an unlocked cache object.
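
A rough host-side sketch of that idea, just to make the behaviour concrete. Everything here is hypothetical: `DEBUG_FORCE_SHARED` stands in for the extra Kconfig debug option, and `coherent_obj`, `obj_init`, `obj_acquire`, `obj_release`, `obj_access_ok` and `obj_payload` are made-up names, not the existing coherent API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a Kconfig debug option. */
#define DEBUG_FORCE_SHARED 1

struct coherent_obj {
	bool shared; /* object must be locked before use */
	bool locked; /* currently held */
	int payload; /* user data protected by the object */
};

/* With the debug option on, every object is treated as shared. */
static void obj_init(struct coherent_obj *c, bool shared)
{
	c->shared = DEBUG_FORCE_SHARED ? true : shared;
	c->locked = false;
	c->payload = 0;
}

static void obj_acquire(struct coherent_obj *c)
{
	if (c->shared)
		c->locked = true;
}

static void obj_release(struct coherent_obj *c)
{
	c->locked = false;
}

/* A shared object may only be touched while it is held. */
static bool obj_access_ok(const struct coherent_obj *c)
{
	return !c->shared || c->locked;
}

/* Accessor that asserts if a shared object is used while unlocked. */
static int *obj_payload(struct coherent_obj *c)
{
	assert(obj_access_ok(c));
	return &c->payload;
}
```

With the option enabled, code that only worked because its object was never marked shared would hit the assert on its first unlocked access.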

lgirdwood avatar Sep 14 '22 08:09 lgirdwood

I believe Zephyr is a better place for such operations, and there is one more useful poisoning to be considered:

  • poison all data in the cacheline except the specified area when invalidating without writeback

why: if there are a couple of structures in a single cacheline and we want to invalidate the cache for one of them, we may accidentally lose data from the others. This is extremely hard to debug, as the loss does not occur every time. With poisoning it does.
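
As a minimal sketch of the address arithmetic, assuming a 64-byte line and a buffer whose start is line aligned: `LINE_SIZE`, `POISON_BYTE` and `poison_outside_region` are hypothetical names, not Zephyr API, and the buffer simply models the memory covered by the affected cache lines. Where the poison would have to be written on real hardware is left open here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE   64   /* hypothetical cache line size in bytes */
#define POISON_BYTE 0xDE /* hypothetical poison pattern */

/*
 * mem:      buffer whose offset 0 is cache-line aligned (model assumption)
 * mem_size: size of the buffer, a multiple of LINE_SIZE
 * off, len: the region the caller wants to invalidate
 *
 * Poisons every byte that shares a cache line with [off, off + len)
 * but lies outside that region.
 */
static void poison_outside_region(uint8_t *mem, size_t mem_size,
				  size_t off, size_t len)
{
	if (len == 0 || off >= mem_size || len > mem_size - off)
		return;

	size_t end = off + len;
	/* Round the start down and the end up to line boundaries. */
	size_t first = off - off % LINE_SIZE;
	size_t last = (end + LINE_SIZE - 1) / LINE_SIZE * LINE_SIZE;

	if (last > mem_size)
		last = mem_size;

	memset(mem + first, POISON_BYTE, off - first); /* bytes before region */
	memset(mem + end, POISON_BYTE, last - end);    /* bytes after region */
}
```

A neighbouring structure that was not written back before the invalidate would then read back as poison every time, instead of only when its line happened to be dirty.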

marcinszkudlinski avatar Nov 17 '22 16:11 marcinszkudlinski

@marcinszkudlinski fwiw, the Zephyr team are working on a new cache API that we can use to do stuff like this. You may want to align with @nashif for details (and search for the upstream Zephyr cache PR to comment and review).

lgirdwood avatar Nov 18 '22 11:11 lgirdwood

Can one of the admins verify this patch?

gkbldcig avatar Feb 18 '23 04:02 gkbldcig