soundwire: stream: Prepare ports in parallel to reduce stream start latency
Issue DP prepare to all ports that use full CP_SM, then wait for the prepare to complete. This allows all the DPs to prepare in parallel, reducing the latency of starting an audio stream.
On a system with six CS35L56 amps, this reduces the startup latency, from runtime_resume to all amps ready to play, from ~160 ms to ~60 ms.
(Test hardware: UpXtreme i14, BIOS v1.2, Core Ultra 7 155H, 3x CS35L56 on link 0, 3x CS35L56 on link 1).
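The ordering change described above can be illustrated with a minimal, self-contained sketch. It is a toy model, not the stream.c code: struct sim_port, busy_ms and both functions are hypothetical names invented for this illustration, and the model ignores register access time, so it does not reproduce the measured figures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a data port; not a kernel structure. */
struct sim_port {
	bool full_cp_sm;       /* port uses the full channel-prepare state machine */
	unsigned int busy_ms;  /* simulated time the port needs to finish preparing */
	bool prepare_issued;   /* set once prepare has been issued to this port */
};

/*
 * Old ordering: issue prepare to one port and wait for it before moving
 * on to the next. The simulated elapsed time is the sum of the delays.
 */
static unsigned int prepare_serial_ms(struct sim_port *ports, size_t n)
{
	unsigned int total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!ports[i].full_cp_sm)
			continue;
		ports[i].prepare_issued = true;
		total += ports[i].busy_ms;
	}
	return total;
}

/*
 * New ordering: first issue prepare to every full-CP_SM port, then wait.
 * The ports prepare concurrently, so the simulated elapsed time is that
 * of the slowest port.
 */
static unsigned int prepare_parallel_ms(struct sim_port *ports, size_t n)
{
	unsigned int longest = 0;
	size_t i;

	/* Pass 1: issue prepare to all ports that need it. */
	for (i = 0; i < n; i++) {
		if (ports[i].full_cp_sm)
			ports[i].prepare_issued = true;
	}

	/* Pass 2: wait for each port; the slowest one bounds the total. */
	for (i = 0; i < n; i++) {
		if (ports[i].prepare_issued && ports[i].busy_ms > longest)
			longest = ports[i].busy_ms;
	}
	return longest;
}
```

In this model, ports that do not use the full CP_SM are skipped in both orderings, and only the placement of the wait changes.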
An initial read of DPn_PREPARESTATUS is done before dropping into the wait, so that a quick exit can be made if the port is already prepared. Currently this is essential because the wait deadlocks: the stream setup takes bus_lock, which blocks the interrupt handler, so the wait for completion will always time out.
However, an experiment that removed the bus_lock from stream setup, so that the interrupt works, showed that the wait for completion takes ~700..800 us whereas the quick-exit read takes 50..200 us. So the quick exit would still be valuable even if the stream.c code were rewritten to allow the completion interrupt to work. Rewriting the code so that it doesn't take bus_lock is risky. The deadlock only lasts until the wait times out, so it is not a serious problem now that the DP prepare happens in parallel.
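The quick-exit check before the wait can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: struct prep_ops, wait_port_prepared() and the fake_* helpers are hypothetical names, and the sketch assumes that a set bit in the status value means the corresponding channel has not finished preparing.

```c
#include <stdbool.h>

enum prep_result { PREP_ALREADY_DONE, PREP_WAITED, PREP_TIMED_OUT };

/* Hypothetical callbacks standing in for the register read and the wait. */
struct prep_ops {
	/* Returns a bitmask; a set bit is assumed to mean "not finished". */
	unsigned int (*read_status)(void *ctx);
	/* Returns true if completion was signalled before the timeout. */
	bool (*wait_done)(void *ctx, unsigned int timeout_ms);
};

static enum prep_result wait_port_prepared(const struct prep_ops *ops,
					   void *ctx, unsigned int ch_mask,
					   unsigned int timeout_ms)
{
	/* Quick exit: skip the wait if the channels are already prepared. */
	if ((ops->read_status(ctx) & ch_mask) == 0)
		return PREP_ALREADY_DONE;

	/* Otherwise fall back to waiting for the completion signal. */
	if (ops->wait_done(ctx, timeout_ms))
		return PREP_WAITED;

	return PREP_TIMED_OUT;
}

/* Fake port used only to exercise the sketch. */
struct fake_port {
	unsigned int status;  /* value returned by the status read */
	bool irq_works;       /* whether the completion signal ever arrives */
	int waits;            /* number of times the wait path was entered */
};

static unsigned int fake_read_status(void *ctx)
{
	return ((struct fake_port *)ctx)->status;
}

static bool fake_wait_done(void *ctx, unsigned int timeout_ms)
{
	struct fake_port *p = ctx;

	(void)timeout_ms;  /* the fake does not model time */
	p->waits++;
	return p->irq_works;
}
```

With a fake port whose status is already clear, the wait path is never entered; with irq_works set to false, the sketch models the current situation in which a port that is not yet prepared can only leave the wait by timing out.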