
Introduce preempt domain on Zephyr

Open LaurentiuM1234 opened this issue 3 years ago • 12 comments

This is a very rough attempt to implement the DMA domain on Zephyr. Most of it is inspired by dma_multi_chan_domain.

Initially, I tried the same approach as zephyr_domain.c: a Zephyr thread runs on each core, and each thread does its assigned work whenever a DMA IRQ gives its semaphore a resource. However, this approach has the following limitations:

  1. Because of how task canceling is done (zephyr_ll_task_cancel() sets the task's state to CANCELED and zephyr_ll_run() makes sure to call domain_unregister() if the task was canceled), this leads to the same problem described in the second-to-last patch (schedule: zephyr_ll: Call zephyr_ll_task_done on zephyr_ll_task_cancel).
  2. Combining the aforementioned patch with the thread approach would probably not be very good, as it would likely mean (I am not 100% sure) calling k_thread_abort() from IRQ context.

In order to avoid the second limitation (the first one still remains, sadly), I chose an approach similar to dma_multi_chan_domain, in which zephyr_ll_run() is executed from DMA IRQ context.
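Conceptually, that looks like the sketch below. This is only a rough illustration; the handler name, the argument it receives and the exact zephyr_ll_run() signature are assumptions, not the actual PR code.

/* Rough illustration only: drive the LL scheduler straight from the DMA
 * channel interrupt, mirroring what dma_multi_chan_domain does on XTOS.
 * Names and signatures here are assumptions, not the actual PR code. */
static void dma_domain_irq_handler(void *arg)
{
	/* 'arg' is assumed to be the per-core scheduler data registered
	 * together with the DMA channel interrupt */
	struct zephyr_ll *sch = arg;

	/* runs the LL tasks; for canceled tasks this is also what ends up
	 * calling domain_unregister(), hence limitation 1 above */
	zephyr_ll_run(sch);
}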

I noticed that domain_is_pending() was entirely removed from zephyr_ll.c. Without it, I believe the DMA domain will not work properly (in the case of the DMA domain, not all tasks should be executed whenever a DMA IRQ arrives, and domain_is_pending() is what handles that). If you think this solution is viable, I think the next step would be to bring back domain_is_pending().
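For reference, the gating would look roughly like this inside the run loop; apart from domain_is_pending() itself, the field and helper names below are assumptions:

	/* rough sketch, not the actual zephyr_ll_run() body */
	struct comp_dev *comp;
	struct list_item *tlist;

	list_for_item(tlist, &sch->tasks) {
		struct task *task = container_of(tlist, struct task, list);

		/* skip tasks whose DMA channel did not raise this IRQ;
		 * a timer domain would simply report every task as pending */
		if (!domain_is_pending(sch->ll_domain, task, &comp))
			continue;

		task->state = task->ops.run(task->data);
	}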

There's also the fact that zephyr_domain's unregister() function will call k_thread_abort() from IRQ context (because of this patch series), which is the same problem the DMA domain approach with Zephyr threads has. I am curious as to why the thread approach was chosen over the IRQ approach for the timer domain.

Currently, I've tested this solution on the imx-stable-v2.2 branch and it seems to work fine (tested with consecutive aplays). I still have to test on the main branch.

For now, only the case with a single PPL task at a time works.

LaurentiuM1234 avatar Sep 22 '22 13:09 LaurentiuM1234

Thanks @LaurentiuM1234 good to see this making progress :)

  1. Because of how task canceling is done (zephyr_ll_task_cancel() sets the task's state to CANCELED and zephyr_ll_run() makes sure to call domain_unregister() if the task was canceled) this leads to the same problem described in the second-to-last patch (

IIRC, for LL tasks we set the cancel state and this is then checked at the next LL thread run. I would imagine we could do the same here, i.e.

  1. thread is processing
  2. IPC comes in to STOP
  3. IPC sets pipeline state
  4. thread processing completes, the logic then checks the state, cleans up the pipeline, and stops the thread.

schedule: zephyr_ll: Call zephyr_ll_task_done on zephyr_ll_task_cancel). 2. Combining the aforementioned patch with the thread approach would probably not be very good, as it would likely mean (I am not 100% sure) calling k_thread_abort() from IRQ context.

In order to avoid the second limitation (the first one still remains, sadly), I chose an approach similar to dma_multi_chan_domain, in which zephyr_ll_run() is executed from DMA IRQ context.

We should be able to use the same mechanism that we use to start the LL thread for timer-based triggering.
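Roughly something like the sketch below (names are illustrative, not the actual zephyr_ll code): cancel only marks the task, and the scheduler run notices the mark and does the unregistering itself.

/* illustrative sketch of the deferred-cancel flow in steps 1-4 above */
static int ll_task_cancel(void *data, struct task *task)
{
	/* steps 2-3: the IPC path only marks the task, no teardown here */
	task->state = SOF_TASK_STATE_CANCEL;
	return 0;
}

static void ll_run(void *data)
{
	struct zephyr_ll *sch = data;
	struct list_item *tlist, *tmp;

	list_for_item_safe(tlist, tmp, &sch->tasks) {
		struct task *task = container_of(tlist, struct task, list);

		/* step 4: notice the mark set by ll_task_cancel() and clean
		 * up here, in scheduler context, instead of in the IPC */
		if (task->state == SOF_TASK_STATE_CANCEL) {
			list_item_del(&task->list);
			domain_unregister(sch->ll_domain, task, --sch->n_tasks);
			continue;
		}

		task->state = task->ops.run(task->data);
	}
}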

lgirdwood avatar Sep 26 '22 11:09 lgirdwood

Thanks @LaurentiuM1234 good to see this making progress :)

  1. Because of how task canceling is done (zephyr_ll_task_cancel() sets the task's state to CANCELED and zephyr_ll_run() makes sure to call domain_unregister() if the task was canceled) this leads to the same problem described in the second-to-last patch (

IIRC, for LL tasks we set the cancel state and this is then checked at the next LL thread run. I would imagine we could do the same here, i.e.

  1. thread is processing
  2. IPC comes in to STOP
  3. IPC sets pipeline state
  4. thread processing completes, the logic then checks the state, cleans up the pipeline, and stops the thread.

schedule: zephyr_ll: Call zephyr_ll_task_done on zephyr_ll_task_cancel). 2. Combining the aforementioned patch with the thread approach would probably not be very good, as it would likely mean (I am not 100% sure) calling k_thread_abort() from IRQ context. In order to avoid the second limitation (the first one still remains, sadly), I chose an approach similar to dma_multi_chan_domain, in which zephyr_ll_run() is executed from DMA IRQ context.

We should be able to use the same mechanism that we use to start the LL thread for timer-based triggering.

@lgirdwood The problem is that when the IPC with TRIGGER STOP comes, the SAI stops and so do the DMA IRQs, so there are no more IRQs to give the semaphore the resources that let the Zephyr thread do its work. Because of this, the DMA IRQ (from EDMA's POV) never gets unregistered (since the unregistering is done in the thread's work function). I will try to find a way to overcome this.

Also, I believe we should reintroduce domain_is_pending(). As of now, whenever we get a DMA IRQ all the tasks are executed, which is wrong. Reintroducing it will allow us to execute only the tasks associated with the DMA IRQ that actually fired. Let me know what you think.

LaurentiuM1234 avatar Sep 28 '22 08:09 LaurentiuM1234

@lgirdwood The problem is that when the IPC with TRIGGER STOP comes, the SAI stops and so do the DMA IRQs, so there are no more IRQs to give the semaphore the resources that let the Zephyr thread do its work. Because of this, the DMA IRQ (from EDMA's POV) never gets unregistered (since the unregistering is done in the thread's work function). I will try to find a way to overcome this.

That's a good point - we have to assume that the last DMA IRQ can occur after the DMA is stopped/drained and always after the STOP IPC. We probably need some logic to check the task status in the domain IRQ handler.

lgirdwood avatar Sep 28 '22 14:09 lgirdwood

V2 changes:

  1. Switched from the IRQ-based approach to the thread-based approach.
  2. Removed the commits that were integrating the DMA domain. I believe that, for now, the main focus of this PR should be on the commits that implement the DMA domain. Once the solution becomes more viable, I will reintroduce the other commits, also addressing @lyakh's suggestions.

Problems

  1. When trying to do 2 consecutive aplays, the second one doesn't work properly, resulting in an input/output error. I did some digging around and it seems like after a sai_stop() operation I need to do interrupt_disable() and interrupt_unregister() before the next sai_start() operation (see the sketch after this list). As of now, I couldn't find an explanation for this behavior.
  2. In the aforementioned case, if I call zephyr_ll_task_done() inside zephyr_ll_task_cancel(), that seems to fix the issue. The problem is that the DMA trace acts somewhat weird: when the trigger STOP operation comes, the DMA trace should normally print some logs (e.g. edma_stop, sai_stop) immediately, but these logs only get printed when the second aplay comes around.
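For clarity, the order that seems to be required in problem 1 looks roughly like this. The calls are simplified, and the register/enable counterparts are assumed to be the usual SOF interrupt helpers; this is only an illustration of the observed behavior, not a proposed fix.

/* first stream stops */
sai_stop(dai);
/* apparently required before the next start, reason still unknown */
interrupt_disable(irq, dd);
interrupt_unregister(irq, dd);

/* ...next aplay... */
interrupt_register(irq, handler, dd);	/* assumed counterparts of the above */
interrupt_enable(irq, dd);
sai_start(dai);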

Notes

  1. The current solution was built using imx-stable-v2.2 so it will probably not compile on main. Will make sure to fix that soon.

LaurentiuM1234 avatar Sep 29 '22 13:09 LaurentiuM1234

@LaurentiuM1234 might be worth adding a user to an Intel nocodec topology2 as a subsequent PR. This way it could be tested within the Intel CI as we have no way of testing in CT atm ?

Will try to have a look a bit later. As of now, it seems like the solution is not very stable anyways.

LaurentiuM1234 avatar Oct 04 '22 11:10 LaurentiuM1234

@LaurentiuM1234 might be worth adding a user to an Intel nocodec topology2 as a subsequent PR. This way it could be tested within the Intel CI as we have no way of testing in CT atm ?

Will try to have a look a bit later. As of now, it seems like the solution is not very stable anyways.

It sounds like this would be good as a staging or experimental feature once you are happier with its maturity? That way we could get more developers to test it and eliminate the issues. Btw, what issues are you seeing? @kv2019i fyi

lgirdwood avatar Oct 04 '22 13:10 lgirdwood

It sounds like this would be good as a staging or experimental feature once you are happier with its maturity? That way we could get more developers to test it and eliminate the issues. Btw, what issues are you seeing? @kv2019i fyi

That's a very hard one. After stress-testing for ~30 mins with aplay/arecord, the firmware looks like it is crashing.

dbaluta avatar Oct 04 '22 14:10 dbaluta

It sounds like this would be good as a staging or experimental feature once you are happier with its maturity? That way we could get more developers to test it and eliminate the issues. Btw, what issues are you seeing? @kv2019i fyi

That's a very hard one. After stress-testing for ~30 mins with aplay/arecord, the firmware looks like it is crashing.

So we are all productizing SOF on Zephyr on xtensa, and I think this would be a good stress test (under an experimental/staging Kconfig).

This way it can be shared with more SOF and Zephyr developers as a scheduling test case and also be stressed by CI. This will help us identify any lingering core logic issues.

I'm keen to merge, provided the Kconfig has the flag and the warning. I'd like to help and see this working well.

lgirdwood avatar Oct 05 '22 12:10 lgirdwood

@LaurentiuM1234 this looks good to me. We can merge it so that others also get a chance to test it.

dbaluta avatar Oct 12 '22 11:10 dbaluta

V3 changes:

  1. Previously, because the DMA IRQs would get cut off whenever a TRIGGER STOP came, the preempt domain thread would stop doing its assigned work. Since zephyr_ll's zephyr_ll_task_cancel() would not remove the task from the task queue when it was called, this led to the DMA IRQs remaining masked (because SAI's set_config() function masks them every time it is called). To overcome this, I introduced a new domain function, domain_task_cancel(). Its purpose is to let the preempt domain know the number of remaining tasks when zephyr_ll_task_cancel() is called. When that count reaches 0, the domain knows that the DMA IRQs got cut off. To avoid the aforementioned problem, domain_task_cancel() gives a resource to the thread semaphore, allowing the thread to execute its assigned work one more time after the DMA IRQs get cut off, remove the task from the task queue and call domain_unregister() (see the sketch after this list).

  2. Solved

In the aforementioned case, if I do a zephyr_ll_task_done() operation inside a zephyr_ll_task_cancel() operation it seems like that fixes the issue. The problem is, the DMA trace acts somewhat weird. When the trigger STOP operation comes, the DMA trace should normally print some logs (e.g: edma_stop, sai_stop) immediately. It seems like, these logs get printed only when the second aplay comes around.

by simply switching to the main branch to do the testing. It turns out that the problem was caused by the fact that the queued IPCs were only sent whenever zephyr_ll_task_run() was executed. Since the DMA IRQs got cut off, the queued IPCs would not be sent until the DMA IRQs were enabled again.

  3. The preempt domain is now added as an experimental feature. This means that developers can simply turn on CONFIG_PREEMPT_DOMAIN in order to test it out. Until this becomes more stable, I believe this is the way to go.
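A rough sketch of the new hook, with approximate names (the private-data accessor and struct layout are assumptions, not the exact code in this PR):

static int preempt_domain_task_cancel(struct ll_schedule_domain *domain,
				      struct task *task, uint32_t num_tasks)
{
	/* assumed accessor for the domain's private data */
	struct preempt_domain_data *dd = ll_sch_domain_get_pdata(domain);

	if (!num_tasks) {
		/* the last task was just cancelled, so the DMA IRQs are
		 * already cut off; give the thread semaphore once more so
		 * the thread can drop the task from the queue and call
		 * domain_unregister() */
		k_sem_give(&dd->sem);
	}

	return 0;
}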

Problems:

  1. Currently there seems to be something weird going on whenever I enable DMA tracing with the preempt domain. This will require further investigation. For now, if DMA tracing is disabled, the current version passes the stress testing (which is just a series of aplays and arecords with a 3 s sleep in between). The problem is that the kernel doesn't get a response for some IPCs, so they time out.

TODOs:

  1. Add domain_is_pending() to zephyr_ll and come up with a decent implementation of it inside the preempt domain (the current one will probably not work for mixer topologies, for example).
  2. Figure out why Problem 1 occurs.
  3. Make sure that all Zephyr threads started by preempt domain are aborted at some point.

LaurentiuM1234 avatar Oct 12 '22 11:10 LaurentiuM1234

@LaurentiuM1234 this looks good to me. We can merge it so that others also get a chance to test it.

sounds good to me.

sorry, closed PR by mistake.

LaurentiuM1234 avatar Oct 12 '22 11:10 LaurentiuM1234

V4 changes:

  1. Solved the commit-ordering problem pointed out by @kv2019i: the introduction of domain_task_cancel() now comes before its use in the preempt domain.
  2. Removed domain_is_pending() and all its logic since it's not being used right now. Will reintroduce it later on.

LaurentiuM1234 avatar Oct 18 '22 08:10 LaurentiuM1234

V4 changes:

  1. Solved the commit-ordering problem pointed out by @kv2019i: the introduction of domain_task_cancel() now comes before its use in the preempt domain.
  2. Removed domain_is_pending() and all its logic since it's not being used right now. Will reintroduce it later on.

@LaurentiuM1234 did you fix the stability? FWIW, a critical Zephyr (xtensa?) schedule/timer fix was merged recently that may help. @kv2019i, will @LaurentiuM1234 need that fix?

lgirdwood avatar Oct 24 '22 15:10 lgirdwood

@LaurentiuM1234 did you fix the stability? FWIW, a critical Zephyr (xtensa?) schedule/timer fix was merged recently that may help. @kv2019i, will @LaurentiuM1234 need that fix?

It is very strange what is happening with the stability test.

Without logging activated (e.g. tracing work scheduled) everything works fine for 12 hrs.

With logging activated, at some point it looks like an IPC gets a reply, but the reply header is wrong.

@lgirdwood @kv2019i care to point us to the fix?

dbaluta avatar Oct 24 '22 17:10 dbaluta

@lgirdwood @dbaluta Just did a test with the latest changes brought to the main branch and the IPC problem is still there.

LaurentiuM1234 avatar Oct 25 '22 07:10 LaurentiuM1234

@dbaluta @LaurentiuM1234 we have a similar IPC corruption issue in IPC4 mode. We suspect the host or another module may be overwriting the mailbox during processing (e.g. logging, trace or stream status), i.e. we may need to either lock mailbox IO, or copy the mailbox data during IPC processing and copy it back for the IPC reply. @lyakh fyi
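The "copy the mailbox" variant would look roughly like the fragment below; every name in it is a placeholder, not an existing SOF symbol, it just illustrates the copy-in / process / copy-out idea.

#define MAILBOX_MSG_MAX 384	/* placeholder size */

static void ipc_process_with_copy(volatile uint8_t *inbox, volatile uint8_t *outbox)
{
	uint8_t req[MAILBOX_MSG_MAX];
	uint8_t rep[MAILBOX_MSG_MAX];
	size_t rep_size;

	/* snapshot the request so concurrent mailbox writers (trace, stream
	 * status, ...) cannot corrupt it while it is being processed */
	memcpy(req, (const void *)inbox, sizeof(req));

	/* placeholder handler working on the local copies, returns reply size */
	rep_size = ipc_handle_message(req, rep);

	/* write the reply back in one step, right before signalling the host */
	memcpy((void *)outbox, rep, rep_size);
}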

lgirdwood avatar Oct 26 '22 13:10 lgirdwood

With logging activated, at some point it looks like an IPC gets a reply, but the reply header is wrong.

@dbaluta @LaurentiuM1234 Btw, what header do you get and what do you expect? That could help in solving this. How long does playback run before the problem occurs?

One more thing: what happens if verbose logging is enabled? Does the reproduction rate change, i.e. can we make it easy to reproduce? If it's easy to repro, it may be worthwhile to change an Intel topology in this PR to test the code and see what the result is on Intel (as this could also shed more light on the root cause).

lgirdwood avatar Oct 26 '22 13:10 lgirdwood

Btw, what header do you get and what do you expect? That could help in solving this. How long does playback run before the problem occurs?

It seems as if the host receives back the exact number of bytes it's trying to send to the DSP. Here's a sample of the error message:

reply size (100) exceeds the buffer size (20)
ipc tx error for 0x30010000 (msg/reply size: 100/20): -22
Failed to setup widget PGA2.0
Failed to restore pipeline after resume -22
ASoC: error at snd_soc_pcm_component_pm_runtime_get

One more thing: what happens if verbose logging is enabled? Does the reproduction rate change, i.e. can we make it easy to reproduce?

Just did a test and it doesn't seem to change anything in particular. The problem seems to occur just as fast as in the case without verbose tracing. The max time I was able to run the test without the problem occurring was around 20 mins, if I remember correctly. Usually it happens really fast (e.g. it can even happen at the beginning, when we schedule the capture and playback tasks together).

It seems to me that the problem pointed out by @dbaluta is not the only one. I've also had some runs in which the IPCs would time out (even with a 10,000 ms timeout in the kernel) and when I inspected the logs it seemed as if they were frozen. The steps were the following:

  1. Start the sof-logger in the beginning, while the playback and capture tasks are still running.
  2. Get a kernel error saying some IPC timed out.
  3. Get the following error from sof-logger and crash: error: in logger_read(), fread(..., /sys/kernel/debug/sof/trace) failed: Operation not permitted(-1)
  4. Restart sof-logger => the logger would show the logs printed by the timer Zephyr thread from zephyr_domain

The problem seems to occur only after an FW reboot, so if I run a bunch of aplays without allowing the FW to suspend (due to it being inactive for 2 s), the problem doesn't occur at all.

Also, the problem usually only happens at the beginning or at the end whenever I do an aplay after the FW gets suspended (i.e. around the time the buffers get connected/disconnected, but it can occur even earlier than the buffer connection IPCs).

If it helps, the test script I used looks roughly like this:

for i in $(seq 1 1000); do
    for j in $(seq 1 10); do
        aplay -d 2 sample_playback.wav
        sleep 3
    done

    for j in $(seq 1 10); do
        aplay -d 2 sample_playback.wav
        arecord -d 2 sample_capture.wav
        sleep 3
    done

    for j in $(seq 1 10); do
        arecord -d 2 sample_capture.wav
        sleep 3
    done
done

If it's easy to repro, it may be worthwhile to change an Intel topology in this PR to test the code and see what the result is on Intel (as this could also shed more light on the root cause).

If the aforementioned problems don't occur on Intel's side then I'm somewhat reluctant to add a new topology as it probably wouldn't be of much help.

I think the IPC problem might not be related to the introduction of the preempt domain. (I'm not sure if this is relevant, but I also did a run in which I commented out the line from platform.c that added the zephyr_ll scheduler with type SOF_SCHEDULE_LL_DMA to the scheduler list, thus not making use of the preempt domain, and the aforementioned problems still occurred.) If it's alright with you @dbaluta @lgirdwood @lyakh @kv2019i, I believe we can merge this PR for now, as it seems to work okay without enabling the DMA trace.

EDIT: Regarding this statement:

I think the IPC problem might not be related to the introduction of the preempt domain

I've also noticed that the problems never occur while the pipeline task is actually scheduled and running. It's either before or after (usually before).

LaurentiuM1234 avatar Oct 26 '22 14:10 LaurentiuM1234

Btw, what header do you get and what do you expect? That could help in solving this. How long does playback run before the problem occurs?

It seems as if the host receives back the exact number of bytes it's trying to send to the DSP. Here's a sample of the error message:

reply size (100) exceeds the buffer size (20)
ipc tx error for 0x30010000 (msg/reply size: 100/20): -22
Failed to setup widget PGA2.0
Failed to restore pipeline after resume -22
ASoC: error at snd_soc_pcm_component_pm_runtime_get

Oh, what is the use case? I'm confused as to why this error would occur during playback.

Agree - can be merged.

lgirdwood avatar Oct 26 '22 16:10 lgirdwood

Oh, what is the use case? I'm confused as to why this error would occur during playback.

It probably happened at the beginning, when the playback and capture pipeline tasks are scheduled, or during an arecord done after an FW resume.

I'm very sorry, for some reason I only included a portion of the test script's pseudocode. The script actually looks like this:

for i in $(seq 1 1000); do
    for j in $(seq 1 10); do
        aplay -d 2 sample_playback.wav
        sleep 3
    done

    for j in $(seq 1 10); do
        aplay -d 2 sample_playback.wav
        arecord -d 2 sample_capture.wav
        sleep 3
    done

    for j in $(seq 1 10); do
        arecord -d 2 sample_capture.wav
        sleep 3
    done
done

Will also edit the initial comment to include the full pseudocode.

LaurentiuM1234 avatar Oct 26 '22 16:10 LaurentiuM1234

Oh, what is the use case? I'm confused as to why this error would occur during playback.

When playback starts (after the SOF device entered pm runtime suspend), ALSA needs to pm runtime resume the device.

So,

sof_resume
 ->    ret = tplg_ops->set_up_all_pipelines(sdev, false);
       -> sof_ipc3_set_up_all_pipelines
           -> sof_ipc3_widget_setup
                ->  hdr.cmd = SOF_IPC_GLB_TPLG_MSG | SOF_IPC_TPLG_COMP_NEW; 
                 -> hdr.size = sizeof(volume->config); // this size is 100
               -> sof_ipc_tx_message(sdev->ipc, swidget->private, hdr->size, &reply, sizeof(reply))

Now, when the reply arrives one would expect reply->hdr.size to be sizeof(reply). But for some reason, the message read from the input mailbox looks to be not a reply but the original message that was sent.
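In other words, the host expects something like the simplified check below to pass (sof_ipc_reply is the IPC3 reply header; the function is only an illustration, not the actual kernel code):

/* simplified illustration of the host-side expectation */
static int check_reply(const struct sof_ipc_reply *reply)
{
	/* a genuine reply fits the reply struct; an echoed request header
	 * (size 100 here) fails this check and yields -22 (-EINVAL) */
	if (reply->hdr.size > sizeof(*reply))
		return -EINVAL;

	return 0;
}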

So, the DSP sends an interrupt saying that it has put a reply message for the host, but it doesn't actually modify the message.

It looks very strange. I'm trying to understand whether there is some sort of caching issue (it doesn't look like it), or whether the DSP triggers an unwanted interrupt which the host identifies as a reply.

dbaluta avatar Oct 26 '22 16:10 dbaluta

If it's alright with you @dbaluta @lgirdwood @lyakh @kv2019i I believe we can merge this PR for now as it seems to work okay without enabling the DMA trace.

@LaurentiuM1234 @lgirdwood sorry, I still don't understand why this is now called "preempt domain". Isn't it doing the same thing the DMA domain does with XTOS?

lyakh avatar Oct 27 '22 07:10 lyakh

@LaurentiuM1234 @lgirdwood sorry, I still don't understand why this is now called "preempt domain". Isn't it doing the same thing the DMA domain does with XTOS?

You are right. For now, the preempt domain works just like the DMA domain in XTOS.

My current understanding is that, because the preempt domain is pretty generic (it registers a DMA interrupt, which could be any interrupt if we generalize the function that registers it (here, register_dma_irq), creates a thread, and then the thread does its assigned work based on that interrupt), we can also use it with multiple interrupt sources, not just the ones from the DMA.
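As a rough illustration of that genericity (all names below are made up, only the Zephyr primitives are real): whatever source gives the semaphore ends up driving the same domain thread.

#include <zephyr/kernel.h>

K_SEM_DEFINE(pdomain_sem, 0, 1);

/* any interrupt source: DMA channel, timer, ... */
static void pdomain_irq_handler(void *arg)
{
	ARG_UNUSED(arg);
	k_sem_give(&pdomain_sem);
}

static void pdomain_thread_fn(void *p1, void *p2, void *p3)
{
	ARG_UNUSED(p1);
	ARG_UNUSED(p2);
	ARG_UNUSED(p3);

	while (1) {
		k_sem_take(&pdomain_sem, K_FOREVER);
		/* run the LL tasks registered on this core */
	}
}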

As @lgirdwood pointed out, we can also use it with timers (although I, for one, am not sure what the difference would be between using zephyr_domain and zephyr_preempt_domain with a timer).

LaurentiuM1234 avatar Oct 27 '22 08:10 LaurentiuM1234

If it's easy to repro, it may be worthwhile to change an Intel topology in this PR to test the code and see what the result is on Intel (as this could also shed more light on the root cause).

@lgirdwood what Intel topology would be the best fit for us to try enabling the preempt domain on? There are dozens of them :) and we don't know which one is easiest for you to use.

dbaluta avatar Oct 27 '22 08:10 dbaluta

V5 changes

  1. Changed the name of the Zephyr thread to "pdomain_thread". This gives it a more generic name and removes the LL naming.
  2. Changed ZEPHYR_LL_STACK_SIZE to ZEPHYR_PDOMAIN_STACK_SIZE.
  3. Changed the order of operations inside the DMA IRQ handler. Reason:

Should we not do these two steps before the k_sem_give() so that interrupts can continue to be sent. e.g. if the thread gets stuck we can at least get the next IRQ, check the thread status and then print an error and/or take other action.

LaurentiuM1234 avatar Oct 27 '22 09:10 LaurentiuM1234

@LaurentiuM1234 I agree that the PR looks rather good now, but I'm going to ask you about one more change, sorry. We discussed this again with @lgirdwood and this is how I interpret the result of our discussion; I'd like to be corrected if I misrepresent anything. First of all, what do we understand by "preemptive" pipeline tasks? The example that @lgirdwood used was

PCM0.pipeline_1 -> MP3 decoder -\
				 mixer -> DAI
PCM1.pipeline_2 -> volume-------/

where pipeline 2 is a normal non-preemptive pipeline, scheduled every 1 ms, whereas pipeline 1 takes longer to process its mp3 blocks, so it runs every 5 ms. But while it's running it can be "preempted": execution will switch over to pipeline 2, and when that is done it will yield and execution will switch back to the mp3 decoder. I don't think such preemptive tasks can reasonably be implemented within the SOF LL scheduler. I think the way to implement them would be with native Zephyr threads: one Zephyr thread per preemptive SOF pipeline task. This will make them rather expensive, but I don't think we have a better option. Those Zephyr threads would then use priorities between 1 (highest) and CONFIG_NUM_PREEMPT_PRIORITIES (lowest). That means that with your domain implementation you already cut into that preemptive priority range:

	thread = k_thread_create(&dt->ll_thread,
		 zephyr_preempt_domain_stack[core],
		 ZEPHYR_PDOMAIN_STACK_SIZE,
		 zephyr_preempt_domain_thread_fn,
		 dt, NULL, NULL,
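		 /* CONFIG_NUM_PREEMPT_PRIORITIES - 1 is the lowest preemptive priority in Zephyr */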
		 CONFIG_NUM_PREEMPT_PRIORITIES - 1,
		 0, K_FOREVER);

This might not be what you want for normal short-running pipeline tasks. So we think that this PR should separate the DMA domain implementation from any code related to the future preemptive scheduler. I think most of the code in this PR actually implements a DMA domain, which effectively means I propose to rename all DMA-domain-specific code back to using "dma" instead of "preempt". Any code that should be re-used between this PR and the future preempt scheduler can be extracted into separate files. However, I also think that if at this moment you cannot clearly identify which code will be re-used, it should be OK to just rename everything back to "dma", and later, when we implement preemption, we extract and re-use any code as appropriate.

lyakh avatar Oct 27 '22 13:10 lyakh

@lyakh Makes sense. Will revert back to dma-based naming.

However, I also think that if at this moment you cannot clearly identify which code will be re-used, it should be ok to just rename everything back to "dma" and later when we implement preemption we extract and re-use any code as appropriate.

I believe the best thing to do right now is to wait until the preempt domain is actually introduced. This way, the PR will only introduce the DMA domain (as it's supposed to do).

LaurentiuM1234 avatar Oct 31 '22 08:10 LaurentiuM1234

@lgirdwood @lyakh can you suggest an Intel platform that we can test this on? E.g. give me the name of the topology and I will enable the DMA domain on it and give it a test in CI.

With this we can check if you have the same problems as on i.MX.

dbaluta avatar Oct 31 '22 08:10 dbaluta

@LaurentiuM1234 please remove the 'draft' label from this PR. I think we can merge it as an EXPERIMENTAL feature once we test it on some Intel platforms too.

dbaluta avatar Oct 31 '22 08:10 dbaluta

V6 changes

  1. Changed back to dma-based naming.

LaurentiuM1234 avatar Oct 31 '22 08:10 LaurentiuM1234