[DNM] schedule: add Tasks With Budget scheduler type
Tasks With Budget Scheduler
The TWB scheduler creates a separate preemptible Zephyr thread for each SOF task that has a pre-allocated MCPS budget, renewed with every system tick. When the task is ready to run, the task thread is assigned either MEDIUM_PRIORITY or LOW_PRIORITY, depending on the budget left in the current system tick. LOW_PRIORITY allows opportunistic execution once the budget is spent, provided no other ready task has a higher priority.
Examples of tasks with budget: IPC Task, IDC Task.
A Task with Budget (TWB) has two key parameters:
- cycles granted: the budget per system tick,
- cycles consumed: the number of cycles consumed for task execution in a given system tick.
The number of cycles consumed is reset to 0 at the beginning of each system tick, renewing the TWB budget. When the number of cycles consumed exceeds cycles granted, the task is switched from MEDIUM to LOW priority. When the thread for a task with budget is created, MPP Scheduling is responsible for setting the thread time slice equal to the task budget and for registering a callback on time slice timeout. Thread time slicing guarantees that the Zephyr scheduler will interrupt execution when the budget is spent, so the MPP Scheduling timeout callback can re-evaluate the task priority.
If some budget is left at the end of a system tick (the task spent less time, or started executing close to the system tick that preempts execution), it is reset and not carried over to the next tick.
Details: https://thesofproject.github.io/latest/architectures/firmware/sof-zephyr/mpp_layer/mpp_scheduling.html
Zephyr required patch:
- [x] https://github.com/zephyrproject-rtos/zephyr/pull/71855
Todo:
- remove fixed numbers
- switch IPC task to TWB
> Examples of tasks with budget: IPC Task, IDC Task.
@abonislawski can you explain how one would set a "budget" for things that depend on host interaction? IPCs are notoriously difficult to plan. IIRC in past solutions the "budget" was an arbitrary 10% or so allocated to IPC, because we couldn't work things out in more detail.
And second, where is this budget stored? In the host driver, topology, something else?
@plbossart my understanding is that this is a "general budget" for all IPCs, not for a specific IPC (like create pipeline). Let's say we don't want to block other tasks (like DP) for more than 10% of systick time, so there is no need to calculate it in great detail. An "out of budget" task is still running; the only thing that changes is the thread priority for the rest of the systick time, so it's more like a guarantee for other tasks that they get an opportunity to run.
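As a rough illustration of what such a general budget could look like in cycles, here is the arithmetic for a percentage of one system tick. The clock rate, tick rate and the helper `twb_budget_cycles()` are assumptions made up for this example, not values or code from this PR.

```c
#include <stdint.h>

/* Hypothetical values, for illustration only. */
#define EXAMPLE_CPU_HZ         400000000u /* assumed 400 MHz core clock */
#define EXAMPLE_TICK_HZ        1000u      /* assumed 1 ms system tick */
#define EXAMPLE_IPC_BUDGET_PCT 10u        /* the 10% figure mentioned above */

/* Cycles granted per system tick for a budget given as a percentage. */
static uint32_t twb_budget_cycles(uint32_t cpu_hz, uint32_t tick_hz,
				  uint32_t percent)
{
	uint64_t cycles_per_tick = cpu_hz / tick_hz;

	return (uint32_t)(cycles_per_tick * percent / 100u);
}
```

With these assumed numbers, one tick is 400,000 cycles, so a 10% budget is 40,000 cycles per tick.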
> And second, where is this budget stored? In the host driver, topology, something else?
I haven't switched the IPC task to TWB yet, but based on the above (single general budget) I'm expecting a single define somewhere in FW in the first version.
@mwasko please comment from the architecture perspective if my understanding is wrong
That is correct. Today in SOF we must decide whether IPCs have higher priority than data processing modules, and risk that a sequence of host IPCs will cause audio glitches, or the other way around, and risk IPC timeouts due to heavy DSP load. Task with Budget plays the role of a guard: in each processing cycle it guarantees a budget of MCPS that can be used for IPC processing at high priority, and if that budget is exceeded the priority is lowered to let the critical audio processing complete.
Pushed a small update with:
- thread priorities (ll, twb, dp, edf) organized in kconfig.threads_prio
- added a check in schedule_task to determine whether the thread needs to be started for the first time or just resumed
@lgirdwood I will switch IPC or IDC task to TWB in this PR so it should be heavily tested in current CI
There is also one timeslice change required in Zephyr, already merged here (waiting for west update):
- [x] https://github.com/zephyrproject-rtos/zephyr/pull/71855
Ok, IPC should be a good test. One thing to check for IDC is that this change won't impact timing around Zephyr mutex() APIs that use IDC on multicore configs.
CI failed on MTL because it uses a debug build and an assert failed:
ASSERTION FAIL [!arch_is_in_isr()] @ ZEPHYR_BASE/kernel/thread.c:662
Threads may not be created in ISRs
@nashif how hard is this limitation? Because this code actually works in normal build (without asserts)
For debug build problems:
Zephyr fix provided to unblock priority setting in ISR: https://github.com/zephyrproject-rtos/zephyr/pull/76522
The remaining problem behind "Threads may not be created in ISRs" is the creation of an LL task, which creates another thread in an ISR; it is this LL task that triggers the assert.
@abonislawski did you get a chance to resolve the issue?
Removed hardcoded numbers and added a new kconfig, CONFIG_TWB_IPC_TASK, to select the scheduler type for IPC, so we can now merge the TWB scheduler with base functionality.
@abonislawski I think we are almost there. One other thing that may help (due to current CI stability) would be to enable the TWB IPC on MTL and LNL too; this way we will get wider coverage in the CI.
@abonislawski can you check internal CI? Thanks!
@lgirdwood timeout on one platform, waiting for rerun
@abonislawski green now - other results showing known and unrelated failures.