sof icon indicating copy to clipboard operation
sof copied to clipboard

zephyr_ll: update task->start time

Open lenghonglin opened this issue 2 years ago • 19 comments

task->start time should be updated by pipe period which is config with tplg.

If not , the real pipe_period will be very short (almost 30 us), maybe cause triger stop no reply

Signed-off-by: honglin leng [email protected]

lenghonglin avatar Jun 13 '22 06:06 lenghonglin

@lenghonglin can you give more context here, what was the flow where the issue occured. @lyakh @kv2019i fyi.

At first , the topology file is defined the PIPELINE_PCM_ADD which there is a value is describe the period

image

this value is config the sof pipe_period.

Then we can reference ll_schedule.c, we can see the task->start is add by next, the next is calculate with pdata->period, the pdata->period is come from topology.

But for zephyr_ll.c, we lose this part .

If not, the pipe_period is very short

image

the high level is pipe_copy time , the low level is pipe_period, we can see it only have 85us. It will cause the edf_workq cannot be schedule. The appearance is triger_stop have no reply.

lenghonglin avatar Jun 14 '22 02:06 lenghonglin

@lenghonglin @lgirdwood it is my understanding, that under Zephyr all pipeline tasks are currently scheduled with a 1ms period, no other period is supported

lyakh avatar Jun 14 '22 06:06 lyakh

@lenghonglin could you explain your oscilloscope capture? If I should believe the google translator it says, that the period there is 20ms

lyakh avatar Jun 14 '22 07:06 lyakh

@lenghonglin could you explain your oscilloscope capture? If I should believe the google translator it says, that the period there is 20ms

the 20ms mean high level (pipe_copy time) and low level(idle time). the idle time is config by topology, see ll_schedule.c about task->start this part.

lenghonglin avatar Jun 14 '22 07:06 lenghonglin

By the way, how should it be implemented if it is a long period? image

hongshui3000 avatar Jun 14 '22 12:06 hongshui3000

@lenghonglin are you using timer domain or DMA domain with Zephyr? Also, care to share your topology file.

dbaluta avatar Jun 14 '22 13:06 dbaluta

@lenghonglin @lgirdwood it is my understanding, that under Zephyr all pipeline tasks are currently scheduled with a 1ms period, no other period is supported

The LL timer tick is 1ms, but the LL thread can have work that is run every 2, 3 or other 1ms multiples.

lgirdwood avatar Jun 14 '22 15:06 lgirdwood

@lenghonglin could you explain your oscilloscope capture? If I should believe the google translator it says, that the period there is 20ms

the 20ms mean high level (pipe_copy time) and low level(idle time). the idle time is config by topology, see ll_schedule.c about task->start this part.

@lenghonglin so is there the 85 microsecond period visible in that capture or not?

lyakh avatar Jun 15 '22 07:06 lyakh

@lenghonglin are you using timer domain or DMA domain with Zephyr? Also, care to share your topology file.

SCHEDULE_TIME_DOMAIN_TIMER,

U can find the file which use SCHEDULE_TIME_DOMAIN_TIMER

image

like:

https://github.com/thesofproject/sof/blob/b4886bebbe49454850d59f1a49a0460e590db71c/tools/topology/topology1/sof-tgl-max98357a-rt5682.m4#L257

lenghonglin avatar Jun 15 '22 11:06 lenghonglin

@lenghonglin btw, are you using Zephyr tickless mode ? and what is the Zephyr tick Hz (still used for calculation when tickless). All the Zephyr timing granularity is based to the nearest Zephyr tick (and not HW timer cycle).

lgirdwood avatar Jun 15 '22 14:06 lgirdwood

@lenghonglin any update from your end ? Have you been able to find the 85uS delay ? Fwiw, this fix is breaking all the Zephyr platforms on CI. See https://sof-ci.01.org/sofpr/PR5912/build633/devicetest

lgirdwood avatar Jun 20 '22 14:06 lgirdwood

@lenghonglin any update from your end ? Have you been able to find the 85uS delay ? Fwiw, this fix is breaking all the Zephyr platforms on CI. See https://sof-ci.01.org/sofpr/PR5912/build633/devicetest

sry, a little busy these day, will deal later

lenghonglin avatar Jun 21 '22 05:06 lenghonglin

why need update task->start ?

At first, we can refer to ll_schedule.c. though the ll_schedule.c and zephyr_ll.c implementation is different, but the meaning is the same. It mean task->start calculation should stay the same . But , we can see this is different, For ll_schedule.c https://github.com/thesofproject/sof/blob/f58896fa146db525b6d33742319af2e302765083/src/schedule/ll_schedule.c#L144 For zephyr_ll.c

https://github.com/thesofproject/sof/blob/95d6499ad6b43d95a4ef6aef09b9d6d1671568d6/src/schedule/zephyr_ll.c#L256

It's weird.

second,from topology. https://github.com/thesofproject/sof/blob/71cd9666a95ccc801952a4a7d6f468458c7f8050/tools/topology/topology1/m4/pipeline.m4#L51

theperiod value will config to p->period

https://github.com/thesofproject/sof/blob/94e6c9969730901510dea4179f5283076507e009/src/audio/pipeline/pipeline-schedule.c#L227

https://github.com/thesofproject/sof/blob/94e6c9969730901510dea4179f5283076507e009/src/audio/pipeline/pipeline-schedule.c#L357

So , we can see the pdata->period in ll_schedule.c is the period in topology.

BUT for zephyr_ll.c, it do not use the vaule . Right ?

@lgirdwood @kv2019i

lenghonglin avatar Jun 28 '22 03:06 lenghonglin

BUT for zephyr_ll.c, it do not use the vaule . Right ?

@lenghonglin that's correct. For Zephyr the period is fixed, and there's also a comment, saying that .start and .next should be removed:

/*
 * struct task::start and struct ll_schedule_domain::next are parts of the
 * original LL scheduler design, they aren't needed in this Zephyr-only
 * implementation and will be removed after an initial upstreaming.
 */

lyakh avatar Jun 28 '22 06:06 lyakh

BUT for zephyr_ll.c, it do not use the vaule . Right ?

@lenghonglin that's correct. For Zephyr the period is fixed, and there's also a comment, saying that .start and .next should be removed:

/*
 * struct task::start and struct ll_schedule_domain::next are parts of the
 * original LL scheduler design, they aren't needed in this Zephyr-only
 * implementation and will be removed after an initial upstreaming.
 */

So, why remove this part?

lenghonglin avatar Jun 28 '22 06:06 lenghonglin

BUT for zephyr_ll.c, it do not use the vaule . Right ?

@lenghonglin that's correct. For Zephyr the period is fixed, and there's also a comment, saying that .start and .next should be removed:

/*
 * struct task::start and struct ll_schedule_domain::next are parts of the
 * original LL scheduler design, they aren't needed in this Zephyr-only
 * implementation and will be removed after an initial upstreaming.
 */

So, why remove this part?

@lenghonglin sorry, what do you mean? Why we want to remove it? Or why we haven't removed it yet?

lyakh avatar Jun 29 '22 06:06 lyakh

BUT for zephyr_ll.c, it do not use the vaule . Right ?

@lenghonglin that's correct. For Zephyr the period is fixed, and there's also a comment, saying that .start and .next should be removed:

/*
 * struct task::start and struct ll_schedule_domain::next are parts of the
 * original LL scheduler design, they aren't needed in this Zephyr-only
 * implementation and will be removed after an initial upstreaming.
 */

So, why remove this part?

@lenghonglin sorry, what do you mean? Why we want to remove it? Or why we haven't removed it yet?

see https://github.com/thesofproject/sof/issues/5966#issuecomment-1171208117.

lenghonglin avatar Jul 04 '22 06:07 lenghonglin

@lenghonglin there are some differences in how the LL scheduler works on xtos and Zephyr RTOS.

  1. xtos - the LL scheduler 1ms tick is NOT a true 1ms and can jitter a little to deal with tasks that occasionally spike. This mean extra time on tick n would mean less time on tick n+1, but n+2 should be back to normal. The tick and LL tasks run at the same IRQ context (meaning the next tick cannot preempt the existing tick tasks).
  2. Zephyr - LL scheduler timer ticks is always a synchronous 1ms from the last LL timer tick. The LL tasks are run in priority order after the LL tick and all tasks must finish before the next tick. The timer tick runs at IRQ context whilst the LL tasks run in thread context and can be preempted by the next tick (which checks for this).

We should never need to change next tick start with Zephyr, I would think your delay is caused by the Zephyr tickless mode HZ value performance issue that is still under investigation.

lgirdwood avatar Jul 06 '22 21:07 lgirdwood

Can one of the admins verify this patch?

gkbldcig avatar Aug 27 '22 09:08 gkbldcig

Converting to draft, no activity in 3 months.

kv2019i avatar Apr 24 '23 09:04 kv2019i