sof icon indicating copy to clipboard operation
sof copied to clipboard

[BUG] [IIR] IIR not working with SOF compiled with XCC RJ2024.3

Open iuliana-prodan opened this issue 1 year ago • 9 comments

Describe the bug Compiled SOF successfully with XCC toolchain RJ2024.3 from Cadence. When trying to run an IIR test I get a load/store error on DSP:

*** Booting Zephyr OS build v3.7.0-rc2-451-g740d7f735e23 ***
[00:00:00.000,000] <inf> main: SOF on imx8mp_evk
[00:00:00.000,000] <inf> main: SOF initialized
[00:00:06.788,000] <err> os:  ** FATAL EXCEPTION
[00:00:06.788,000] <err> os:  ** CPU 0 EXCCAUSE 3 (load/store error)
[00:00:06.788,000] <err> os:  **  PC 0x92455be4 VADDR 0x92c0c1fc
[00:00:06.788,000] <err> os:  **  PS 0x60720
[00:00:06.788,000] <err> os:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:7 CALLINC:2)
[00:00:06.788,000] <err> os:  **  A0 0x924467d4  SP 0x92461b30  A2 0x92c0c200  A3 0x92c0d274
[00:00:06.788,000] <err> os:  **  A4 (nil)  A5 0x92c0c1f0  A6 0x92c0c080  A7 0x2
[00:00:06.788,000] <err> os:  **  A8 (nil)  A9 0x1 A10 (nil) A11 0x92c0c020
[00:00:06.788,000] <err> os:  ** A12 0x9244668c A13 0x92c0c040 A14 (nil) A15 0x92c0c080
[00:00:06.788,000] <err> os:  ** LBEG 0x92455ba0 LEND 0x92455c06 LCOUNT (nil)
[00:00:06.788,000] <err> os:  ** SAR 0x1d
[00:00:06.788,000] <err> os:  **  THREADPTR 0x2
[00:00:06.788,000] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:00:06.788,000] <err> os: Current thread: 0x92c06f10 (unknown)
[00:00:06.886,000] <err> zephyr: Halting system

In Linux I get an Input/output error:

root@imx8mpevk:~# aplay -Dhw:0,0 -f S16_LE -c 2 -r 48000 10s_sample.wav &
[1] 784
root@imx8mpevk:~# sleep 2
Playing WAVE '10s_sample.wav' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo

root@imx8mpevk:~# /unit_tests/sof/tools/ctl/sof-ctl -Dhw:0 -n 46 -s /unit_tests/sof/tools/ctl/eq_iir_bandpass.txt
Control size is 1024.
Applying configuration "/unit_tests/sof/tools/ctl/eq_iir_bandpass.txt" into device hw:0 control numid=46.
4607827,0,116,50331648,0,0,0,0,116,2,1,0,0,0,0,0,0,2,2,0,0,0,0,3316150158,2048164275,513807534,3267352229,513807534,0,16384,3867454526,1191025347,38870735,77741469,38870735,4294967292,24197
Success.
hdr: magic 0x00464f53
hdr: type 0
hdr: size 116 bytes
hdr: abi 3:0:0
4607827,0,116,50331648,0,0,0,0,116,2,1,0,0,0,0,0,0,2,2,0,0,0,0,3316150158,2048164275,513807534,3267352229,513807534,0,16384,3867454526,1191025347,38870735,77741469,38870735,4294967292,24197
root@imx8mpevk:~# 
root@imx8mpevk:~# aplay: pcm_write:2178: write error: Input/output error

[1]+  Done(1)                 aplay -Dhw:0,0 -f S16_LE -c 2 -r 48000 10s_sample.wav
root@imx8mpevk:~#

To Reproduce Use an IIR topology and run the following commands:

$ aplay -Dhw:0,0 -f S16_LE -c 2 -r 48000 10s_sample.wav &
$ sleep 2
$ sof-ctl -Dhw:0 -n 46 -s eq_iir_loudness.txt

Reproduction Rate With latest SOF - main branch, with XCC RJ2024.3, the error is met 100%. This was reproduced on all i.MX platforms: i.MX8MP, i.MX8QM, i.MX8QXP.

With XCC RI2023.11 everything works fine.

Expected behavior No error should appear.

Impact Not working as expected. Showstopper for IIR, if updating to latest XCC toolchain.

Environment

  1. Branch name and commit hash of the 2 repositories: sof (firmware/topology) and linux (kernel driver).
    • Kernel: Linux version 6.6.36
    • SOF: SOF/main
  2. Name of the topology file
    • Topology: sof-imx8mp-eq-iir-wm8960.tplg (or any IIR tplg for i.MX)
  3. Name of the platform(s) on which the bug is observed.
    • Platform: i.MX8MP, i.MX8QM, i.MX8QXP.

Screenshots or console output

[00:00:02.074,000] <err> os:  ** FATAL EXCEPTION
[00:00:02.074,000] <err> os:  ** CPU 0 EXCCAUSE 3 (load/store error)
[00:00:02.074,000] <err> os:  **  PC 0x92455be4 VADDR 0x92c0c1fc
[00:00:02.074,000] <err> os:  **  PS 0x60720
[00:00:02.074,000] <err> os:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:7 CALLINC:2)
[00:00:02.074,000] <err> os:  **  A0 0x924467d4  SP 0x92461b30  A2 0x92c0c200  A3 0x92c0d274
[00:00:02.074,000] <err> os:  **  A4 (nil)  A5 0x92c0c1f0  A6 0x92c0c080  A7 0x2
[00:00:02.074,000] <err> os:  **  A8 (nil)  A9 0x1 A10 (nil) A11 0x92c0c020
[00:00:02.074,000] <err> os:  ** A12 0x9244668c A13 0x92c0c040 A14 (nil) A15 0x92c0c080
[00:00:02.074,000] <err> os:  ** LBEG 0x92455ba0 LEND 0x92455c06 LCOUNT (nil)
[00:00:02.074,000] <err> os:  ** SAR 0x1d
[00:00:02.074,000] <err> os:  **  THREADPTR 0x2
[00:00:02.074,000] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:00:02.074,000] <err> os: Current thread: 0x92c06f10 (unknown)
[00:00:02.173,000] <err> zephyr: Halting system

I've attached here the disassembly for OK XCC 2023 SOF and NOT OK XCC 2024 SOF. zephyr_2023_OK.dst.txt zephyr_2024_NOK.dst.txt

iuliana-prodan avatar Aug 07 '24 08:08 iuliana-prodan

From the crash and after analyzing the disassembly I've realized the error is here:

			acc = AE_SLAI64S(acc, 1); /* Convert to Q17.47 */
92455be4:	297582ac4402b11345f23f 	{ ae_s32.l.i	aed3, a2, -4; l32i	a4, a5, 12; nop; ae_slai64s	aed1, aed1, 1 }

By comparing the two disassembles from RI2023 (which is ok) and RJ2024 (which is NOK): image we can see that for RJ2024, we have a load instruction on slot1, from the four operations in parallel. By looking in the HiFi4 User guide: image this seems to be correct. Also, another difference is the addi and l32i instructions - these are in both cases but in different sets of 4 operations.

At first, I checked that all the variables are aligned based on the instructions restrictions and they seem to be (if we look at the VADDR or A2, A3, etc from crash dump).

I also wrote a simple function that does a store and load (trying to reproduce the order of instructions from where the crash happens), with the exact addresses:

diff --git a/src/math/iir_df1_hifi3.c b/src/math/iir_df1_hifi3.c
index d1aa05f55..40edcbee7 100644
--- a/src/math/iir_df1_hifi3.c
+++ b/src/math/iir_df1_hifi3.c
@@ -46,6 +46,22 @@
 
 /* 32 bit data, 32 bit coefficients and 32 bit state variables */
 
+uint32_t first_iteration(void)
+{
+       uint32_t *addr_r = (uint32_t *)0x92c0c220;
+       uint32_t value;
+
+       ae_int32 *addr_w = (ae_int32 *)0x92c0c200;
+       ae_int32x2 tmp = AE_ZERO32();
+
+       AE_S32_L_I (tmp, addr_w, -4);
+
+       value = *addr_r;
+
+       return value;
+}
+int cnt = 0x1;
+
 int32_t iir_df1(struct iir_state_df1 *iir, int32_t x)
 {
        ae_int64 acc;
@@ -67,6 +83,11 @@ int32_t iir_df1(struct iir_state_df1 *iir, int32_t x)
        int j;
        int nseries = iir->biquads_in_series;
 
+       if (cnt == 0x1) {
+               first_iteration();
+       }
+       cnt++;
+
        /* Bypass is set with number of biquads set to zero. */
        if (!iir->biquads)
                return x;

With this, the disassembly looks similar to the crash - meaning is doing a store (ae_s32.l.i), next a load (l32i.n), but it passes:

92446e24 <first_iteration>:
first_iteration():
/opt/samba/nxa06898/xcc/sof/src/math/iir_df1_hifi3.c:50
{
92446e24:	004136                      	entry	a1, 32
/opt/samba/nxa06898/xcc/sof/src/math/iir_df1_hifi3.c:55
	ae_int32x2 tmp = AE_ZERO32();
92446e27:	fc48c400672e              	{ l32r	a2, 92429fc4 (92c0c200 <heapmem+0x7200>); ae_movi	aed0, 0 }
/opt/samba/nxa06898/xcc/sof/src/math/iir_df1_hifi3.c:57
	AE_S32_L_I (tmp, addr_w, -4);
92446e2d:	e002f4                      	ae_s32.l.i	aed0, a2, -4
/opt/samba/nxa06898/xcc/sof/src/math/iir_df1_hifi3.c:59
	value = *addr_r;
92446e30:	8228                          	l32i.n	a2, a2, 32
/opt/samba/nxa06898/xcc/sof/src/math/iir_df1_hifi3.c:61
	return value;
92446e32:	f01d                          	retw.n

iuliana-prodan avatar Aug 07 '24 11:08 iuliana-prodan

@singalsu @andyross do you have any suggestions, comments, ideas? What else can I try? Thanks!

iuliana-prodan avatar Aug 07 '24 11:08 iuliana-prodan

@singalsu @andyross do you have any suggestions, comments, ideas? What else can I try? Thanks!

@iuliana-prodan I would raise a ticket with Cadence here as code works with old compiler and not new compiler. This could mean we need a #ifdef or pragma or indeed if our code is wrong here.

lgirdwood avatar Aug 07 '24 14:08 lgirdwood

Yep, good debugging work @iuliana-prodan ! Help from Cadence would be really useful with this.

singalsu avatar Aug 12 '24 09:08 singalsu

@iuliana-prodan short term you could put an ifdef that does a less optimal flow here and enable the iffdef via Kconfig as a compiler Work Around ?

lgirdwood avatar Aug 14 '24 14:08 lgirdwood

@iuliana-prodan short term you could put an ifdef that does a less optimal flow here and enable the iffdef via Kconfig as a compiler Work Around ?

I've raised a ticket to cadence "Case# 46820991: Sound Open Firmware code not working with latest XCC toolchain RJ2024.3" but no useful answer.

TBH I have no idea how to less optimize this code.

The problem could be reading from the input buffers here: https://github.com/thesofproject/sof/blob/main/src/math/iir_df1_hifi3.c#L90-L95. All the intrinsics require aligned addresses.

Same here https://github.com/thesofproject/sof/blob/main/src/math/iir_df1_hifi3.c#L106-L110.

iuliana-prodan avatar Aug 14 '24 15:08 iuliana-prodan

@singalsu looks like we have some gaps in the APIs for aligning MIN(copy_size). @iuliana-prodan these APIs are meant to adapt to HiFi flavor, but this sounds like a bug if you see unaligned access.

Can you try something like

diff --git a/src/math/iir_df1_hifi3.c b/src/math/iir_df1_hifi3.c
index 39be8e3a6..ba97605be 100644
--- a/src/math/iir_df1_hifi3.c
+++ b/src/math/iir_df1_hifi3.c
@@ -70,8 +70,8 @@ int32_t iir_df1(struct iir_state_df1 *iir, int32_t x)
                return x;
 
        /* Coefficients order in coef[] is {a2, a1, b2, b1, b0, shift, gain} */
-       coefp = (ae_int32x2 *)&iir->coef[0];
-       delayp = (ae_int32x2 *)&iir->delay[0];
+       coefp = __builtin_assume_aligned((ae_int32x2 *)&iir->coef[0], THE_HIFI_VERSION_DATASIZE);
+       delayp = __builtin_assume_aligned((ae_int32x2 *)&iir->delay[0], THE_HIFI_VERSION_DATASIZE);
        for (j = 0; j < iir->biquads; j += nseries) {
                /* the first for loop is for parallel EQs, and they have the same input */
                in = x;
'''

lgirdwood avatar Aug 16 '24 14:08 lgirdwood

@singalsu looks like we have some gaps in the APIs for aligning MIN(copy_size). @iuliana-prodan these APIs are meant to adapt to HiFi flavor, but this sounds like a bug if you see unaligned access.

Can you try something like

diff --git a/src/math/iir_df1_hifi3.c b/src/math/iir_df1_hifi3.c
index 39be8e3a6..ba97605be 100644
--- a/src/math/iir_df1_hifi3.c
+++ b/src/math/iir_df1_hifi3.c
@@ -70,8 +70,8 @@ int32_t iir_df1(struct iir_state_df1 *iir, int32_t x)
                return x;
 
        /* Coefficients order in coef[] is {a2, a1, b2, b1, b0, shift, gain} */
-       coefp = (ae_int32x2 *)&iir->coef[0];
-       delayp = (ae_int32x2 *)&iir->delay[0];
+       coefp = __builtin_assume_aligned((ae_int32x2 *)&iir->coef[0], THE_HIFI_VERSION_DATASIZE);
+       delayp = __builtin_assume_aligned((ae_int32x2 *)&iir->delay[0], THE_HIFI_VERSION_DATASIZE);
        for (j = 0; j < iir->biquads; j += nseries) {
                /* the first for loop is for parallel EQs, and they have the same input */
                in = x;
'''

@lgirdwood sorry for the late reply, I was OOO.

I've tried your suggestion and it still doesn't work. I get the same error, at the exact address:

[00:00:02.076,000] <err> os:  ** FATAL EXCEPTION
[00:00:02.076,000] <err> os:  ** CPU 0 EXCCAUSE 3 (load/store error)
[00:00:02.076,000] <err> os:  **  PC 0x92455be4 VADDR 0x92c0c1fc
[00:00:02.076,000] <err> os:  **  PS 0x60720
[00:00:02.076,000] <err> os:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:7 CALLINC:2)
[00:00:02.076,000] <err> os:  **  A0 0x924467d4  SP 0x92461b30  A2 0x92c0c200  A3 0x92c0d274
[00:00:02.076,000] <err> os:  **  A4 (nil)  A5 0x92c0c1f0  A6 0x92c0c080  A7 0x2
[00:00:02.076,000] <err> os:  **  A8 (nil)  A9 0x1 A10 (nil) A11 0x92c0c020
[00:00:02.076,000] <err> os:  ** A12 0x9244668c A13 0x92c0c040 A14 (nil) A15 0x92c0c080
[00:00:02.076,000] <err> os:  ** LBEG 0x92455ba0 LEND 0x92455c06 LCOUNT (nil)
[00:00:02.076,000] <err> os:  ** SAR 0x1d
[00:00:02.076,000] <err> os:  **  THREADPTR 0x2
[00:00:02.076,000] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:00:02.076,000] <err> os: Current thread: 0x92c06f10 (unknown)
[00:00:02.175,000] <err> zephyr: Halting system

The generated disassembly code is also the same.

iuliana-prodan avatar Aug 28 '24 15:08 iuliana-prodan

The generated disassembly code is also the same.

ok - I would expect this compiler directive to be honored, definitely worth getting input from cadence on their compiler.

lgirdwood avatar Sep 02 '24 09:09 lgirdwood

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

github-actions[bot] avatar Jun 19 '25 03:06 github-actions[bot]