LLEXT: fix CI failures and make DRC an LLEXT module by default on MTL
Build all the code, supporting LLEXT, modular. UPDATE: changed to only build DRC as LLEXT on MTL.
It has been suggested that the CI might just work if we switch any module to llext and build a deployable layout.
@marc-hb ok, no, a very clear indication that it unfortunately doesn't "just work:" https://sof-ci.01.org/sofpr/PR9116/build4627/devicetest/index.html?model=MTLP_RVP_NOCODEC&testcase=verify-kernel-boot-log so supposedly some work needs to be done...
SOFCI TEST
https://github.com/thesofproject/sof/actions/runs/9387214836/job/25849589149?pr=9116
FAILED: zephyr/smart_amp_test_llext/smart_amp_test.llext D:/a/sof/sof/workspace/build-mtl/zephyr/smart_amp_test_llext/smart_amp_test.llext
C:\Windows\system32\cmd.exe /C "cd /D D:\a\sof\sof\workspace\build-mtl\zephyr\smart_amp_test_llext && D:\a\sof\sof\zephyr-sdk-0.16.4_windows-x86_64\zephyr-sdk-0.16.4\xtensa-intel_ace15_mtpm_zephyr-elf\bin\xtensa-intel_ace15_mtpm_zephyr-elf-strip.exe -R.xt.* D:/a/sof/sof/workspace/build-mtl/zephyr/smart_amp_test.llext.pkg_input -oD:/a/sof/sof/workspace/build-mtl/zephyr/smart_amp_test_llext/smart_amp_test.llext && "C:\Program Files\CMake\bin\cmake.exe" -E true"
D:\a\sof\sof\zephyr-sdk-0.16.4_windows-x86_64\zephyr-sdk-0.16.4\xtensa-intel_ace15_mtpm_zephyr-elf\bin\xtensa-intel_ace15_mtpm_zephyr-elf-strip.exe: 'D:/a/sof/sof/workspace/build-mtl/zephyr/smart_amp_test.llext.pkg_input': No such file
https://sof-ci.01.org/sofpr/PR9116/build5189/build also failed.
@marc-hb fixed that. Could you or @fredoh9 help check why that's the case? We'd need to compare intermediate results. EDIT: sorry, I meant to ask why the Windows/Linux build comparison is failing now.
https://github.com/thesofproject/sof/actions/runs/9418758155/job/25947397562?pr=9116
Files linux-build mtl/build-sof-staging/sof/intel/sof-ipc4-lib/mtl/community/smart_amp_test.llext and windows-build mtl/build-sof-staging/sof/intel/sof-ipc4-lib/mtl/community/smart_amp_test.llext differ
Do such files have debug symbols? If yes then they shouldn't be compared, not until they're stripped (TODO)
Actually, there's a better temporary solution: turn off modules when testing reproducible builds. Otherwise code in modules is not tested.
@marc-hb makes sense, thanks! I'll try adding stripping of .comment to the Zephyr LLEXT CMake code
@marc-hb not sure I understand - doesn't the failing test mean, that modules do get tested?
@wszypelt QB stuck?
@lyakh Unfortunately, I'm trying to solve this issue because more PRs are stuck. As long as I manually added it to the queue, the results should be available within an hour
@lyakh can you rebase and re-push? Thanks!
@lgirdwood It isn't just about rebasing: we're waiting for 2 things to happen: (1) QB support for LLEXT modules @wszypelt , and (2) a solution on how to resolve the failing Linux-Windows comparison @marc-hb https://github.com/thesofproject/sof/pull/9116#issuecomment-2162490916
Ok, let's disable the Windows/Linux comparison here, as we know the toolchain has some open issues around building shared objects/libraries.
@wszypelt is there an ETA for when internal CI could support this build target? FWIW, @mwasko and I were discussing this today. My preference would be to have this build option testable by all CIs for best coverage.
@lgirdwood @marc-hb we could diff --exclude=*.llext "for now"
Yep, whatever is least effort.
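For illustration, the effect of excluding *.llext from the Linux/Windows comparison can be sketched in a few lines of Python. This is only a sketch: the function name `differing_files` and its arguments are hypothetical and are not part of the SOF CI scripts. Like `diff --exclude`, it matches the pattern against file base names.

```python
import filecmp
import fnmatch
import os


def differing_files(left_root, right_root, exclude=("*.llext",)):
    """Return relative paths that differ or exist on one side only.

    Files whose base name matches any pattern in `exclude` are ignored,
    similar in spirit to `diff -r --exclude=PATTERN`.
    """
    def listing(root):
        found = set()
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if any(fnmatch.fnmatch(name, pat) for pat in exclude):
                    continue  # skipped, e.g. smart_amp_test.llext
                found.add(os.path.relpath(os.path.join(dirpath, name), root))
        return found

    left, right = listing(left_root), listing(right_root)
    diffs = list(left ^ right)  # present on one side only
    for rel in left & right:
        same = filecmp.cmp(os.path.join(left_root, rel),
                           os.path.join(right_root, rel), shallow=False)
        if not same:
            diffs.append(rel)
    return sorted(diffs)
```

Calling it on the two staging directories would then report every mismatch except the .llext files, which is the "for now" behaviour discussed here.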
@lyakh please try to add CONFIG_LIBRARY_DEFAULT_MODULAR=n to repro-build.conf after thesofproject/sof#9264 is merged.
@lgirdwood I talked to the developer: there is already a solution, but we still have some problems with it. I honestly believe that everything will work by Monday.
The Linux-Windows comparison is fixed now. Next, waiting for the MTL regression to be fixed and for QB integration.
The LIBRARY_DEFAULT_MODULAR opt-in program looks like a complex Kconfig hack. It adds multiple levels of defaults, it is pretty verbose (need to edit the Kconfig of each component) and the only thing it seems to achieve is to avoid a .conf overlay with a list of modules. Why not just do such an overlay? Keep it simple.
@marc-hb we already have such overlays, but I thought that overlays in default build configurations were frowned upon?
sof-ipc4-lib/ is empty in https://github.com/thesofproject/sof/actions/runs/9776233023/job/26988310949?pr=9116, is that expected?
not sure why that one is empty, but I see the DRC module being loaded on MTL HDA https://sof-ci.01.org/sofpr/PR9116/build6159/devicetest/index.html?model=MTLP_RVP_HDA&testcase=verify-sof-firmware-load:
[ 5.010647] kernel: snd_sof:sof_ipc4_fw_parse_ext_man: sof-audio-pci-intel-mtl 0000:00:1f.3: module DRC: UUID B36EE4DA-006F-47F9-A06D-FECBE2D8B6CE cfg_count: 1, bss_size: 0x1000
[ 5.010672] kernel: snd_sof_intel_hda_common:hda_dsp_stream_hw_params: sof-audio-pci-intel-mtl 0000:00:1f.3: FW Poll Status: reg[0x1c0]=0x40000 successful
[ 5.010704] kernel: snd_sof_intel_hda_common:hda_dsp_stream_hw_params: sof-audio-pci-intel-mtl 0000:00:1f.3: FW Poll Status: reg[0x1c0]=0x40000 successful
[ 5.010711] kernel: snd_sof_intel_hda_common:hda_dsp_stream_setup_bdl: sof-audio-pci-intel-mtl 0000:00:1f.3: period_bytes:0x0
[ 5.010713] kernel: snd_sof_intel_hda_common:hda_dsp_stream_setup_bdl: sof-audio-pci-intel-mtl 0000:00:1f.3: periods:1
[ 5.010725] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc tx : 0x19000000|0x0: GLB_LOAD_LIBRARY_PREPARE
[ 5.011798] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc tx reply: 0x39000000|0x0: GLB_LOAD_LIBRARY_PREPARE
[ 5.011822] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc tx done : 0x19000000|0x0: GLB_LOAD_LIBRARY_PREPARE
[ 5.011830] kernel: snd_sof_intel_hda_common:hda_dsp_ipc4_load_library: sof-audio-pci-intel-mtl 0000:00:1f.3: FW Poll Status: reg[0x1d0]=0x409800 successful
[ 5.011838] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc tx : 0x18010000|0x0: GLB_LOAD_LIBRARY
[ 5.031498] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc tx reply: 0x38000000|0x0: GLB_LOAD_LIBRARY
[ 5.031510] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc tx done : 0x18010000|0x0: GLB_LOAD_LIBRARY
but yes, strange that it wasn't built in that test
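To check a kernel log like the one above for a completed library load without reading it by eye, something along these lines could be used. It is a rough sketch that assumes the `ipc tx ...: 0x...|0x...: NAME` line format seen in this excerpt; the helper names `ipc_events` and `library_load_completed` are made up for illustration and do not come from sof-test.

```python
import re

# Matches e.g. "ipc tx reply: 0x39000000|0x0: GLB_LOAD_LIBRARY_PREPARE"
IPC_LINE = re.compile(
    r"ipc (tx|rx)(?: (reply|done))?\s*: "
    r"(0x[0-9a-fA-F]+)\|(0x[0-9a-fA-F]+): (\S+)"
)


def ipc_events(log_text):
    """Return (direction, stage, primary, extension, name) per IPC log line."""
    events = []
    for line in log_text.splitlines():
        m = IPC_LINE.search(line)
        if m:
            direction, stage, primary, extension, name = m.groups()
            # Lines without "reply"/"done" are the initial send.
            events.append((direction, stage or "sent",
                           int(primary, 16), int(extension, 16), name))
    return events


def library_load_completed(log_text):
    """True if a GLB_LOAD_LIBRARY message reached the "done" stage."""
    return any(stage == "done" and name == "GLB_LOAD_LIBRARY"
               for _dir, stage, _pri, _ext, name in ipc_events(log_text))
```

Lines that carry no IPC header, such as the "FW Poll Status" ones, are simply skipped.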
The rimage.path hack needs at least a minimum of explanation somewhere. Best is probably a new bug. Right now the hack has zero comments and is not even mentioned in any commit message.
@marc-hb here you go: https://github.com/thesofproject/sof/actions/runs/9791262388/job/27034644220?pr=9281
In dir: D:\a\sof\sof\workspace; running command:
''"'"'D:\a\sof\sof\workspace\build-rimage\rimage.EXE'"'"'' -o 'D:\a\sof\sof\workspace\build-mtl\zephyr\eq_iir_llext\eq_iir.llext.ri' -e -c 'D:\a\sof\sof\workspace\build-mtl\zephyr\eq_iir_llext\rimage_config.toml' -k 'D:\a\sof\sof\workspace\sof\keys\otc_private_key_3k.pem' -l -r 'D:\a\sof\sof\workspace\build-mtl\zephyr\eq_iir_llext\eq_iir.llext'
but I'd rather just fix it than create a bug for reference for it.
@marc-hb I know why sof-ipc4-lib/ is empty in https://github.com/thesofproject/sof/actions/runs/9776233023/job/26988310949?pr=9116 - all those builds use --overlay=sof/app/overlays/repro-build.conf and that one disables CONFIG_MODULES
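A quick way to confirm whether a given overlay turns modules off is to parse it as a Kconfig fragment. The sketch below assumes the usual fragment syntax (`CONFIG_X=value` and `# CONFIG_X is not set`); the actual contents of repro-build.conf are not shown in this thread, and the helper names are hypothetical.

```python
def parse_conf(text):
    """Parse a Kconfig .conf fragment into a {symbol: value} dict."""
    settings = {}
    suffix = " is not set"
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # "# CONFIG_FOO is not set" is equivalent to CONFIG_FOO=n
            settings[line[2:-len(suffix)]] = "n"
        elif line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip()
    return settings


def modules_disabled(overlay_text):
    """True if the overlay explicitly sets CONFIG_MODULES to n."""
    return parse_conf(overlay_text).get("CONFIG_MODULES") == "n"
```

If `modules_disabled()` returns True for the overlay used by a build, an empty sof-ipc4-lib/ directory is the expected outcome rather than a packaging bug.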
CI:
- coding style: false positives for missing Kconfig "help" (it's present) and requiring parentheses in a UUID macro definition (@andyross) which would break it
- QB: need to clarify @wszypelt
- main-ace jenkins: this is the important one. And I think it's good now. The failures:
  3.1. https://sof-ci.01.org/sofpr/PR9116/build6351/devicetest/index.html?model=MTLP_RVP_HDA&testcase=multiple-pause-resume-50 seems to be https://github.com/thesofproject/linux/issues/5048 although on MTL
  3.2. sof-logger failed on all 3 platforms (HDA, SDW, nocodec), e.g. https://sof-ci.01.org/sofpr/PR9116/build6351/devicetest/index.html?model=MTLP_RVP_HDA&testcase=check-sof-logger is thesofproject/sof-test#1216 - also failed on LNL and TGL
- multiple LNL SDW failures https://sof-ci.01.org/sofpr/PR9116/build6350/devicetest/index.html must be unrelated, also seen e.g. in thesofproject/sof#9287 https://sof-ci.01.org/sofpr/PR9287/build6345/devicetest/index.html
+1, all comments addressed. Based on DRC changes it looks nice to use, just make sure CI works
@abonislawski thanks, yes, we're looking into QB failures ATM
@lyakh @abonislawski QB Internal CI now works correctly, DRC on MTL is checked, and all our tests in Internal Intel CI are green
great! Thanks a lot @wszypelt !
Hmm, @lyakh can you check this https://sof-ci.01.org/sofpr/PR9116/build6415/devicetest/index.html?model=MTLP_RVP_HDA&testcase=multiple-pause-resume-50
[ 1032.371199] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc timed out for 0x13010004|0x0
[ 1032.371224] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ IPC dump start ]------------
[ 1032.371236] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: Host IPC initiator: 0x93010004|0x0|0x0, target: 0x33000000|0x0|0x0, ctl: 0x3
@kv2019i I thought it was the same as https://github.com/thesofproject/linux/issues/5048 but (1) this one is on MTL, the other one is on LNL, and (2) this one seems to happen consistently with this PR while the LNL bug is rather rare? Is my understanding correct?
It's the same test, but at least the most recent failure case for this PR seems to have an IPC timeout. The known LNL failure looks like this https://sof-ci.01.org/sofpr/PR9235/build5580/devicetest/index.html?model=LNLM_RVP_HDA&testcase=multiple-pause-resume-50 -- user-space getting an error status but no real errors in the kernel/fw logs.