Audio: Mixer: Add hifi version processing functions for mixer

Open andrula-song opened this issue 2 years ago • 9 comments

Add HiFi3 and HiFi4 implementations of the mixer processing functions. The HiFi functions use at least 47% fewer cycles than the C version.

Signed-off-by: Andrula Song [email protected]

andrula-song avatar Aug 12 '22 05:08 andrula-song

Since HiFi3 and HiFi4 use the same instructions, the HiFi version of the mixer is named mixer_hifi.c. Compared with the original C version, the functions save at least 47% of cycles. Here are the results:

mix_n_s16: about 67% fewer cycles than the C version
mix_n_s24: about 51% fewer cycles than the C version
mix_n_s32: about 47% fewer cycles than the C version

[image: mixer-new]
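For readers unfamiliar with what these functions do, here is a minimal plain-C sketch of an n-source 16-bit mixer, the kind of scalar baseline a HiFi version would be measured against. It is not the code from this PR: the names `sat_s16` and `mix_n_s16_ref`, the argument layout, and the use of saturation on overflow are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp a 32-bit sum into the signed 16-bit range. */
static int16_t sat_s16(int32_t x)
{
	if (x > INT16_MAX)
		return INT16_MAX;
	if (x < INT16_MIN)
		return INT16_MIN;
	return (int16_t)x;
}

/*
 * Hypothetical scalar reference mixer (not the SOF implementation).
 * Sums sample i of every source into a 32-bit accumulator, then
 * saturates the result back to 16 bits. The accumulator cannot
 * overflow for any realistic number of sources (up to 65536).
 */
static void mix_n_s16_ref(int16_t *dst, const int16_t *const *srcs,
			  size_t num_sources, size_t samples)
{
	assert(dst && srcs && num_sources > 0);

	for (size_t i = 0; i < samples; i++) {
		int32_t acc = 0;

		for (size_t j = 0; j < num_sources; j++)
			acc += srcs[j][i];

		dst[i] = sat_s16(acc);
	}
}
```

A vectorized version would process several samples per iteration instead of one, which is where cycle savings of the reported magnitude would be expected to come from.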

andrula-song avatar Aug 12 '22 05:08 andrula-song

SOFCI_TEST

XiaoyunWu6666 avatar Aug 12 '22 07:08 XiaoyunWu6666

SOFCI TEST

XiaoyunWu6666 avatar Aug 12 '22 10:08 XiaoyunWu6666

Is it possible to add check for __Pragma("no unroll"), __Pragma("no reorder"), and __Pragma("no simd") in the for loop?

Why ?

lgirdwood avatar Aug 15 '22 11:08 lgirdwood

Is it possible to add check for __Pragma("no unroll"), __Pragma("no reorder"), and __Pragma("no simd") in the for loop?

Why ?

It appears to aid optimization by providing additional information to the compiler.

[From the HiFi User Guide documentation]

3.4 Standard C/C++ Auto-Vectorization

Auto-vectorization of scalar C code can produce effective results on simple loop nests, but has its limits. It can be improved through the use of compiler pragmas and options, and effective data marshalling to make data accesses (loads and stores) regular and aligned.

Pragmas are widely used in Nature DSP Library functions.
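As a rough illustration of the kind of loop the quoted guide has in mind, the sketch below shows a simple, regular loop with `restrict`-qualified pointers and a loop pragma. Everything named here is an assumption: `LOOP_HINT` and `add_s16_to_s32` are hypothetical, the `no_unroll` spelling is taken from this discussion rather than verified against the compiler manual, and the macro expands to nothing when `__XTENSA__` is not defined so the code still builds on a host compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical macro: emit the pragma only for Xtensa builds.
 * On other toolchains it expands to nothing, so unknown-pragma
 * warnings are avoided.
 */
#if defined(__XTENSA__)
#define LOOP_HINT(x) _Pragma(x)
#else
#define LOOP_HINT(x)
#endif

/*
 * A simple loop nest of the sort an auto-vectorizer handles well:
 * unit-stride loads and stores, no aliasing (restrict), no branches.
 * Inputs are widened to 32 bits, so the sum cannot overflow.
 */
static void add_s16_to_s32(int32_t *restrict dst,
			   const int16_t *restrict a,
			   const int16_t *restrict b, size_t n)
{
	assert(dst && a && b);

	/* Pragma name assumed from the thread; check the toolchain docs. */
	LOOP_HINT("no_unroll")
	for (size_t i = 0; i < n; i++)
		dst[i] = (int32_t)a[i] + (int32_t)b[i];
}
```

Whether such a hint helps or hurts depends on the loop and the compiler, so it is worth measuring cycles with and without it.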

ShriramShastry avatar Aug 15 '22 12:08 ShriramShastry

Is it possible to add check for __Pragma("no unroll"), __Pragma("no reorder"), and __Pragma("no simd") in the for loop?

Why ?

It appears to aid optimization by providing additional information to the compiler.

That's correct, but in this case @andrula-song is hand-writing the intrinsics and the loops are complex. The autovectorizer works best on simple, small loops, and the pragma suggestions above are not applicable here (and would probably make performance worse).

lgirdwood avatar Aug 15 '22 14:08 lgirdwood

Approval is conditional on addressing the comments from @singalsu, of course.

lyakh avatar Aug 18 '22 09:08 lyakh

Hi @wszypelt, can you help check the internal CI? Thanks.

andrula-song avatar Aug 19 '22 05:08 andrula-song

@lrudyX can you check CI? It's showing a blank page. Thanks!

lgirdwood avatar Aug 19 '22 08:08 lgirdwood

SOFCI_TEST

andrula-song avatar Aug 22 '22 01:08 andrula-song