sof icon indicating copy to clipboard operation
sof copied to clipboard

Audio: MDRC: Restructure Multiband DRC for more effective memory alloc…

Open ShriramShastry opened this issue 1 year ago • 17 comments

This check-in enhances memory management in the Multiband Dynamic Range Control (MDRC) component.

Key Changes:

  1. Streamlined memory allocation and initialization of crossover, emphasis, and de-emphasis coefficients directly in multiband_drc_init_coef.
  2. Updated multiband_drc_init to eliminate the allocation of any obsolete blocks.
  3. Adjusted multiband_drc_free to no longer check and free any now-nonexistent blocks.
  4. Simplified overall memory management by removing unnecessary layers of indirection.

Performance Improvements:

  • Reduced memory allocation overhead.
  • Enhanced data locality, improving cache efficiency.
  • Mitigated heap fragmentation, thereby reducing the chances of memory leaks.

ShriramShastry avatar Jun 05 '24 04:06 ShriramShastry

The allocation of coefficients_block is currently conditional on whether it is NULL. Would you prefer that this allocation be performed unconditionally at the time of cd allocation, ensuring that coefficients_block is never NULL and eliminating all subsequent NULL checks on it?

I would allocate it when needed if it relies on IPC configuration data for allocation size, if its always constant size you could allocate per instance when pipeline is created.

lgirdwood avatar Jun 12 '24 14:06 lgirdwood

The allocation of coefficients_block is currently conditional on whether it is NULL. Would you prefer that this allocation be performed unconditionally at the time of cd allocation, ensuring that coefficients_block is never NULL and eliminating all subsequent NULL checks on it?

I would allocate it when needed if it relies on IPC configuration data for allocation size, if its always constant size you could allocate per instance when pipeline is created.

+1, I think the only thing that would possibly vary is the crossover bands and that is fairly small

cujomalainey avatar Jun 13 '24 02:06 cujomalainey

This PR version has an issue with TGL HiFi3 build. If I set with topology sof-hda-benchmark-drc_multiband32.tplg the control amixer cset name='Analog Playback MULTIBAND_DRC enable' on the playback becomes silent. Sound can be heard with setting amixer cset name='Analog Playback MULTIBAND_DRC enable' off. The current DRC versin does not follow the switch control in runtime, the control needs to be applied when streaming is stopped (or stop the stream to get impact of previously applied control).

I have no idea if MTL HiFi4 has the issue, I don't have a suitable device to check own FW builds.

singalsu avatar Jun 18 '24 09:06 singalsu

This PR version has an issue with TGL HiFi3 build. If I set with topology sof-hda-benchmark-drc_multiband32.tplg the control amixer cset name='Analog Playback MULTIBAND_DRC enable' on the playback becomes silent. Sound can be heard with setting amixer cset name='Analog Playback MULTIBAND_DRC enable' off. The current DRC versin does not follow the switch control in runtime, the control needs to be applied when streaming is stopped (or stop the stream to get impact of previously applied control).

I have no idea if MTL HiFi4 has the issue, I don't have a suitable device to check own FW builds.

@singalsu no that is by design for our code that the on/off switch is applied while the stream is open.

cujomalainey avatar Jun 18 '24 18:06 cujomalainey

@ShriramShastry sorry, I must be missing something. I still don't quite see "Reduced memory allocation overhead" or "Enhanced data locality, improving cache efficiency" or other related changes, announced in the commit message. I see that you changed the allocation order instead of grouping by type to grouping by channel, so potentially allocations of individual channels are now grouped together. Is that the improvement that you mean? I'd make it more explicit in the description instead of your present very generic one. You're also adding more clean-up steps. Are they needed, were they missing before? If they were missing, were there any memory leaks or what problems are those your additions solving? Why aren't those changes mentioned in the commit description? Are they related to initialisation re-ordering or not? If not - maybe split? On the whole - your commit description still doesn't seem to very well describe the commit itself.

lyakh avatar Jul 02 '24 07:07 lyakh

Thank you very much for the reviews. As for specific improvements, I'm still not sure whether this actually improves anything It would be helpful to understand what specific concerns exist; the patch reduces TGL by 54 MCPS, from 188 to 134. ~28% savings.

ShriramShastry avatar Jul 03 '24 08:07 ShriramShastry

Thank you very much for the reviews. As for specific improvements, I'm still not sure whether this actually improves anything It would be helpful to understand what specific concerns exist; the patch reduces TGL by 54 MCPS, from 188 to 134. ~28% savings.

that's the information I was looking for, yes. Maybe it was already provided above, sorry, difficult to find, it has become rather long. So, you're saying that just by rearranging initialisation code to regroup buffers improves run-time (i.e. during copying) performance on TGL by 28%?.. That's the complete .copy() function? Amazing.

lyakh avatar Jul 04 '24 07:07 lyakh

Thank you very much for the reviews. As for specific improvements, I'm still not sure whether this actually improves anything It would be helpful to understand what specific concerns exist; the patch reduces TGL by 54 MCPS, from 188 to 134. ~28% savings.

that's the information I was looking for, yes. Maybe it was already provided above, sorry, difficult to find, it has become rather long. So, you're saying that just by rearranging initialisation code to regroup buffers improves run-time (i.e. during copying) performance on TGL by 28%?.. That's the complete .copy() function? Amazing.

I'm glad to clarify the differences that contribute to the performance improvements:

Memory Allocation Optimization:

Original: Separate loops for allocating crossover, emphasis, and de-emphasis coefficients per channel. Modified: A single loop handles allocation and initialization for all coefficients per channel, enhancing data locality and cache efficiency. This is achieved by grouping these allocations by channel (crossover, emphasis, deemphasis), which is more cache-friendly during runtime processing.

Initialization Enhancement:

Original: Filter coefficients were initialized in separate routines. Modified: Consolidated initialization routines (multiband_drc_init_coef) streamline the setup process, reducing computational overhead and simplifying the code structure.

Improved Clean-up Procedures:

Original: Clean-up procedures were less detailed. Modified: Detailed clean-up steps (multiband_drc_free) ensure thorough resource deallocation, preventing memory leaks and enhancing long-term performance stability.

Error Handling:

Original: Error handling was less comprehensive during initialization. Modified: Rigorous error handling during initialization to ensure stability and prevent resource leaks in case of failures.

These changes collectively lead to a 28% reduction in MCPS during the .copy() function execution on TGL platforms by improving cache performance and reducing initialization overhead.

ShriramShastry avatar Jul 04 '24 12:07 ShriramShastry

@johnylin76 I think we can use this PR as a starting point but can also take it much further with regards to embedding state.

cujomalainey avatar Jul 23 '24 20:07 cujomalainey

This PR version has an issue with TGL HiFi3 build. If I set with topology sof-hda-benchmark-drc_multiband32.tplg the control amixer cset name='Analog Playback MULTIBAND_DRC enable' on the playback becomes silent. Sound can be heard with setting amixer cset name='Analog Playback MULTIBAND_DRC enable' off. The current DRC versin does not follow the switch control in runtime, the control needs to be applied when streaming is stopped (or stop the stream to get impact of previously applied control). I have no idea if MTL HiFi4 has the issue, I don't have a suitable device to check own FW builds.

@singalsu no that is by design for our code that the on/off switch is applied while the stream is open.

Hi @singalsu that should be fixed by https://github.com/thesofproject/sof/pull/8922 (oh I realized that commit is not on main stream, I should land it there.)

johnylin76 avatar Jul 24 '24 06:07 johnylin76

Hi @singalsu that should be fixed by #8922 (oh I realized that commit is not on main stream, I should land it there.)

@johnylin76 Oh, good finding, please do a PR for main as well. I've been wondering this, if intentional or not. Good that the switch will impact processing immediately.

singalsu avatar Jul 24 '24 08:07 singalsu

Thank you very much for the reviews. As for specific improvements, I'm still not sure whether this actually improves anything It would be helpful to understand what specific concerns exist; the patch reduces TGL by 54 MCPS, from 188 to 134. ~28% savings.

that's the information I was looking for, yes. Maybe it was already provided above, sorry, difficult to find, it has become rather long. So, you're saying that just by rearranging initialisation code to regroup buffers improves run-time (i.e. during copying) performance on TGL by 28%?.. That's the complete .copy() function? Amazing.

I'm glad to clarify the differences that contribute to the performance improvements:

Memory Allocation Optimization:

Original: Separate loops for allocating crossover, emphasis, and de-emphasis coefficients per channel. Modified: A single loop handles allocation and initialization for all coefficients per channel, enhancing data locality and cache efficiency. This is achieved by grouping these allocations by channel (crossover, emphasis, deemphasis), which is more cache-friendly during runtime processing.

Initialization Enhancement:

Original: Filter coefficients were initialized in separate routines. Modified: Consolidated initialization routines (multiband_drc_init_coef) streamline the setup process, reducing computational overhead and simplifying the code structure.

Improved Clean-up Procedures:

Original: Clean-up procedures were less detailed. Modified: Detailed clean-up steps (multiband_drc_free) ensure thorough resource deallocation, preventing memory leaks and enhancing long-term performance stability.

Error Handling:

Original: Error handling was less comprehensive during initialization. Modified: Rigorous error handling during initialization to ensure stability and prevent resource leaks in case of failures.

These changes collectively lead to a 28% reduction in MCPS during the .copy() function execution on TGL platforms by improving cache performance and reducing initialization overhead.

Sorry, I don't follow which of these improvements could bring you a 28% run-time performance improvement. The fact that buffers are now grouped by channel? That might improve runtime a bit, but very unlikely 28%.

Re: improved error handling - I only see a check for cd which I'm not sure is needed, don't think that function would be called if cd allocation failed. If anything, this should be split into multiple commits - one commit per improvement, then you can check which specific commit / change makes the claimed performance difference, and would be good to tell us how exactly you measure that difference - between which points.

lyakh avatar Aug 12 '24 09:08 lyakh

You can also see a valgrind reported error with scripts/host-testbench.sh run. Please try that yourself.

singalsu avatar Aug 27 '24 15:08 singalsu

Release reminder - one week to v2.11-rc1.

kv2019i avatar Sep 06 '24 11:09 kv2019i

FYI @ShriramShastry , moving to v2.12

kv2019i avatar Sep 13 '24 07:09 kv2019i

Feature cutoff for v2.12, moving this to v2.13.

kv2019i avatar Dec 13 '24 12:12 kv2019i

No update for two releases, moving to TBD milestone.

kv2019i avatar Apr 23 '25 13:04 kv2019i