steam-audio Popping when using default HRTF

System Information

Steam Audio version: 4.5.3
Operating System and version: Windows 10 22H2 19045.4355
(Optional) CPU architecture (e.g. x86-64, armv7): x86_64

Issue Description

When using IPLBinauralEffect and IPLDirectEffect in distance attenuation mode, I occasionally get crackling and popping artifacts. When I dump the resulting in-flight buffers and graph them in Desmos, I find that some samples are well beyond the [-1, 1] range.

Link to desmos graph: https://www.desmos.com/calculator/xlpsejnitp
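Graphing every dump works, but a small scan over each dumped channel can give the same information at a glance: how many samples fall outside [-1, 1], where the first one is, the peak magnitude, and whether any samples are NaN or infinite. This is a minimal sketch under stated assumptions. BufferStats and ScanBuffer are hypothetical helpers, not part of Steam Audio, and each channel is assumed to be available as a std::vector<float>.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Summary of one dumped channel (hypothetical helper, not a Steam Audio type).
struct BufferStats {
    std::size_t outOfRange = 0;  // finite samples with |x| > 1
    std::size_t nonFinite = 0;   // NaN or +/-Inf samples
    float peak = 0.0f;           // largest finite |x| seen
    long firstOutOfRange = -1;   // index of the first |x| > 1, or -1 if none
};

// Scan one channel of float samples, e.g. the mono input or one output
// channel of the IPLAudioBuffer for a single frame.
BufferStats ScanBuffer(const std::vector<float>& samples) {
    BufferStats stats;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        const float x = samples[i];
        if (!std::isfinite(x)) {
            ++stats.nonFinite;  // NaN/Inf are counted separately, not as out of range
            continue;
        }
        const float mag = std::fabs(x);
        if (mag > stats.peak) stats.peak = mag;
        if (mag > 1.0f) {
            if (stats.outOfRange == 0) stats.firstOutOfRange = static_cast<long>(i);
            ++stats.outOfRange;
        }
    }
    return stats;
}
```

Running it on the mono input and on both output channels of the same frame shows whether out-of-range values first appear in the source data or only after rendering.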

This is the code I am using:

```cpp
const auto nchannels = AudioPlayer::GetNChannels();

// render it
IPLfloat32* inputChannels[]{ monoSourceData.data() };
static_assert(std::size(inputChannels) == 1, "Input must be mono!");
IPLAudioBuffer inBuffer{
    .numChannels = 1,
    .numSamples = IPLint32(monoSourceData.GetNumSamples()),
    .data = inputChannels,
};

Debug::Assert(buffer.GetNChannels() == 2, "Non-stereo output is not supported");

IPLfloat32* outputChannels[]{
    buffer[0].data(),
    buffer[1].data()
};
IPLAudioBuffer outputBuffer{
    .numChannels = nchannels,
    .numSamples = IPLint32(buffer.GetNumSamples()),
    .data = outputChannels
};

auto sourcePosInListenerSpace = vector3(invListenerTransform * vector4(sourcePos, 1));
auto normalizedPos = glm::normalize(sourcePosInListenerSpace);

IPLBinauralEffectParams params{
    .direction = { normalizedPos.x, normalizedPos.y, normalizedPos.z },
    .interpolation = IPL_HRTFINTERPOLATION_BILINEAR,
    .spatialBlend = 1.0f,
    .hrtf = GetApp()->GetAudioPlayer()->GetSteamAudioHRTF(),
    .peakDelays = nullptr
};

auto result = iplBinauralEffectApply(effects.binauralEffect, &params, &inBuffer, &outputBuffer);

// do distance attenuation in-place
IPLDistanceAttenuationModel distanceAttenuationModel{
    .type = IPL_DISTANCEATTENUATIONTYPE_DEFAULT
};
IPLDirectEffectParams directParams{
    .flags = IPL_DIRECTEFFECTFLAGS_APPLYDISTANCEATTENUATION,
    .distanceAttenuation = iplDistanceAttenuationCalculate(
        state.context,
        { sourcePosInListenerSpace.x, sourcePosInListenerSpace.y, sourcePosInListenerSpace.z },
        { 0, 0, 0 },
        &distanceAttenuationModel)
};

result = iplDirectEffectApply(effects.directEffect, &directParams, &outputBuffer, &outputBuffer);
```

I am using the default SteamAudio HRTF. The source samples are within the [-1, 1] range. Is it expected for SteamAudio to produce out-of-range samples?

May 06 '24 20:05 Ravbug

@Ravbug If the input samples are already close to +/- 1, the HRTF may push the output samples slightly outside the [-1, 1] range, but based on the graph, that's not the issue here. Can you try creating the context with validation enabled (IPL_CONTEXTFLAGS_VALIDATION) and see whether it reports any invalid or NaN inputs?

May 14 '24 22:05 lakulish
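Why near-full-scale input can overshoot only slightly can be seen by modeling each ear's HRTF as an FIR filter $h$ with $K$ taps (the same simplification the Aug 23 reply uses); this is an idealized bound, not a description of Steam Audio's internal processing:

$$
y[n] = \sum_{k=0}^{K-1} h[k]\,x[n-k]
\quad\Longrightarrow\quad
|y[n]| \le \Big(\sum_{k=0}^{K-1} |h[k]|\Big)\,\max_{m} |x[m]|
$$

With $|x[m]| \le 1$, the output can exceed 1 only by the factor $\sum_k |h[k]|$. A peak of 10 to 20 therefore needs either a very large filter gain or input samples that are themselves far outside [-1, 1].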

I passed IPL_CONTEXTFLAGS_VALIDATION, and I didn't get any asserts or additional logs. The out-of-range values are in the ballpark of 10 to 20, so it doesn't seem like undefined behavior to me, but I could be wrong.

May 16 '24 22:05 Ravbug

Output sample values in the 10-20 range are definitely not expected. Do you happen to know what the source and listener positions were when these out-of-range sample values were generated? I want to make sure I'm able to reproduce the conditions under which you're encountering the issue. Thanks!

May 16 '24 22:05 lakulish

I reproduced it with these world-space positions:

- source position: (0, 0, 0)
- listener position: (2.50609, 9.20201, 0)
- listener rotation (quaternion, xyzw): (-0.395396, 0.586227, 0.395396, 0.586227)
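To check the direction the binaural effect receives for this capture, the world-space positions above can be moved into listener space by hand. This sketch assumes the listener's world transform is translate(listenerPos) * rotate(listenerRot) with no scale, so world-to-listener is the inverse rotation applied to (source - listener). Vec3, Quat, Rotate, and WorldToListener are hypothetical stand-ins for the glm calls in the original code.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Quat { float x, y, z, w; };  // xyzw order, as in the report

Vec3 Cross(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

float Length(const Vec3& v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

// Rotate v by a unit quaternion q (v' = q v q*), via the identity
// t = 2 (q.xyz x v);  v' = v + w t + q.xyz x t.
Vec3 Rotate(const Quat& q, const Vec3& v) {
    const Vec3 u{ q.x, q.y, q.z };
    const Vec3 c = Cross(u, v);
    const Vec3 t{ 2 * c.x, 2 * c.y, 2 * c.z };
    const Vec3 c2 = Cross(u, t);
    return { v.x + q.w * t.x + c2.x,
             v.y + q.w * t.y + c2.y,
             v.z + q.w * t.z + c2.z };
}

// World -> listener space under the assumed translate * rotate listener transform.
Vec3 WorldToListener(const Vec3& p, const Vec3& listenerPos, const Quat& listenerRot) {
    const Vec3 rel{ p.x - listenerPos.x, p.y - listenerPos.y, p.z - listenerPos.z };
    // For a unit quaternion, the conjugate is the inverse rotation.
    const Quat inv{ -listenerRot.x, -listenerRot.y, -listenerRot.z, listenerRot.w };
    return Rotate(inv, rel);
}
```

For this capture, the listener-space vector has length of about 9.54 (the source-listener distance), so normalizing it for the direction parameter is well defined. A source exactly at the listener would instead give a zero vector, and glm::normalize would return NaN components.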

I noticed that the source audio (before SteamAudio sees it) contains a couple of samples with magnitudes up to about 20, so this could partly be a case of garbage in, garbage out. However, the buffers that SteamAudio produces have many more out-of-range samples than the input data, and those samples don't always line up with the out-of-range input samples. Is this expected?

Here is another Desmos graph with the source buffer and resulting SteamAudio output buffers for this capture: https://www.desmos.com/calculator/14biausk1g

May 22 '24 00:05 Ravbug

This all seems reasonable. The input seems to be in the range [-30, 20], and the output from Steam Audio is in a similar range. The samples don't align because an HRTF is basically an FIR filter, which adds delay and computes a weighted sum of several input samples to generate each output sample.

Aug 23 '24 21:08 achandak
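The FIR explanation in the last reply can be illustrated with a toy filter: a single out-of-range input sample produces several out-of-range output samples, none of them at the input sample's index. This is a minimal sketch under stated assumptions. The taps are made up and merely stand in for one ear's HRIR, FirFilter and CountOutOfRange are hypothetical helpers, and Steam Audio's own convolution is not shown here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Direct-form FIR filter: y[n] = sum_k h[k] * x[n - k], treating x[m] = 0 for m < 0.
// Output has the same length as the input (no tail), like a fixed-size audio frame.
std::vector<float> FirFilter(const std::vector<float>& x, const std::vector<float>& h) {
    std::vector<float> y(x.size(), 0.0f);
    for (std::size_t n = 0; n < x.size(); ++n) {
        for (std::size_t k = 0; k < h.size() && k <= n; ++k) {
            y[n] += h[k] * x[n - k];  // weighted sum of current and past inputs
        }
    }
    return y;
}

// Count samples whose magnitude exceeds 1.
std::size_t CountOutOfRange(const std::vector<float>& v) {
    std::size_t count = 0;
    for (float s : v) {
        if (std::fabs(s) > 1.0f) ++count;
    }
    return count;
}
```

With taps {0, 0, 0.6, 0.9, 0.4}, a spike of 20 at index 1 becomes outputs of about 12, 18, and 8 at indices 3 to 5. One out-of-range input sample turns into three out-of-range output samples, two samples later, which is consistent with the reporter's observation of more, and misaligned, out-of-range samples in the output.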