Player icon indicating copy to clipboard operation
Player copied to clipboard

Clipping when multiple sounds are playing

Open Lt-knb opened this issue 5 months ago • 17 comments

In EasyRPG, there's clipping when total_volume is greater than 1.0. In RPG_RT, sound is limited to 0 dB.

.Flow has a lot of clipping, but you can notice it in Yume Nikki too, and Test Game 2000 (in Music Playing). I only tested those, but in theory, it should affect all games. Reproducing it is very easy.

This is caused by Decode in audio_generic.cpp: It applies compression with a soft knee at -2 dB while converting to 16 bit, which causes distortion as if it's clipping. RPG_RT limits to 0 dB by scaling samples uniformly.

Basically, replace this:

		if (total_volume > 1.0) {
			float threshold = 0.8f;
			for (unsigned i = 0; i < (unsigned)(samples_per_frame * 2); i++) {
				float sample = mixer_buffer[i];
				float sign = (sample < 0) ? -1.0 : 1.0;
				sample /= sign;
				//dynamic range compression
				if (sample > threshold) {
					sample_buffer[i] = sign * 32768.0 * (threshold + (1.0 - threshold) * (sample - threshold) / (total_volume - threshold));
				} else {
					sample_buffer[i] = sign * sample * 32768.0;
				}
			}
		} else {
			//No dynamic range compression necessary
			for (unsigned i = 0; i < (unsigned)(samples_per_frame * 2); i++) {
				sample_buffer[i] = mixer_buffer[i] * 32768.0;
			}
		}

With this:

	float max_sample = 0.0f;
	unsigned num_samples = samples_per_frame * 2;
	for (unsigned i = 0; i < num_samples; i++) {
		float abs_sample = std::abs(mixer_buffer[i]);
		if (abs_sample > max_sample) max_sample = abs_sample;
	}
	float scale = (max_sample > 1.0f) ? (1.0f / max_sample) : 1.0f;
	for (unsigned i = 0; i < num_samples; i++) {
		float sample = mixer_buffer[i] * scale;
		sample_buffer[i] = static_cast<int16_t>(sample * 32768.0f);
	}

Which also makes total_volume unnecessary

Lt-knb avatar Jul 07 '25 22:07 Lt-knb

Relevant notes regarding volume clipping caused by simultaneous playback (if possible) of the same file at the same sample position that could allow testing this in a more precise scenario. RPG_RT.exe has channel limits not implemented in player yet (see #1356), so some tests may behave different even after fixing clipping:

https://github.com/EasyRPG/Player/issues/1120#issuecomment-294617773 (4 channels for different sounds, same sounds share the same channel, up to 16 times, older get overwritten when the channel count is full).

https://github.com/EasyRPG/Player/issues/1356#issuecomment-485201061 includes test cases for channel limit tests.

fdelapena avatar Jul 08 '25 23:07 fdelapena

thanks for the patch, will try it next week

Ghabry avatar Jul 09 '25 10:07 Ghabry

also worth checking: #3045 (could be the same clipping problem) and #3267 (not sure yet if this issue or WildMidi bug)

Ghabry avatar Jul 09 '25 15:07 Ghabry

I did some more tests in RPG_RT. The limiter has a linear release with a fixed per-sample increment.

Here's a Python test that matches the limiter in RPG_RT:

import numpy as np
import wave

sample_rate = 44100
duration = 3.0
freq1 = 1000
freq2 = 500
freq3 = 250
gain1 = 1.0
gain2 = 1.0
gain3 = 1.0
start_2 = 1.0
duration_2 = 1.0
start_3 = 1.3
duration_3 = 0.4

# Time array
t = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)

# Generate mono sines
sine1 = gain1 * np.sin(2 * np.pi * freq1 * t)

sine2 = np.zeros_like(t)
idx_start_2 = int(start_2 * sample_rate)
idx_end_2 = idx_start_2 + int(duration_2 * sample_rate)
sine2[idx_start_2:idx_end_2] = gain2 * np.sin(2 * np.pi * freq2 * t[:idx_end_2 - idx_start_2])

sine3 = np.zeros_like(t)
idx_start_3 = int(start_3 * sample_rate)
idx_end_3 = idx_start_3 + int(duration_3 * sample_rate)
sine3[idx_start_3:idx_end_3] = gain3 * np.sin(2 * np.pi * freq3 * t[:idx_end_3 - idx_start_3])

# Convert sines to stereo
sine1_stereo = np.column_stack((sine1, sine1))
sine2_stereo = np.column_stack((sine2, sine2))
sine3_stereo = np.column_stack((sine3, sine3))

# Mix stereo sines
mixed_stereo = sine1_stereo + sine2_stereo + sine3_stereo

# Limiter
lookahead_ms = 5.0
attack_ms = 2.0
release_per_sample = 0.0000175
lookahead_samples = int(sample_rate * lookahead_ms / 1000.0)
attack_samples = int(sample_rate * attack_ms / 1000.0)
    
padded_audio = np.vstack((mixed_stereo, np.zeros((lookahead_samples, 2))))
output_audio = np.zeros_like(mixed_stereo)
    
current_scale = 1.0
attack = np.ones(attack_samples)
attack_index = attack_samples
target_scale = 1.0

for i in range(len(mixed_stereo)):
    lookahead_window = padded_audio[i:i + lookahead_samples]
    max_sample = np.max(np.abs(lookahead_window))
    required_scale = 1.0 / max_sample if max_sample > 1.0 else 1.0
    safe_peak = max(max_sample, 0.0001)
    release_rate = release_per_sample / safe_peak if max_sample < 1.0 else release_per_sample
    
    # Attack
    if required_scale < target_scale:
        target_scale = required_scale
        attack_start = current_scale
        attack_diff = target_scale - attack_start
        attack = attack_start + np.linspace(0, attack_diff, attack_samples)
        attack_index = 0
    if attack_index < attack_samples:
        current_scale = attack[attack_index]
        attack_index += 1
    
    # Release    
    elif current_scale < 1.0:
        current_scale += release_rate
        if current_scale > 1.0:
            current_scale = 1.0                
        target_scale = current_scale

    output_audio[i] = mixed_stereo[i] * current_scale

output_audio = np.clip(output_audio, -1.0, 1.0)
pcm_int16 = (output_audio * 32767).astype(np.int16)

# Save WAV
wav_filename = "Final Limiter.wav"
with wave.open(wav_filename, 'wb') as wav_file:
    wav_file.setnchannels(2)
    wav_file.setsampwidth(2)
    wav_file.setframerate(sample_rate)
    wav_file.writeframes(pcm_int16.tobytes())

@Fdelapena I tested my Python limiter. If I play the same sine more than four times, the difference with RPG_RT is that release starts more quiet and takes longer (expected) because there's no channel limit.

I found a very bad bug in RPG_RT though: If I copy and rename a file to use different channels, release breaks if the max sample is louder than 9.0 (+19 dB). It keeps making them louder forever (until you need to see a doctor...). My Python test doesn't do that of course, I'm not crazy lol

Lt-knb avatar Jul 20 '25 05:07 Lt-knb

also worth checking: #3045 (could be the same clipping problem)

I tested Muma Rope (32.2 and 33.3). It's this issue, but it was "fixed" with a workaround at some point: In 33.3, there's a check to detect EasyRPG. If it's detected, Black Waters plays a pre-rendered BGM version instead of the SE. There's a similar workaround to fix Generate Dungeon too

Lt-knb avatar Jul 24 '25 00:07 Lt-knb

What kind of check do they use to detect us?

And unfortunately I had no time in general to do any EasyRPG work recently. So no progress with this issue here

Ghabry avatar Jul 24 '25 00:07 Ghabry

What kind of check do they use to detect us?

The game sends the player to Map0216, which is both in the lmu and emu format in the files of the game, and depending on which one gets visited when the player is sent there, the engine is detected differently.

Mimigris avatar Jul 24 '25 07:07 Mimigris

I think my Python limiter does this:

const int lookahead_samples = frequency * 5 / 1000;	// 5 ms
const int attack_samples = frequency * 2 / 1000;	// 2 ms
constexpr float release_per_sample = 0.0000175f;

std::vector<float> attack_curve(attack_samples, 1.0f);
int attack_index = attack_samples;
float current_scale = 1.0f;
float target_scale = 1.0f;

std::vector<float> padded_buffer = mixer_buffer;
padded_buffer.resize(mixer_buffer.size() + lookahead_samples * 2, 0.0f);

for (int i = 0; i < samples_per_frame; ++i) {
	float max_sample = 0.0f;
	for (int j = 0; j < lookahead_samples; ++j) {
		int idx = (i + j) * 2;
		float abs_l = std::fabs(padded_buffer[idx]);
		float abs_r = std::fabs(padded_buffer[idx + 1]);
		max_sample = std::max({max_sample, abs_l, abs_r});
	}
	float required_scale = (max_sample > 1.0f) ? (1.0f / max_sample) : 1.0f;
	float safe_peak = std::max(max_sample, 0.0001f);
	float release_rate = (max_sample < 1.0f) ? (release_per_sample / safe_peak) : release_per_sample;

	// Attack
	if (required_scale < target_scale) {
		target_scale = required_scale;
		float att_start = current_scale;
		float att_diff = target_scale - att_start;
		for (int k = 0; k < attack_samples; ++k) {
			attack_curve[k] = att_start + (att_diff * k) / attack_samples;
		}
		attack_index = 0;
	}
	if (attack_index < attack_samples) {
		current_scale = attack_curve[attack_index++];
	}
	// Release
	else if (current_scale < 1.0f) {
		current_scale += release_rate;
		if (current_scale > 1.0f) current_scale = 1.0f;
		target_scale = current_scale;
	}
	float sample_l = std::clamp(mixer_buffer[i * 2] * current_scale, -1.0f, 1.0f);
	float sample_r = std::clamp(mixer_buffer[i * 2 + 1] * current_scale, -1.0f, 1.0f);
	sample_buffer[i * 2] = static_cast<int16_t>(sample_l * 32767.0f);
	sample_buffer[i * 2 + 1] = static_cast<int16_t>(sample_r * 32767.0f);
}

Lt-knb avatar Jul 25 '25 22:07 Lt-knb

@Lt-knb I will finally take a quick look at this. Sorry for the delay. Is the code you provided in the last comment what I'm supposed to add to the Player?

Or this one here:

cpp

	float max_sample = 0.0f;
	unsigned num_samples = samples_per_frame * 2;
	for (unsigned i = 0; i < num_samples; i++) {
		float abs_sample = std::abs(mixer_buffer[i]);
		if (abs_sample > max_sample) max_sample = abs_sample;
	}
	float scale = (max_sample > 1.0f) ? (1.0f / max_sample) : 1.0f;
	for (unsigned i = 0; i < num_samples; i++) {
		float sample = mixer_buffer[i] * scale;
		sample_buffer[i] = static_cast<int16_t>(sample * 32768.0f);
	}

Ghabry avatar Jul 31 '25 10:07 Ghabry

Is the code you provided in the last comment what I'm supposed to add to the Player?

Yes, last comment. It's my attempt at a "translation" of my Python limiter

Lt-knb avatar Jul 31 '25 13:07 Lt-knb

There is some glitching when using your code.

Likely a problem with the Python -> C++ transformation... will try to figure this out.

https://github.com/user-attachments/assets/5be89c6c-5ba8-4bc7-8850-4fb84b1152c7

Ghabry avatar Jul 31 '25 14:07 Ghabry

Correction. The code must be optimized. It does not glitch in release mode.

The slow code is:

for (int j = 0; j < lookahead_samples; ++j) {
     int idx = (i + j) * 2;
     float abs_l = std::fabs(padded_buffer[idx]);
     float abs_r = std::fabs(padded_buffer[idx + 1]);
     max_sample = std::max({max_sample, abs_l, abs_r});
}

when I comment this out the glitching is gone.

Ghabry avatar Jul 31 '25 15:07 Ghabry

I tested the PR. I don't have that crackling you get 🤔 My CPU usage is 0.6%. I have a Ryzen 7 7700 though (and I tested Player 64 bit).

The limiter is forgetting all variables when the buffer ends right now (2048 samples), so it resets every 46 ms, causing release to be truncated, and attack to be triggered again. We need to make the variables persistent.

Edit: Tested 32 bit, still not getting your crackling

Lt-knb avatar Jul 31 '25 18:07 Lt-knb

Let's ignore any performance discussion for now. Correctness comes first.

Can you tell me which variables (names of them) must survive across buffer fills? I just have no idea of audio so I don't really know what is going on in the code. 😅

Ghabry avatar Aug 01 '25 10:08 Ghabry

These: attack_index current_scale target_scale

Lt-knb avatar Aug 01 '25 14:08 Lt-knb

Tested. Now release is no longer truncated, and attack doesn't trigger when it shouldn't.

There's an issue with the lookahead, but I believe this is more important right now: Sound timing is wrong in EasyRPG, so my sine test (which uses Wait) has asymmetry, the samples don't align. That makes waveform comparisons hard when I test the limiter.

In RPG_RT, Play SE plays after exactly 1 second when Wait is 1 s, so my sine test has perfect symmetry there. I also tried this: I opened Equipment (in Test Game 2000) and held the Down key. The distance between the Cursor sounds was alternating between 60 and 70 ms. So it seems to imply a buffer is just 10 ms (441 samples). This is #2861.

To match RPG_RT, samples_per_frame would have to be 441 instead of 2048. But honestly, there's no way the buffer length in DirectSound is set to such a low number without consequences in RPG_RT, it's not feasible, not even today. So I don't think sound timing is tied to the buffer size, it should be its own thing

Lt-knb avatar Aug 06 '25 15:08 Lt-knb

Wait a second... Remember the +19 dB issue in RPG_RT, that causes absolute eardrum carnage? When that happened, Audacity was aware of the samples' real loudness, and it let me normalize to 0 dB, so it was 32 bit float. I could see and play the full waveform, unaffected by the -1.0, 1.0 ceiling. So not only the limiter broke in RPG_RT, Windows' limiter (CAudioLimiter) failed too 👀 That means... There's actually no limiter in RPG_RT (nor DirectSound), it's just relying on Windows'. So I've been trying to replicate Windows' own limiter without realizing >.>

I played my sines in Foobar and Audacity simultaneously to make sure, and that limiter kicked in.

As long as we output as 16 bit, it can't do its job, because we're doing the mixing ourselves, so it's too late. But if we output as Float 32 instead (and remove my limiter... Ugh), then that limiter will work

Lt-knb avatar Aug 16 '25 03:08 Lt-knb