Clipping when multiple sounds are playing
In EasyRPG, there's clipping when total_volume is greater than 1.0. In RPG_RT, sound is limited to 0 dB.
.Flow has a lot of clipping, but you can notice it in Yume Nikki too, and Test Game 2000 (in Music Playing). I only tested those, but in theory, it should affect all games. Reproducing it is very easy.
This is caused by Decode in audio_generic.cpp: It applies compression with a soft knee at -2 dB while converting to 16 bit, which causes distortion as if it's clipping. RPG_RT limits to 0 dB by scaling samples uniformly.
Basically, replace this:
if (total_volume > 1.0) {
float threshold = 0.8f;
for (unsigned i = 0; i < (unsigned)(samples_per_frame * 2); i++) {
float sample = mixer_buffer[i];
float sign = (sample < 0) ? -1.0 : 1.0;
sample /= sign;
//dynamic range compression
if (sample > threshold) {
sample_buffer[i] = sign * 32768.0 * (threshold + (1.0 - threshold) * (sample - threshold) / (total_volume - threshold));
} else {
sample_buffer[i] = sign * sample * 32768.0;
}
}
} else {
//No dynamic range compression necessary
for (unsigned i = 0; i < (unsigned)(samples_per_frame * 2); i++) {
sample_buffer[i] = mixer_buffer[i] * 32768.0;
}
}
With this:
float max_sample = 0.0f;
unsigned num_samples = samples_per_frame * 2;
for (unsigned i = 0; i < num_samples; i++) {
float abs_sample = std::abs(mixer_buffer[i]);
if (abs_sample > max_sample) max_sample = abs_sample;
}
float scale = (max_sample > 1.0f) ? (1.0f / max_sample) : 1.0f;
for (unsigned i = 0; i < num_samples; i++) {
float sample = mixer_buffer[i] * scale;
sample_buffer[i] = static_cast<int16_t>(sample * 32768.0f);
}
Which also makes total_volume unnecessary
Relevant notes regarding volume clipping caused by simultaneous playback (if possible) of the same file at the same sample position that could allow testing this in a more precise scenario. RPG_RT.exe has channel limits not implemented in player yet (see #1356), so some tests may behave different even after fixing clipping:
https://github.com/EasyRPG/Player/issues/1120#issuecomment-294617773 (4 channels for different sounds, same sounds share the same channel, up to 16 times, older get overwritten when the channel count is full).
https://github.com/EasyRPG/Player/issues/1356#issuecomment-485201061 includes test cases for channel limit tests.
thanks for the patch, will try it next week
also worth checking: #3045 (could be the same clipping problem) and #3267 (not sure yet if this issue or WildMidi bug)
I did some more tests in RPG_RT. The limiter has a linear release with a fixed per-sample increment.
Here's a Python test that matches the limiter in RPG_RT:
import numpy as np
import wave
sample_rate = 44100
duration = 3.0
freq1 = 1000
freq2 = 500
freq3 = 250
gain1 = 1.0
gain2 = 1.0
gain3 = 1.0
start_2 = 1.0
duration_2 = 1.0
start_3 = 1.3
duration_3 = 0.4
# Time array
t = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)
# Generate mono sines
sine1 = gain1 * np.sin(2 * np.pi * freq1 * t)
sine2 = np.zeros_like(t)
idx_start_2 = int(start_2 * sample_rate)
idx_end_2 = idx_start_2 + int(duration_2 * sample_rate)
sine2[idx_start_2:idx_end_2] = gain2 * np.sin(2 * np.pi * freq2 * t[:idx_end_2 - idx_start_2])
sine3 = np.zeros_like(t)
idx_start_3 = int(start_3 * sample_rate)
idx_end_3 = idx_start_3 + int(duration_3 * sample_rate)
sine3[idx_start_3:idx_end_3] = gain3 * np.sin(2 * np.pi * freq3 * t[:idx_end_3 - idx_start_3])
# Convert sines to stereo
sine1_stereo = np.column_stack((sine1, sine1))
sine2_stereo = np.column_stack((sine2, sine2))
sine3_stereo = np.column_stack((sine3, sine3))
# Mix stereo sines
mixed_stereo = sine1_stereo + sine2_stereo + sine3_stereo
# Limiter
lookahead_ms = 5.0
attack_ms = 2.0
release_per_sample = 0.0000175
lookahead_samples = int(sample_rate * lookahead_ms / 1000.0)
attack_samples = int(sample_rate * attack_ms / 1000.0)
padded_audio = np.vstack((mixed_stereo, np.zeros((lookahead_samples, 2))))
output_audio = np.zeros_like(mixed_stereo)
current_scale = 1.0
attack = np.ones(attack_samples)
attack_index = attack_samples
target_scale = 1.0
for i in range(len(mixed_stereo)):
lookahead_window = padded_audio[i:i + lookahead_samples]
max_sample = np.max(np.abs(lookahead_window))
required_scale = 1.0 / max_sample if max_sample > 1.0 else 1.0
safe_peak = max(max_sample, 0.0001)
release_rate = release_per_sample / safe_peak if max_sample < 1.0 else release_per_sample
# Attack
if required_scale < target_scale:
target_scale = required_scale
attack_start = current_scale
attack_diff = target_scale - attack_start
attack = attack_start + np.linspace(0, attack_diff, attack_samples)
attack_index = 0
if attack_index < attack_samples:
current_scale = attack[attack_index]
attack_index += 1
# Release
elif current_scale < 1.0:
current_scale += release_rate
if current_scale > 1.0:
current_scale = 1.0
target_scale = current_scale
output_audio[i] = mixed_stereo[i] * current_scale
output_audio = np.clip(output_audio, -1.0, 1.0)
pcm_int16 = (output_audio * 32767).astype(np.int16)
# Save WAV
wav_filename = "Final Limiter.wav"
with wave.open(wav_filename, 'wb') as wav_file:
wav_file.setnchannels(2)
wav_file.setsampwidth(2)
wav_file.setframerate(sample_rate)
wav_file.writeframes(pcm_int16.tobytes())
@Fdelapena I tested my Python limiter. If I play the same sine more than four times, the difference with RPG_RT is that release starts more quiet and takes longer (expected) because there's no channel limit.
I found a very bad bug in RPG_RT though: If I copy and rename a file to use different channels, release breaks if the max sample is louder than 9.0 (+19 dB). It keeps making them louder forever (until you need to see a doctor...). My Python test doesn't do that of course, I'm not crazy lol
also worth checking: #3045 (could be the same clipping problem)
I tested Muma Rope (32.2 and 33.3). It's this issue, but it was "fixed" with a workaround at some point: In 33.3, there's a check to detect EasyRPG. If it's detected, Black Waters plays a pre-rendered BGM version instead of the SE. There's a similar workaround to fix Generate Dungeon too
What kind of check do they use to detect us?
And unfortunately I had no time in general to do any EasyRPG work recently. So no progress with this issue here
What kind of check do they use to detect us?
The game sends the player to Map0216, which is both in the lmu and emu format in the files of the game, and depending on which one gets visited when the player is sent there, the engine is detected differently.
I think my Python limiter does this:
const int lookahead_samples = frequency * 5 / 1000; // 5 ms
const int attack_samples = frequency * 2 / 1000; // 2 ms
constexpr float release_per_sample = 0.0000175f;
std::vector<float> attack_curve(attack_samples, 1.0f);
int attack_index = attack_samples;
float current_scale = 1.0f;
float target_scale = 1.0f;
std::vector<float> padded_buffer = mixer_buffer;
padded_buffer.resize(mixer_buffer.size() + lookahead_samples * 2, 0.0f);
for (int i = 0; i < samples_per_frame; ++i) {
float max_sample = 0.0f;
for (int j = 0; j < lookahead_samples; ++j) {
int idx = (i + j) * 2;
float abs_l = std::fabs(padded_buffer[idx]);
float abs_r = std::fabs(padded_buffer[idx + 1]);
max_sample = std::max({max_sample, abs_l, abs_r});
}
float required_scale = (max_sample > 1.0f) ? (1.0f / max_sample) : 1.0f;
float safe_peak = std::max(max_sample, 0.0001f);
float release_rate = (max_sample < 1.0f) ? (release_per_sample / safe_peak) : release_per_sample;
// Attack
if (required_scale < target_scale) {
target_scale = required_scale;
float att_start = current_scale;
float att_diff = target_scale - att_start;
for (int k = 0; k < attack_samples; ++k) {
attack_curve[k] = att_start + (att_diff * k) / attack_samples;
}
attack_index = 0;
}
if (attack_index < attack_samples) {
current_scale = attack_curve[attack_index++];
}
// Release
else if (current_scale < 1.0f) {
current_scale += release_rate;
if (current_scale > 1.0f) current_scale = 1.0f;
target_scale = current_scale;
}
float sample_l = std::clamp(mixer_buffer[i * 2] * current_scale, -1.0f, 1.0f);
float sample_r = std::clamp(mixer_buffer[i * 2 + 1] * current_scale, -1.0f, 1.0f);
sample_buffer[i * 2] = static_cast<int16_t>(sample_l * 32767.0f);
sample_buffer[i * 2 + 1] = static_cast<int16_t>(sample_r * 32767.0f);
}
@Lt-knb I will finally take a quick look at this. Sorry for the delay. Is the code you provided in the last comment what I'm supposed to add to the Player?
Or this one here:
cpp
float max_sample = 0.0f;
unsigned num_samples = samples_per_frame * 2;
for (unsigned i = 0; i < num_samples; i++) {
float abs_sample = std::abs(mixer_buffer[i]);
if (abs_sample > max_sample) max_sample = abs_sample;
}
float scale = (max_sample > 1.0f) ? (1.0f / max_sample) : 1.0f;
for (unsigned i = 0; i < num_samples; i++) {
float sample = mixer_buffer[i] * scale;
sample_buffer[i] = static_cast<int16_t>(sample * 32768.0f);
}
Is the code you provided in the last comment what I'm supposed to add to the Player?
Yes, last comment. It's my attempt at a "translation" of my Python limiter
There is some glitching when using your code.
Likely a problem with the Python -> C++ transformation... will try to figure this out.
https://github.com/user-attachments/assets/5be89c6c-5ba8-4bc7-8850-4fb84b1152c7
Correction. The code must be optimized. It does not glitch in release mode.
The slow code is:
for (int j = 0; j < lookahead_samples; ++j) {
int idx = (i + j) * 2;
float abs_l = std::fabs(padded_buffer[idx]);
float abs_r = std::fabs(padded_buffer[idx + 1]);
max_sample = std::max({max_sample, abs_l, abs_r});
}
when I comment this out the glitching is gone.
I tested the PR. I don't have that crackling you get 🤔 My CPU usage is 0.6%. I have a Ryzen 7 7700 though (and I tested Player 64 bit).
The limiter is forgetting all variables when the buffer ends right now (2048 samples), so it resets every 46 ms, causing release to be truncated, and attack to be triggered again. We need to make the variables persistent.
Edit: Tested 32 bit, still not getting your crackling
Let's ignore any performance discussion for now. Correctness comes first.
Can you tell me which variables (names of them) must survive across buffer fills? I just have no idea of audio so I don't really know what is going on in the code. 😅
These: attack_index current_scale target_scale
Tested. Now release is no longer truncated, and attack doesn't trigger when it shouldn't.
There's an issue with the lookahead, but I believe this is more important right now: Sound timing is wrong in EasyRPG, so my sine test (which uses Wait) has asymmetry, the samples don't align. That makes waveform comparisons hard when I test the limiter.
In RPG_RT, Play SE plays after exactly 1 second when Wait is 1 s, so my sine test has perfect symmetry there. I also tried this: I opened Equipment (in Test Game 2000) and held the Down key. The distance between the Cursor sounds was alternating between 60 and 70 ms. So it seems to imply a buffer is just 10 ms (441 samples). This is #2861.
To match RPG_RT, samples_per_frame would have to be 441 instead of 2048. But honestly, there's no way the buffer length in DirectSound is set to such a low number without consequences in RPG_RT, it's not feasible, not even today. So I don't think sound timing is tied to the buffer size, it should be its own thing
Wait a second... Remember the +19 dB issue in RPG_RT, that causes absolute eardrum carnage? When that happened, Audacity was aware of the samples' real loudness, and it let me normalize to 0 dB, so it was 32 bit float. I could see and play the full waveform, unaffected by the -1.0, 1.0 ceiling. So not only the limiter broke in RPG_RT, Windows' limiter (CAudioLimiter) failed too 👀 That means... There's actually no limiter in RPG_RT (nor DirectSound), it's just relying on Windows'. So I've been trying to replicate Windows' own limiter without realizing >.>
I played my sines in Foobar and Audacity simultaneously to make sure, and that limiter kicked in.
As long as we output as 16 bit, it can't do its job, because we're doing the mixing ourselves, so it's too late. But if we output as Float 32 instead (and remove my limiter... Ugh), then that limiter will work