2.4.0: --profile insane output length wrong (silence added at both ends)
(first time I've tested 2.x)
Windows 2.4.0 binaries; input: 96 kHz / 64-bit float
ssrc.exe --rate 48000 --bits 24 --dither 98 --pdf 1 --profile insane %1 "%outfile%"
result: the output is longer than the input, with silence added at both ends. example:
input: 1m 25s 68ms
output: 1m 27s 799ms
this only seems to affect the 'insane' profile; 'high' and 'long' give output of the correct length.
This is not specific to the insane profile; silent intervals are added in all profiles. This is by design. They are added because FIR filters are applied during the resampling process. The insane profile is, literally, an insane profile, so various effects manifest in extreme forms.
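A minimal sketch of why FIR filtering makes the output longer, assuming a plain single-rate FIR rather than SSRC's actual resampling filters (which are not shown here): keeping every sample of a linear convolution of an L-sample signal with an N-tap filter gives L + N - 1 samples, and the extra N - 1 samples are split between a lead-in before the content and a tail after it.

```python
def fir_full(x, h):
    """Full linear convolution: keep every output sample the filter produces."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y


x = [1.0, 0.5, -0.25, 0.0, 0.75]  # 5 input samples
h = [0.25, 0.5, 0.25]             # illustrative 3-tap symmetric low-pass
y = fir_full(x, h)
print(len(x), len(h), len(y))     # 5 3 7
```

With a longer filter the added length grows by the same number of samples, which is why a profile using longer FIRs adds more.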
OK, but this isn't very useful, because in most scenarios the output then has to be trimmed back to the original length manually; it would be better if the output were trimmed to the original length by default.
If you feel it's important to have 'correct' output padding from the FIR filters, this would be better as an optional switch.
The question becomes how to define the correct output. If you cut off the beginning and end of the output audio file, those parts will have incorrect characteristics.
right, but in any meaningful way? surely most or all of it is below the noise floor, or close to it. I'd be very happy trimming back to the original length; intuitively, that's what you expect from a tool like this.
What I'd prefer is not to trim it, but to add it to the previous audio segment. Since you can get the delay, that should be possible.
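Adding the tail to the neighboring segment instead of discarding it is essentially overlap-add. A minimal sketch, again with a plain FIR standing in for the resampler (an assumption; this is not SSRC's code): filter each segment separately, keep the full output, and add each segment's tail onto the start of the next segment's output. By linearity the result matches filtering the joined signal in one pass.

```python
def fir_full(x, h):
    """Full linear convolution (len(x) + len(h) - 1 samples)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y


def overlap_add(segments, h):
    """Filter each segment on its own, keeping the full output, and add each
    segment's tail onto the start of the following segment's output."""
    out = []
    pos = 0  # where the current segment starts in the joined signal
    for seg in segments:
        y = fir_full(seg, h)
        if len(out) < pos + len(y):
            out.extend([0.0] * (pos + len(y) - len(out)))
        for j, v in enumerate(y):
            out[pos + j] += v
        pos += len(seg)
    return out


h = [0.1, 0.2, 0.4, 0.2, 0.1]
a = [1.0, -1.0, 0.5, 0.25]
b = [0.0, 0.75, -0.5]
joined = overlap_add([a, b], h)
whole = fir_full(a + b, h)
print(len(joined) == len(whole) and all(abs(p - q) < 1e-12 for p, q in zip(joined, whole)))  # True
```

Trimming each segment to its own length and concatenating would instead lose the part of each response that spills into the neighbor, which is where a discontinuity at the joins could come from.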
I'm not following you. maybe we're talking about different things.
I'm talking about the overall audio file, which currently has (effectively) silence added to the beginning and end, making it longer. intuitively, you expect the track length to come out identical to the input. that's also crucial for mastering, where people will not expect already-decided metadata (i.e. the length, in this case) to change just because the file is resampled.
in the 'insane' profile this is more pronounced, yes: there it adds about 1.3 s of silence at both the beginning and the end of my 1m 25s test file. that silence can be trimmed with no consequence.
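The durations in this report can be turned into sample counts with a little arithmetic. This assumes the padding is split evenly between the two ends and that the reported durations are exact; the 48 kHz figure is the output rate from the command line in the first post.

```python
def to_seconds(minutes, seconds, millis):
    """Convert an 'Xm Ys Zms' duration to seconds."""
    return minutes * 60 + seconds + millis / 1000


inp = to_seconds(1, 25, 68)     # 1m 25s 68ms  -> 85.068 s
out = to_seconds(1, 27, 799)    # 1m 27s 799ms -> 87.799 s
extra = out - inp               # total length added
per_end = extra / 2             # assumes an even split between start and end
rate_out = 48000                # --rate 48000 from the command above
print(round(extra, 3), round(per_end, 4), round(per_end * rate_out))  # 2.731 1.3655 65544
```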
You might not notice it by ear, but a spectrum analyzer clearly shows distortion. Theoretically, each PCM sample has infinite length: in ideal band-limited reconstruction it contributes a sinc pulse that never fully decays to zero. Correctly converting the sample rate therefore means, in theory, generating infinitely long PCM data. The question is where to truncate it. With the insane profile, the truncated length is quite long.
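A small illustration of the "infinite length" point, assuming ideal band-limited reconstruction: each sample contributes a normalized sinc pulse, which keeps oscillating and only decays as 1/t, so it is not exactly zero at any finite distance between sample points. Practical filters truncate (and usually window) this kernel; SSRC's own filter design is not reproduced here.

```python
import math


def sinc(t):
    """Normalized sinc, sin(pi t) / (pi t): the ideal band-limited interpolation kernel."""
    if t == 0:
        return 1.0
    return math.sin(math.pi * t) / (math.pi * t)


# Magnitude of the kernel halfway between sample points, ever further from the center.
for n in (1, 10, 100, 1000, 10000):
    mag = abs(sinc(n + 0.5))
    print(f"{n:>6} samples away: {mag:.2e} ({20 * math.log10(mag):.1f} dB)")
```

Halfway between sample points the magnitude is 1/(pi (n + 0.5)), so each tenfold increase in distance lowers it by only about 20 dB: the further out you truncate, the smaller the error, and the longer the filter.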
yes, but FIRs are already truncated because we care about what is practical, and almost all of that infinite tail is completely inaudible to us. In practice, my test file has unwanted silence at each end that I will trim away, and it will sound completely fine.
So it's just that it isn't audible to your ears. The FIR is truncated, but the filter length and the delay correspond: the longer the filter, the longer the delay. The insane profile uses a long FIR filter, so the delay also increases.
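The link between filter length and delay can be checked directly. A minimal sketch, assuming a Hann-windowed-sinc low-pass as a stand-in for SSRC's filters: for a symmetric (linear-phase) FIR with N taps the delay is (N - 1)/2 samples, so an impulse comes out (N - 1)/2 samples later, and the longer the filter, the later it appears.

```python
import math


def windowed_sinc_lowpass(num_taps, cutoff):
    """Linear-phase (symmetric) low-pass FIR: Hann-windowed sinc.
    cutoff is a fraction of the sample rate (0 < cutoff < 0.5)."""
    m = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - m
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def fir_full(x, h):
    """Full linear convolution (len(x) + len(h) - 1 samples)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y


for num_taps in (31, 255, 1023):
    h = windowed_sinc_lowpass(num_taps, 0.2)
    x = [0.0] * 50
    x[10] = 1.0                                   # unit impulse at sample 10
    y = fir_full(x, h)
    peak = max(range(len(y)), key=lambda i: abs(y[i]))
    print(num_taps, peak - 10, (num_taps - 1) // 2)   # measured delay vs (N - 1)/2
```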
don't forget, the long FIR still benefits the existing audio content even after the output is trimmed; it just makes no audible difference to silence (or near silence).
Try outputting in a floating-point format. That silent segment should not be true silence.
it's not about my ears. look at any other resampler: they don't change the length of the audio file. in the 'insane' example, what does 1.3 s of (effective) silence buy me? it's not about mathematical perfection, it's about what is practically useful.
Once we give up on mathematical rigor, defining correct behavior becomes difficult. If we decide to truncate at the ends, we then need to define a separate truncation method just for those parts. I don't know how other resamplers handle it, but isn't it simply that their filters are shorter?
I wanted to confirm what I'm saying, so I asked ChatGPT to verify it - the summary:
"Why they trim
Because every linear-phase FIR has a group delay of N/2 samples. If you “preserve the entire FIR output,” you would:
shift the audio later by half the filter length
add a symmetric tail afterward
break time alignment with video, stems, MIDI, and other tracks
produce files longer than their content
Nobody wants this. So everyone simply subtracts group delay and discards the tiny, inaudible residual ringing.
This is standard DSP practice."
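What the quoted summary describes, subtracting the group delay and discarding the rest, can be sketched for a single-rate, odd-length linear-phase FIR (an assumption made to keep the example small; for an N-tap filter the exact delay is (N - 1)/2 samples, which the quote rounds to N/2). The trimmed output has the input's length and lines up with it in time; exactly N - 1 samples are thrown away.

```python
def fir_full(x, h):
    """Full linear convolution (len(x) + len(h) - 1 samples)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y


def fir_trimmed(x, h):
    """Filter, then drop the group delay at the start and the tail at the end,
    so the result has the input's length and time alignment.
    Assumes a symmetric (linear-phase) FIR with an odd number of taps."""
    if len(h) % 2 == 0:
        raise ValueError("expected an odd-length linear-phase FIR")
    delay = (len(h) - 1) // 2
    return fir_full(x, h)[delay:delay + len(x)]


h = [0.1, 0.2, 0.4, 0.2, 0.1]
x = [0.0] * 12
x[6] = 1.0                          # impulse at sample 6
full = fir_full(x, h)
trimmed = fir_trimmed(x, h)
print(len(full), len(trimmed))      # 16 12
print(max(range(len(trimmed)), key=lambda i: abs(trimmed[i])))  # 6
```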
So I want them to be added together, not discarded.
Ringing can be audible in some situations and not in others, but that is a matter of the gap between theory and perception. SSRC simply aims to be faithful to the theory; how theory and perception actually differ is a separate issue.
I'm not sure what you mean by 'added together'. the pre-ringing at the start of the file and the post-ringing at the end simply aren't relevant in practice, in terms of what is audible.
however, I do understand the purist argument. but I would say: skip writing the first and last N/2 output samples by default, and provide a switch for anyone who must have the correct pre- and post-ringing tails. most people who resample audio don't expect the running time to change, and I'm certain you cannot hear any difference.
I completely understand that editing the output of the resampler is a hassle, but defining a method to “correctly” truncate only the edges isn't exactly straightforward either. It requires various evaluations, and demanding that from a completely free project is a bit much.
My hearing wasn't particularly good even when I was young, and testing it now shows it's definitely worse than it was back then.
yes all our ears are decaying slowly :).
but it really is that simple: don't write the first N/2 or the last N/2 output samples to the file. basically you're just cutting the pre- and post-ringing tails off completely. that's what everyone else does; it doesn't need to be more complex than that.
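For a resampler, the two numbers this needs are the delay in output samples and the expected output length. A minimal streaming sketch of the proposed behavior, in which `delay` is assumed to be reported by the resampler (the maintainer says it can be obtained, but the actual SSRC call is not shown here), and `expected_output_length` and `trim_stream` are hypothetical helpers; rounding to the nearest sample (Python's round-half-even) is also an assumption.

```python
from fractions import Fraction
from typing import Iterable, Iterator


def expected_output_length(in_samples: int, in_rate: int, out_rate: int) -> int:
    """Output length with no added padding: in_samples * out_rate / in_rate,
    rounded to the nearest sample (round-half-even; the rounding rule is an assumption)."""
    return round(Fraction(in_samples * out_rate, in_rate))


def trim_stream(samples: Iterable[float], delay: int, length: int) -> Iterator[float]:
    """Skip the first `delay` output samples (the lead-in from the filter delay),
    pass on exactly `length` samples, and drop everything after (the tail)."""
    for i, s in enumerate(samples):
        if i < delay:
            continue
        if i >= delay + length:
            break
        yield s


n_in = 8_166_528                                      # 85.068 s at 96 kHz
print(expected_output_length(n_in, 96_000, 48_000))   # 4083264
print(list(trim_stream([float(i) for i in range(10)], delay=2, length=5)))
# [2.0, 3.0, 4.0, 5.0, 6.0]
```

If the delay is not a whole number of output samples, a rounding choice is needed there as well, which is part of what would have to be defined for such a switch.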
It's a concept that's often misunderstood, but the most important aspect of mathematical elegance is that it makes defining software behavior straightforward. Introducing subjective elements suddenly adds all sorts of complications.
Simply cutting off the edges causes those specific parts to behave abnormally, making verification difficult.
OK, but this is all theory. please try it in practice, and tell me if you can hear any difference. I guarantee that's what every mastering engineer will do in practice; even at the highest end, nobody wants extra (effective) silence added.
If you can genuinely discern a difference then great. I'm sure you can't though.
I basically don't test by listening with my ears. Whether the characteristics are correct is automatically checked using software designed for that purpose.
To be honest, even if the sound quality is pretty bad, I can't really tell with my ears.
Ultimately, it's all about how easy it is to test.
that's fair.
but think about it this way: the pre- and post-ringing tails added to the output only affect the silence added by the process. all the audible content of the input already had the benefit of the large FIR.
so the added tails are mathematically correct, but audibly silent. the audible content is not damaged by trimming the tails in any way you can notice. That's why everyone else just trims them too.
So whether it's audible or not is a matter of how it's perceived by the ear.
I'm worried that if I completely cut off the beginning and end, people will report noise when the segments are joined back together.
Then if we start talking about applying a window function there, I'll have to test that behavior too.
After all, it's a free project. I'm a street performer in the digital world. No matter how much the audience says it's an easy song, I won't play it if I'm not in the mood.
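For reference, the edge windowing the maintainer mentions could look like half-Hann fades over the first and last few samples of an already trimmed signal. This is an illustration, not something SSRC does; `apply_edge_fades` and its `fade_len` parameter are hypothetical, and the fade itself alters the signal near the edges, which is exactly the extra behavior that would need its own testing.

```python
import math


def apply_edge_fades(samples, fade_len):
    """Return a copy of `samples` with half-Hann fades applied: the first
    `fade_len` samples ramp up from 0 and the last `fade_len` ramp down to 0."""
    n = len(samples)
    fade_len = min(fade_len, n // 2)   # never let the two fades overlap
    out = list(samples)
    for i in range(fade_len):
        gain = 0.5 - 0.5 * math.cos(math.pi * i / fade_len)  # 0 at i = 0, approaching 1
        out[i] *= gain
        out[n - 1 - i] *= gain
    return out


faded = apply_edge_fades([1.0] * 10, 3)
print([round(v, 3) for v in faded])
# [0.0, 0.25, 0.75, 1.0, 1.0, 1.0, 1.0, 0.75, 0.25, 0.0]
```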
sure, it's your baby. in that case I'll add this to the code myself; there really is no downside, and it would be easy to add a switch for it.
also, don't forget it's not a matter of opinion: you can measure whether the added 'silence' is below a certain threshold.
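A minimal sketch of such a measurement, assuming floating-point samples with full scale at 1.0 (as with the floating-point output suggested earlier in the thread): compute the peak level of the first and last `edge_len` samples in dBFS and compare it with a threshold. The helper names are hypothetical and the threshold is the user's choice; for reference, one LSB at 24 bits is about -138.5 dBFS.

```python
import math


def peak_dbfs(samples):
    """Peak level in dBFS for float samples with full scale at 1.0
    (-inf for digital silence or an empty list)."""
    peak = max((abs(s) for s in samples), default=0.0)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


def edges_below(samples, edge_len, threshold_db):
    """True if both the first and the last `edge_len` samples peak below threshold_db."""
    head = samples[:edge_len]
    tail = samples[max(0, len(samples) - edge_len):]
    return peak_dbfs(head) < threshold_db and peak_dbfs(tail) < threshold_db


signal = [1e-8] * 10 + [0.5] * 100 + [1e-8] * 10
print(round(peak_dbfs(signal[:10]), 1))   # -160.0
print(edges_below(signal, 10, -138.5))    # True
```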
Furthermore, unlike the previous version, this version can be used as a library, so I'd prefer that such output processing be handled within the application.
sure, that's why I'm trying to add it to the CLI application as a '--trim' option. just trying to make sense of the code...