
2.4.0: --insane output length wrong (silence added at both ends)

Open gee-ell opened this issue 2 months ago • 31 comments

(first time I've tested 2.x)

windows 2.4.0 binaries: input: 96k / 64bit float

ssrc.exe --rate 48000 --bits 24 --dither 98 --pdf 1 --profile insane %1 "%outfile%"

result: the output is longer than the input; silence is added at both ends. example:

input: 1m 25s 68ms

output: 1m 27s 799ms

gee-ell avatar Dec 05 '25 19:12 gee-ell
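Aside (illustration, not from the thread): the size of the discrepancy reported above can be worked out directly from the two durations. The sketch below assumes the extra time is split evenly between the start and the end of the file, which the report suggests but does not measure; the helper name `to_ms` is hypothetical.

```python
def to_ms(minutes, seconds, millis):
    """Convert a duration given as minutes/seconds/milliseconds to integer milliseconds."""
    return (minutes * 60 + seconds) * 1000 + millis

input_ms = to_ms(1, 25, 68)     # reported input length
output_ms = to_ms(1, 27, 799)   # reported output length

extra_ms = output_ms - input_ms
per_end_ms = extra_ms / 2       # assumption: padding is symmetric

print(extra_ms, per_end_ms)     # 2731 1365.5
```

So the output is about 2.7 s longer in total, or roughly 1.37 s per end under the symmetry assumption.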

only seems to affect 'insane' profile, 'high' & 'long' give correct length output.

gee-ell avatar Dec 05 '25 20:12 gee-ell

This is not specific to the insane profile; silent intervals are added in all profiles. This is by design. They are added because FIR filters are applied during the resampling process. The insane profile is, literally, an insane profile, so various effects manifest in extreme forms.

shibatch avatar Dec 06 '25 03:12 shibatch
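Aside (illustration, not from the thread): the reason an FIR filter lengthens its output can be seen with a plain convolution. This is a minimal sketch, not SSRC's code; the 21-tap moving average is only a stand-in for a real low-pass filter, and `convolve_full` is a hypothetical helper.

```python
def convolve_full(x, h):
    """Direct-form full convolution: returns len(x) + len(h) - 1 samples."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

x = [1.0] * 100        # 100 input samples
h = [1.0 / 21] * 21    # 21-tap moving average standing in for a low-pass FIR
y = convolve_full(x, h)

print(len(x), len(h), len(y))  # 100 21 120
```

The output is longer than the input by the filter length minus one, so a longer filter (as in a more aggressive profile) means more added samples.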

OK but this isn't very useful, because the output then has to be trimmed back manually to the original length in most scenarios - it's better if the output is trimmed to the original length by default.

If you feel it's important to have 'correct' output padding from the FIR filters, this would be better as an optional switch.

gee-ell avatar Dec 07 '25 13:12 gee-ell

The question becomes how to define the correct output. If you cut off the beginning and end of the output audio file, those parts will have incorrect characteristics.

shibatch avatar Dec 07 '25 13:12 shibatch

right, but in any meaningful way? surely most or all of it is below the noise floor, or close to it. I'd be very happy trimming back to the original file, intuitively that's what you expect from a tool like this.

gee-ell avatar Dec 07 '25 14:12 gee-ell

What I'd prefer is not to trim it, but to add it to the previous audio segment. Since you can get the delay, that should be possible.

shibatch avatar Dec 07 '25 14:12 shibatch

I'm not following you. maybe we're talking about different things.

I'm talking about the overall audio file, which currently has (effectively) silence added to the beginning and end, making it longer. intuitively, you expect the track length going in to come out identical. that's also crucial for mastering, where people will not expect the already-decided metadata (i.e. length in this case) to change just because the file is resampled.

in 'insane' profile this is more pronounced yes, but there it actually adds 1.3s of silence at the beginning, and at the end to my 1m 25s test file. that silence can be trimmed with no consequence.

gee-ell avatar Dec 07 '25 14:12 gee-ell

Image

gee-ell avatar Dec 07 '25 14:12 gee-ell

You might not notice it by ear, but a spectrum analyzer clearly shows distortion. Theoretically, each PCM sample has infinite length. Correctly converting the sample rate theoretically means generating infinitely long PCM data. The question is how long to truncate it. With the insane profile, this becomes quite long.

shibatch avatar Dec 07 '25 14:12 shibatch

yes, but FIRs are already truncated because we care about what is practical, and almost all of the infinity is completely inaudible to us. In practice, my test file has unwanted silence at each end that I will trim away, and it will sound completely fine.

gee-ell avatar Dec 07 '25 14:12 gee-ell

So it's just not audible to your ears. It's truncating the FIR, but the length of the filter and the delay correspond. The insane profile uses a long FIR filter, so the delay also increases.

shibatch avatar Dec 07 '25 14:12 shibatch

don't forget, the long FIR length does impact the existing audio content positively even after the output is trimmed, but it makes no difference to silence (or near silence) in any audible way.

gee-ell avatar Dec 07 '25 14:12 gee-ell

Try outputting in a floating-point format. That silent segment should not be true silence.

shibatch avatar Dec 07 '25 14:12 shibatch
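Aside (illustration, not from the thread): the point about floating-point output can be sketched as follows. Very small ringing values survive in a float file but round to zero when quantized to 24-bit integers without dither. The sample values below are invented for illustration, and `to_pcm24` is a hypothetical helper; the command in the original report enables dither, in which case the tail would carry dither noise instead of digital zero.

```python
def to_pcm24(sample):
    """Quantize a float in [-1.0, 1.0) to a signed 24-bit integer (no dither)."""
    scaled = round(sample * (1 << 23))
    return max(-(1 << 23), min((1 << 23) - 1, scaled))

# Hypothetical ringing residue, roughly -150 dBFS and below.
tail = [3e-8, -2e-8, 1e-8, -5e-9]

quantized = [to_pcm24(s) for s in tail]
print(quantized)                    # [0, 0, 0, 0]
print(any(s != 0.0 for s in tail))  # True: the float values are not true silence
```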

> So it's just not audible to your ears. It's truncating the FIR, but the length of the filter and the delay correspond. The insane profile uses a long FIR filter, so the delay also increases.

it's not about my ears. look at any resampler, they don't change the length of the audio file. in the 'insane' example, what does 1.3s of (effective) silence buy me? it's not about mathematical perfection, it's about what is practically useful.

gee-ell avatar Dec 07 '25 14:12 gee-ell

When we adopt a stance that doesn't demand mathematical rigor, the problem arises that defining correct behavior becomes difficult. If we decide to truncate at the ends, we then need to define a separate truncation method just for those parts. I don't know how other resamplers handle it, but isn't it simply that the filters are shorter?

shibatch avatar Dec 07 '25 14:12 shibatch

I wanted to confirm what I'm saying, so I asked ChatGPT to verify it - the summary:

"Why they trim

Because every linear-phase FIR has a group delay of N/2 samples. If you “preserve the entire FIR output,” you would:

shift the audio later by half the filter length

add a symmetric tail afterward

break time alignment with video, stems, MIDI, and other tracks

produce files longer than their content

Nobody wants this. So everyone simply subtracts group delay and discards the tiny, inaudible residual ringing.

This is standard DSP practice."

gee-ell avatar Dec 07 '25 14:12 gee-ell
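Aside (illustration, not from the thread): the group-delay claim in the quoted summary can be checked numerically. For a symmetric (linear-phase) FIR with an odd number of taps N, the delay is (N - 1) / 2 samples, which the quote rounds to N/2. This sketch uses a Hann-windowed sinc as a generic low-pass; the tap count and cutoff are arbitrary choices, not SSRC's.

```python
import math

def lowpass_taps(num_taps, cutoff):
    """Hann-windowed sinc low-pass (symmetric, so linear phase). cutoff is in cycles/sample."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        sinc = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    return taps

def convolve_full(x, h):
    """Direct-form full convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

num_taps = 101
taps = lowpass_taps(num_taps, 0.25)

impulse = [0.0] * 50
impulse[10] = 1.0                      # a single click at sample 10

out = convolve_full(impulse, taps)
delay = (num_taps - 1) // 2
peak = max(range(len(out)), key=lambda i: abs(out[i]))

print(delay, peak)                     # 50 60
```

The click that entered at sample 10 peaks at sample 60 in the untrimmed output, which is a shift of exactly the group delay.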

So I want them to be added together, not discarded.

Ringing is something that can sound that way in certain situations, but that's a difference between the theory and how it is perceived. SSRC is simply a method faithful to the theory, and how theory and perception actually differ is a separate issue.

shibatch avatar Dec 07 '25 14:12 shibatch

I'm not sure what you mean by 'added together'? the pre-ringing at the start of the file, and post-ringing at the end simply aren't relevant in practice, in terms of what is audible.

however, I do understand the purist argument. but I would say: skip writing the first and last N/2 output samples by default, and provide a switch for anyone who needs the correct pre- and post-ringing tails. most people who want to resample audio don't expect the runtime to change, and I'm certain you cannot hear any difference.

gee-ell avatar Dec 07 '25 14:12 gee-ell

I completely understand that editing the output of the resampler is a hassle, but defining a method to “correctly” truncate only the edges isn't exactly straightforward either. It requires various evaluations, and demanding that from a completely free project is a bit much.

I wasn't particularly good at hearing when I was young, but testing it now shows it's definitely worse than it was back then.

shibatch avatar Dec 07 '25 15:12 shibatch

yes all our ears are decaying slowly :).

but it really is that simple - don't write the first N/2 samples of the output to the file, and the last N/2 samples. basically you're just cutting the pre and post ringing tails off completely. that's what everyone else does, it doesn't need to be more complex than that.

gee-ell avatar Dec 07 '25 15:12 gee-ell
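Aside (illustration, not from the thread): the trimming proposed above can be sketched for the simple same-rate case. This is not SSRC's code; `trim_to_input_length` is a hypothetical helper and the 5-tap kernel is a toy filter. With an actual rate change, the number of samples to drop would have to be expressed at the output rate.

```python
def convolve_full(x, h):
    """Direct-form full convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def trim_to_input_length(filtered, input_len, num_taps):
    """Drop the group delay from the front and keep exactly input_len samples."""
    delay = (num_taps - 1) // 2
    return filtered[delay:delay + input_len]

x = [float(k) for k in range(20)]   # a ramp, so any time shift is easy to spot
h = [0.1, 0.2, 0.4, 0.2, 0.1]       # toy symmetric kernel with unit gain

y = convolve_full(x, h)
trimmed = trim_to_input_length(y, len(x), len(h))

print(len(x), len(y), len(trimmed))  # 20 24 20
```

In this toy case the interior of `trimmed` lines up with the input sample for sample, while the first and last two samples differ from it, because the filter window there overlaps the cut. That edge behaviour is the part the two sides of the thread weigh differently.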

It's a concept that's often misunderstood, but the most important aspect of mathematical elegance is that it makes defining software behavior straightforward. Introducing subjective elements suddenly adds all sorts of complications.

Simply cutting off the edges causes those specific parts to behave abnormally, making verification difficult.

shibatch avatar Dec 07 '25 15:12 shibatch

OK, but this is all theory. please try it in practice, and tell me if you can hear any difference. I guarantee you that's what every mastering engineer will do in practice, even at the highest end nobody wants extra (effective) silence added.

If you can genuinely discern a difference then great. I'm sure you can't though.

gee-ell avatar Dec 07 '25 15:12 gee-ell

I basically don't test by listening with my ears. Whether the characteristics are correct is automatically checked using software designed for that purpose.

To be honest, even if the sound quality is pretty bad, I can't really tell with my ears.

shibatch avatar Dec 07 '25 15:12 shibatch

Ultimately, it's all about how easy it is to test.

shibatch avatar Dec 07 '25 15:12 shibatch

that's fair.

but think about it this way - the pre- and post-ringing tails added to the output only affect the silence added by the process. all the audible content of the input already had the benefit of the large FIR.

so the added tails are mathematically correct, but audibly silent. the audible content is not damaged by trimming the tails in any way you can notice. That's why everyone else just trims them too.

gee-ell avatar Dec 07 '25 15:12 gee-ell

So whether it's audible or not is a matter of how it's perceived by the ear.

I'm worried that if I completely cut off the beginning and end, people might say noise gets in when you connect them.

Then if we start talking about applying a window function there, I'll have to test that behavior too.

shibatch avatar Dec 07 '25 15:12 shibatch
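Aside (illustration, not from the thread): the window-function idea mentioned above could look like a short fade applied at each cut point, so the trimmed file starts and ends at zero instead of on an abrupt step. This is only a sketch of one possible approach; the raised-cosine shape, the fade length, and the name `fade_edges` are assumptions, and nothing in the thread says SSRC does this.

```python
import math

def fade_edges(samples, fade_len):
    """Apply a raised-cosine fade-in and fade-out of fade_len samples each."""
    out = list(samples)
    n = len(out)
    fade_len = min(fade_len, n // 2)   # never let the two fades overlap
    for i in range(fade_len):
        gain = 0.5 - 0.5 * math.cos(math.pi * i / fade_len)
        out[i] *= gain                 # fade in from the start
        out[n - 1 - i] *= gain         # fade out towards the end
    return out

signal = [1.0] * 100
faded = fade_edges(signal, 10)

print(faded[0], faded[50], faded[-1])  # 0.0 1.0 0.0
```

As the comment above points out, any such choice is extra behaviour that would need its own verification.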

After all, it's a free project. I'm a street performer in the digital world. No matter how much the audience says it's an easy song, I won't play it if I'm not in the mood.

shibatch avatar Dec 07 '25 15:12 shibatch

sure, it's your baby. in that case I'll add this myself to the code, there really is no downside and it would be easy to add a switch for it.

Also don't forget it's not a matter of opinion, you can measure if the 'silence' added is below a certain threshold.

gee-ell avatar Dec 09 '25 16:12 gee-ell
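Aside (illustration, not from the thread): measuring whether the added head and tail fall below a threshold could be done along these lines. The -120 dBFS default is an arbitrary placeholder, not a value proposed by either participant, and both function names are hypothetical. Samples are assumed to be floats with full scale at 1.0.

```python
import math

def peak_dbfs(samples):
    """Peak level in dB relative to full scale (1.0); -inf for digital silence."""
    peak = max((abs(s) for s in samples), default=0.0)
    return -math.inf if peak == 0 else 20 * math.log10(peak)

def is_effectively_silent(samples, threshold_dbfs=-120.0):
    """True if the peak of the segment is below the chosen threshold."""
    return peak_dbfs(samples) < threshold_dbfs

quiet_tail = [1e-7, -5e-8, 2e-8]    # hypothetical residue, peak at -140 dBFS
audible_tail = [0.001, -0.0005]     # peak at -60 dBFS

print(is_effectively_silent(quiet_tail))    # True
print(is_effectively_silent(audible_tail))  # False
```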

Furthermore, unlike the previous version, this version can be used as a library, so I'd prefer that such output processing be handled within the application.

shibatch avatar Dec 09 '25 16:12 shibatch

sure, that's why I'm trying to add it to the CLI application using '--trim'. just trying to make sense of the code ...

gee-ell avatar Dec 09 '25 17:12 gee-ell