librespot icon indicating copy to clipboard operation
librespot copied to clipboard

Add Resampling Support

Open JasonLG1979 opened this issue 2 years ago • 50 comments

This PR adds resampling to 48kHz, 88.2kHz, and 96kHz with Windowed Sinc Interpolation.

It also moves everything except decoding out of player and a few other misc improvements like improving thread creation in player, using #[default] in the player config enums where we can, and includes the sample pipeline latency in calculating an accurate position.

JasonLG1979 avatar Jun 22 '23 05:06 JasonLG1979

@eladyn or @roderickvd or anyone else really, this could use a review. I know it's a lot of code, I tried my best to split it up into a few commits to make it comprehensible.

JasonLG1979 avatar Jun 22 '23 20:06 JasonLG1979

Appreciate the work! I don’t have a lot of time and as you know I don’t use this anymore myself, so I hope others will take a look before I get to it.

roderickvd avatar Jun 23 '23 11:06 roderickvd

@roderickvd, who else is active and has commit rights?

JasonLG1979 avatar Jun 23 '23 13:06 JasonLG1979

For testing purposes the new options are:

        --interpolation-quality QUALITY
                        Interpolation Quality to use if Resampling
                        {Low|Medium|High}. Defaults to Low.
        --sample-rate SAMPLERATE
                        Sample Rate to Resample to
                        {44.1kHz|48kHz|88.2kHz|96kHz}. Defaults to 44.1kHz
                        meaning no resampling.

Where --interpolation-quality Low uses Linear Interpolation, which is fast but not great as far as quality, and Medium|High use Windowed Sinc Interpolation which is much better then Linear but it cost more CPU cycles.

A sample rate of 44.1kHz (the default) bypasses resampling.

JasonLG1979 avatar Jun 23 '23 17:06 JasonLG1979

I finally found a bit of time to have a look at this, but I'll probably need a bit more to thoroughly go through all files (especially things like alsa or the resampler. I hope, I will get to it this weekend, but not sure yet.

I commented on some minor things that I noticed, but these are mainly technical things. On the audio side I'm not that experienced so I won't have much to add, but I'll give the various new options a try, when I get to it.

ALSA is my baby. I wouldn't do her wrong,lol!!!

The resampler is implemented to basically mimic a simple FIR filter in operation for the Windowed Sinc version, in that it uses temporal convolution to interpolate. It's non-oversampling to keep the resource usage to a minimum and dual threaded so that right and left channels are processed at the same time.

Good-enough quality and efficiency was the target. Are there better quality resamplers? Sure. Are there other resamplers that are this balanced? I'm not sure. You can resample to 96kHz at the highest setting on a Pi Zero with this reampler and use less than 20-ish % system resources.

JasonLG1979 avatar Jun 26 '23 23:06 JasonLG1979

As far as latency is concerned by my math, depending on sample rate and interpolation quality, it maxes out at any where from 1.5ms to 8.4 ms. But in normal operation the latency is variable because it tries to process the largest chunk of samples it can.

JasonLG1979 avatar Jun 27 '23 00:06 JasonLG1979

Here is a screenshot of htop resampling to 48kHz High on a Pi Zero v2.

What you have to keep in mind is the CPU %'s next to the threads represent % of 1 core.

The Avg bar is the actual total system load.

As you can see, headroom for days.

Screenshot from 2023-06-26 19-27-30

Forgot to mention, aside from installing a Raspotify package based on this branch it's a fresh install of headless Raspberry Pi OS with no tweaks.

JasonLG1979 avatar Jun 27 '23 00:06 JasonLG1979

I'm still getting:

[2023-06-27T02:41:33Z ERROR librespot_playback::player] Unable to load audio item: Error { kind: Unavailable, error: Response(MercuryResponse { uri: "hm://keymaster/token/authenticated?scope=playlist-read&client_id=65b708073fc0480ea92a077233ca87bd&device_id=179116ac794f5870d6c936120d87ce0370052b84", status_code: 403, payload: [[123, 34, 99, 111, 100, 101, 34, 58, 52, 44, 34, 101, 114, 114, 111, 114, 68, 101, 115, 99, 114, 105, 112, 116, 105, 111, 110, 34, 58, 34, 73, 110, 118, 97, 108, 105, 100, 32, 114, 101, 113, 117, 101, 115, 116, 34, 125]] }) }

When I try to use discovery. Librespot works fine if I provide my creds. I wouldn't mind fixing this also as it's the blocker that prevents me from switching to dev in Raspotify. Otherwise dev works rather well.

JasonLG1979 avatar Jun 27 '23 02:06 JasonLG1979

The portaudio, jackaudio and rodio backends need tested and probably fixed. I'm not really at all familiar with them. The ALSA and PulseAudio backends are solid. Gstreamer was super straightforward as far as a non-const sample rate and for pipe and subprocess sample rate doesn't really matter at all.

When introducing common functionality I recommend to make sure that it works with all backends. Fortunately PortAudio and Rodio are really easy to test. By the way, Rodio does its own interpolation so it should not even care.

Jack is also not so difficult to test when you find one of those binaries that come complete with a GUI to quickly launch a server and graphically construct a pipeline. Like QjackCtl.

I'm not sure if librespot should have resampling, but since I don't use it anymore, let's follow the voice of the people. And at this point you're the only two people responding, so 😆

I'm still getting:

[2023-06-27T02:41:33Z ERROR librespot_playback::player] Unable to load audio item: Error { kind: Unavailable, error: Response(MercuryResponse { uri: "hm://keymaster/token/authenticated?scope=playlist-read&client_id=65b708073fc0480ea92a077233ca87bd&device_id=179116ac794f5870d6c936120d87ce0370052b84", status_code: 403, payload: [[123, 34, 99, 111, 100, 101, 34, 58, 52, 44, 34, 101, 114, 114, 111, 114, 68, 101, 115, 99, 114, 105, 112, 116, 105, 111, 110, 34, 58, 34, 73, 110, 118, 97, 108, 105, 100, 32, 114, 101, 113, 117, 101, 115, 116, 34, 125]] }) }

When I try to use discovery. Librespot works fine if I provide my creds. I wouldn't mind fixing this also as it's the blocker that prevents me from switching to dev in Raspotify. Otherwise dev works rather well.

You can read the error message by converting the payload array from decimal to ASCII. It then reads:

{"code":4,"errorDescription":"Invalid request"}

As I commented with the issues, see if it helps if you simplify that keymaster request by taking out some of the parameters like client_id and/or device_id. Maybe when you use discovery, either of those IDs are not properly initialised yet.

roderickvd avatar Jun 28 '23 20:06 roderickvd

When introducing common functionality I recommend to make sure that it works with all backends. Fortunately PortAudio and Rodio are really easy to test. By the way, Rodio does its own interpolation so it should not even care.

Everything should work now.

Jack is also not so difficult to test when you find one of those binaries that come complete with a GUI to quickly launch a server and graphically construct a pipeline. Like QjackCtl.

Jack conflicts with pipewire. It's basically impossible to test without completely borking the audio on your system. But it doesn't take a sample rate in it's initialization anyway so I assumed it's up to the user to make sure the sample rates jive?

I'm not sure if librespot should have resampling, but since I don't use it anymore, let's follow the voice of the people. And at this point you're the only two people responding, so laughing

It's a pretty common request as basically users expect all audio players these days to be able to do at least some sort of basic resampling.

JasonLG1979 avatar Jun 28 '23 20:06 JasonLG1979

You can read the error message by converting the payload array from decimal to ASCII. It then reads:

{"code":4,"errorDescription":"Invalid request"}

That is a super useless error message. Way to go Spotify,lol!!!

Maybe we can convert that in librespot to present the actual error message (useless as it is anyway)?

JasonLG1979 avatar Jun 29 '23 04:06 JasonLG1979

Maybe we can convert that in librespot to present the actual error message (useless as it is anyway)?

Sure do it. One tip, forget about trying to deserialize the JSON. I tried once but there are so many variants that Spotify puts out, that I basically reverted many hours of work simply because I was running into walls everywhere. Maybe I got it totally wrong then but it was a pain in the butt.

roderickvd avatar Jun 30 '23 18:06 roderickvd

Maybe we can convert that in librespot to present the actual error message (useless as it is anyway)?

Sure do it. One tip, forget about trying to deserialize the JSON. I tried once but there are so many variants that Spotify puts out, that I basically reverted many hours of work simply because I was running into walls everywhere. Maybe I got it totally wrong then but it was a pain in the butt.

I was more thinking just run converting the response to ASCII when it's an error so we get an error message instead of an array of numbers. The error messages don't seem to be that useful so far but at least if they're converted they won't fill up my terminal.

JasonLG1979 avatar Jun 30 '23 19:06 JasonLG1979

Ok @eladyn and @roderickvd, hit me with another round or reviews.

JasonLG1979 avatar Jun 30 '23 23:06 JasonLG1979

Does your resampler have an anti-aliasing filter?

roderickvd avatar Jul 01 '23 13:07 roderickvd

Does your resampler have an anti-aliasing filter?

It only upsamples and it doesn't oversample. No new high freq content is created, and everything from the source fits into the target. Anti-aliasing is not needed. And in the case of windowed sinc interpolation the sinc window is a natural anti-aliasing filter anyway.

That's one of those "givens" I was talking about. I don't have to deal with arbitrary input. I know that my input is 44.1kHz. I know I will only be upsampling.

Oversampling could potentially improve the quality of the interpolation at the cost of increased code complexity and A LOT more CPU usage. And in that case an additional anti-aliasing step would be required for linear interpolation but still not for windowed sinc though as I mentioned the sic window is a natural anti-aliasing filter.

Being non-oversampling was a design choice. My goal was good enough quality and low-ish CPU usage. I was looking for the best "bang for the buck".

JasonLG1979 avatar Jul 01 '23 15:07 JasonLG1979

I played with a version that incorporated oversampling but to my non-golden ears it made no difference besides eating twice as much CPU.

JasonLG1979 avatar Jul 01 '23 15:07 JasonLG1979

The 1st rule of resampling is; don't unless you have to. Under no circumstances will it improve sound quality over the original. There are only 2 real reasons to resample.

  1. Your device doesn't support 44.1kHz.
  2. You're doing some DSP that requires it.

This resampler is designed for case 1. For case 2 the user should let the DSP handle resampling since it should know best how to handle it. For example, my advice for a PulseAudio Sink user would be to set the format to F32 and let PulseAudio handle resampling.

This resampler best serves backends that either don't have resampling or that have resampling that is too CPU intensive and thus can be bypassed.

JasonLG1979 avatar Jul 01 '23 17:07 JasonLG1979

For users of the ALSA backend for example this resampler will absolutely smoke libsamplerate. I can't objectively measure the sound quality difference but subjectively I can't hear a difference.

As far as resource utilization is concerned it's no contest. libsamplerate is single threaded and horribly inefficient. And the way it's designed it would take a rewrite to make it any better in that regard. It will cripple a lower power device. (I did a deep dive in the source code when writing this resampler.) It for example calculates the interpolation coefficients on the fly.

JasonLG1979 avatar Jul 01 '23 18:07 JasonLG1979

I have not studied your resampling algorithm. Could you clarify a few points?

No new high freq content is created, and everything from the source fits into the target. Anti-aliasing is not needed.

Are you sure? Fitting 44.1 kHz into 192 kHz with just the DAC filtering at around 96 kHz will allow all 22.05 - 44.1 kHz mirror images through.

And in the case of windowed sinc interpolation the sinc window is a natural anti-aliasing filter anyway.

I agree that a windowed sinc function should take care of most of it. Couple of questions:

  • Because you say "in the case of..." are there any other cases that do not use a windowed sinc function?
  • What is the cut-off frequency? Is it hardcoded or does it depend on the input (nowadays only 44.1 kHz, just asking)
  • What is the kernel length? (to assess roll-off, attenuation and CPU load relative to libresample)

roderickvd avatar Jul 02 '23 19:07 roderickvd

Are you sure? Fitting 44.1 kHz into 192 kHz with just the DAC filtering at around 96 kHz will allow all 22.05 - 44.1 kHz mirror images through.

Working under the assumption that the source audio is anti-aliased properly there is nothing above 22.05 in the resampled output. Like I said no new high freq content is created. If the source is not properly anti-aliased then we should be FIR filtering ALL SAMPLE RATES even 44.1kHz.

I agree that a windowed sinc function should take care of most of it. Couple of questions:

Because you say "in the case of..." are there any other cases that do not use a windowed sinc function?

Linear Interpolation doesn't use a windowed sinc function.

What is the cut-off frequency? Is it hardcoded or does it depend on the input (nowadays only 44.1 kHz, just asking)

The sinc window cuts off at the Nyquist freq of the output sample rate naturally.

Since the output sample rate is higher then the input sample rate it doesn't cut off any of the input.

What is the kernel length? (to assess roll-off, attenuation and CPU load relative to libresample)

High is 257 Med is 129 Low is 0 (it doesn't use a Kernel, It's just plain old Linear).

There is no roll-off. There is no explicit cut off freq. To do that I would have to convolute FIR Filter coefficients into the interpolation coefficients. It's essentially a brick-wall at the Nyquist freq of the output sample rate. It's either there or it's not. But again it doesn't matter since even in the resample output there is nothing above 22.05.

Since no oversampling is done, no additional high freq content is created. If it were to oversimple then filtering would be required because oversampling shifts the Nyquist freq of the source before Interpolation.

JasonLG1979 avatar Jul 02 '23 20:07 JasonLG1979

If it helps you sleep at night I can add a FIR filter that filters out everything about 22.05.

JasonLG1979 avatar Jul 02 '23 20:07 JasonLG1979

Should be fine then, thanks. You are right that the input should contain nothing above Fs/2.

Maybe talking out of my butt when I ask ... what would happen if you set the center frequency to 22.05 kHz just to make sure? Would that make any sense or even work? Or more fundamentally, why set Fc = Fs and not Fs/2?

You see I'm not a resampling specialist, just heard a thing or two 😄

Edit: I think windowed sinc rolls off by definition, depending on the number of coefficients in the kernel?

roderickvd avatar Jul 02 '23 20:07 roderickvd

It wouldn't hurt to add a FIR filter in to the interpolation coefficients just in case I am wrong. Calculating the interpolation coefficients is a one time cost so mixing in a FIR filter doesn't cost anything at runtime.

Edit: I think windowed sinc rolls off by definition, depending on the number of coefficients in the kernel?

Assuming you add FIR filter coefficients. Otherwise it's a brickwall. Well no matter what it's a brickwall you can just make it roll-off before it hits the brickwall if you want.

JasonLG1979 avatar Jul 02 '23 20:07 JasonLG1979

Traditionally you'd roll-off at slightly below the Nyquist freq of the input so for 44.1kHz 20kHz.

JasonLG1979 avatar Jul 02 '23 20:07 JasonLG1979

I've the best luck with a FIR filter with 3 taps using the Blackman-Harris window. (more taps steeper roll-off but more distortion with tight transition bandwidths like 48kHz where the transition bandwidth is only 4000Hz)

JasonLG1979 avatar Jul 02 '23 20:07 JasonLG1979

I already wrote a stand-alone FIR Filter:

https://gist.github.com/JasonLG1979/c95b035ed271bfcbbe10a8047cf9e580

I'll just adapt it for our needs.

JasonLG1979 avatar Jul 02 '23 21:07 JasonLG1979

@roderickvd, There I capped the output bandwidth to 92% which gives us anti-alias filtering without chopping off any of the source bandwidth since even at 48kHz 92% is just over the Nyquist freq of 44.1kHz.

JasonLG1979 avatar Jul 02 '23 23:07 JasonLG1979

Calculate the actual roll-off angle and annotation is some pretty advanced math that I haven't cared to do (LMFAO!!!) but given that it's a Blackman Window the attenuation is going to be at least -58dB and given filter lengths of 129 and 257 the roll-off is going to be steep as Hell.

JasonLG1979 avatar Jul 02 '23 23:07 JasonLG1979

As far as comparing to libsamplerate I want to say they use a window size of 300-something for their highest sinc setting? I found that 257 was about as high as I felt sounded good. (ofc that is subjective) the thing about window size is that it's a trade off between time resolution and freq resolution. Longer is not always better. Longer windows tend to "smear" the sound. You lose transient resolution.

Don't get me wrong libsamplerate is fine. The main reason this resampler is more efficient is that that there are so many known givens. What makes libsamplerate inefficient for the most part besides the single-threaded-ness and calculating coefficients on the fly (which are pretty big hits) is that it can't take anything for granted as a given. It's a matter of being able to optimize for our exact needs.

JasonLG1979 avatar Jul 02 '23 23:07 JasonLG1979