rubato Usage with cpal

Hi,

I’m looking to use Rubato along with CPAL to record audio from a microphone and resample it to 16 kHz mono.

Could you guide me on how to properly handle the resampling? Since the resampler has a fixed buffer size, should I process the audio in chunks?

It would be very helpful if you could provide some directions or an example for this setup.

Thank you!

main.rs

use cpal::traits::{DeviceTrait, HostTrait, StreamTrait};
use cpal::{FromSample, Sample};
use eyre::{bail, Result};
use rubato::{
    Resampler, SincFixedIn, SincInterpolationParameters, SincInterpolationType, WindowFunction,
};
use std::fs::File;
use std::io::BufWriter;
use std::sync::{Arc, Mutex};

fn main() -> Result<()> {
    let host = cpal::default_host();

    // Set up the input device and stream with the default input config.
    let device = host
        .default_input_device()
        .expect("failed to find input device");

    println!("Input device: {}", device.name()?);

    let config = device
        .default_input_config()
        .expect("Failed to get default input config");
    println!("Default input config: {:?}", config);

    // The WAV file we're recording to.
    const PATH: &str = concat!(env!("CARGO_MANIFEST_DIR"), "/recorded.wav");
    let spec = wav_spec_from_config(&config);
    let writer = hound::WavWriter::create(PATH, spec)?;
    let writer = Arc::new(Mutex::new(Some(writer)));

    let params = SincInterpolationParameters {
        sinc_len: 256,
        f_cutoff: 0.95,
        interpolation: SincInterpolationType::Linear,
        oversampling_factor: 256,
        window: WindowFunction::BlackmanHarris2,
    };
    let mut resampler = SincFixedIn::<f64>::new(
        16000 as f64 / config.sample_rate().0 as f64,
        2.0,
        params,
        1024,
        2,
    )
    .unwrap();

    // A flag to indicate that recording is in progress.
    println!("Begin recording...");

    // Run the input stream on a separate thread.
    let writer_2 = writer.clone();

    let err_fn = move |err| {
        eprintln!("an error occurred on stream: {}", err);
    };

    let stream = match config.sample_format() {
        cpal::SampleFormat::I8 => device.build_input_stream(
            &config.into(),
            move |data, _: &_| write_input_data::<i8, i8>(data, &writer_2, &mut resampler),
            err_fn,
            None,
        )?,
        cpal::SampleFormat::I16 => device.build_input_stream(
            &config.into(),
            move |data, _: &_| write_input_data::<i16, i16>(data, &writer_2, &mut resampler),
            err_fn,
            None,
        )?,
        cpal::SampleFormat::I32 => device.build_input_stream(
            &config.into(),
            move |data, _: &_| write_input_data::<i32, i32>(data, &writer_2, &mut resampler),
            err_fn,
            None,
        )?,
        cpal::SampleFormat::F32 => device.build_input_stream(
            &config.into(),
            move |data, _: &_| write_input_data::<f32, f32>(data, &writer_2, &mut resampler),
            err_fn,
            None,
        )?,
        sample_format => {
            bail!("Unsupported sample format '{sample_format}'")
        }
    };

    stream.play()?;

    // Let recording go for roughly three seconds.
    std::thread::sleep(std::time::Duration::from_secs(3));
    drop(stream);
    writer.lock().unwrap().take().unwrap().finalize()?;
    println!("Recording {} complete!", PATH);
    Ok(())
}

fn sample_format(format: cpal::SampleFormat) -> hound::SampleFormat {
    if format.is_float() {
        hound::SampleFormat::Float
    } else {
        hound::SampleFormat::Int
    }
}

fn wav_spec_from_config(config: &cpal::SupportedStreamConfig) -> hound::WavSpec {
    hound::WavSpec {
        channels: config.channels() as _,
        sample_rate: 16000 as _, // Write as 16khz always
        bits_per_sample: (config.sample_format().sample_size() * 8) as _,
        sample_format: sample_format(config.sample_format()),
    }
}

type WavWriterHandle = Arc<Mutex<Option<hound::WavWriter<BufWriter<File>>>>>;

fn write_input_data<T, U>(input: &[T], writer: &WavWriterHandle, resampler: &mut SincFixedIn<f64>)
where
    T: Sample,
    U: Sample + hound::Sample + FromSample<T>,
{
    if let Ok(mut guard) = writer.try_lock() {
        if let Some(writer) = guard.as_mut() {
            for &sample in input.iter() {
                let sample: U = U::from_sample(sample);
                writer.write_sample(sample).ok();
            }
        }
    }
}

Aug 23 '24 15:08 thewh1teagle

Hi, Rubato always processes samples in chunks. You can choose whether the input size or the output size should be fixed by using, for example, SincFixedIn or SincFixedOut. As far as I know, CPAL always calls the callback with the same number of frames, so when recording from CPAL it would be most convenient to use a resampler with a fixed input size. I guess Hound can be used to write as many samples as you want at a time.

A tempting way to do things would be to create a fixed-input resampler, call it in your write_input_data function to resample, and then pass the result to Hound. But I would not recommend this, since it puts a lot of work, including disk access, in the callback. You can never know how long that takes to complete, and you should not have anything blocking in the callback.

Instead, I would recommend letting the callback just push the raw data into some shared buffer, for example an Arc<Mutex<VecDeque<T>>>, which only needs the standard library. There are also fancier solutions, for example https://crates.io/crates/ringbuf (that looks promising at a glance, but I have not used it). Then you do the format conversion, resampling and disk writing in another thread (a separate one or the main one). That way, if a disk write gets delayed for whatever reason, capture will continue fine as long as there is space in the shared buffer.
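To make the shared-buffer idea concrete, here is a minimal sketch that uses only the standard library. The names SharedBuffer, push_samples and take_chunk are made up for illustration and are not part of cpal or rubato. It assumes f32 interleaved samples and leaves out the actual resampling and file writing: the worker thread would pass each returned chunk (one Vec per channel, which is the layout rubato's process call expects) to the resampler and then to Hound.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

/// Interleaved samples shared between the audio callback and a worker thread.
/// (Hypothetical alias, used only in this sketch.)
type SharedBuffer = Arc<Mutex<VecDeque<f32>>>;

/// Intended to be called from the capture callback: copy the samples and return.
fn push_samples(shared: &SharedBuffer, input: &[f32]) {
    // try_lock keeps the audio thread from blocking. In this sketch a
    // contended lock simply drops the block; a real program may prefer a
    // lock-free ring buffer instead.
    if let Ok(mut buf) = shared.try_lock() {
        buf.extend(input.iter().copied());
    }
}

/// Intended to be called from the worker thread: remove exactly `chunk_frames`
/// frames and return them de-interleaved (one Vec per channel), or None if
/// the buffer does not hold a full chunk yet.
fn take_chunk(
    shared: &SharedBuffer,
    chunk_frames: usize,
    channels: usize,
) -> Option<Vec<Vec<f32>>> {
    let mut buf = shared.lock().unwrap();
    if buf.len() < chunk_frames * channels {
        return None;
    }
    let mut planar: Vec<Vec<f32>> = vec![Vec::new(); channels];
    for _ in 0..chunk_frames {
        for channel in planar.iter_mut() {
            // The length check above guarantees a sample is available.
            channel.push(buf.pop_front().unwrap());
        }
    }
    Some(planar)
}
```

The worker would loop on take_chunk with chunk_frames set to the resampler's fixed input size, and sleep briefly whenever it returns None.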

Aug 24 '24 19:08 HEnquist

I'd like to share an example of my implementation. In my case, the project was to resample 48 kHz Opus audio to 44.1 kHz for playback. The target is not necessarily 44.1 kHz: I use rubato to resample to whatever sample rate the device reports through cpal.

I didn't handle the case of mono, but a little Googling tells me that the way to turn stereo audio into mono is simply (left + right)/2, so I think my code could support mono with a few minor changes.
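As a rough illustration of that (left + right)/2 downmix, assuming interleaved f32 stereo samples (the function name stereo_to_mono is made up for this sketch and is not taken from the linked code):

```rust
/// Average each interleaved stereo frame (left, right) into one mono sample.
/// A trailing sample without a partner is ignored.
fn stereo_to_mono(interleaved: &[f32]) -> Vec<f32> {
    interleaved
        .chunks_exact(2)
        .map(|frame| (frame[0] + frame[1]) * 0.5)
        .collect()
}
```

The result could then be fed to a single-channel resampler.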

Also, in my case I used process instead of process_into_buffer because I'm not doing the resampling in the cpal stream callback, I'm doing it in a separate rayon thread pool, which requires memory allocation anyway.

I also used FFT because I wanted the resampling to be as fast as possible. Here's the code: https://github.com/NamseEnt/namseent/blob/ed70771614b650508f06182b807375458096ba09/namui/namui/src/system/audio/cpal.rs

I tested it in a Windows environment and it worked well without too much hassle.

Sep 11 '24 16:09 namse

Can you please add a more simplified, clear-cut example of this?

The linked code is much more complex than I would want and has a hard-coded chunk size. How do I determine the chunk size, and what is it?

But I also want to convert whatever I record with Cpal and Hound, regardless of sample rate, to 16 kHz.

Or maybe examples of working with audio files outside of raw format in general? I don't want to rely on an outside tool for resampling audio, and I hope this crate can help with that, but at the moment, I'm beyond lost looking at the examples and docs.

Sep 13 '24 04:09 gluax

I asked this a few weeks ago. Now I can provide some useful information:

For resampling:

samplerate crate (simple): https://github.com/thewh1teagle/vad-rs/blob/main/src/helpers.rs#L4

rubato: https://github.com/rustdesk/rustdesk/blob/ab246fdcbf877dc84456af921680b9925cbd3ff1/src/common.rs#L212

As for efficiency: in general you can use buffering. The term sounds complex, but the usage is almost identical to Vec in Rust, except that you preallocate the memory (size).

Sep 13 '24 04:09 thewh1teagle

Ty so much @thewh1teagle!

Sep 13 '24 04:09 gluax

Can you please add a more simplified, clear-cut example of this?

What is it that is unclear? I will work on improving the documentation, but I need input to know where to best spend the time.

Or maybe examples of working with audio files outside of raw format in general?

This is a resampling library, and I think the examples should show how to do that part. Reading and writing audio files in different formats, and capture/playback via an audio API, are separate tasks that are usually done with other libraries, which should have their own examples.

I think what is needed here is to improve the documentation, to better explain chunk size and how rubato is meant to be used. A complete example using both cpal and hound with rubato in the middle feels more like a project of its own.

Sep 14 '24 09:09 HEnquist

I have added a new section in the documentation here: https://github.com/HEnquist/rubato/pull/86/files Does this help?

Sep 16 '24 18:09 HEnquist

I have added a new section in the documentation here: #86 (files) Does this help?

@HEnquist thanks for that documentation. I've been trying to figure similar things out here and that documentation came at the right time.

I'm wondering if you could add a section on the inverse of what you have for "Resampling a stream", where libraries like cpal call a callback with a buffer of some size that you are expected to fill. I'm pretty sure the size of the buffer you need to fill can change on every iteration, and it's not really clear which resampler type one should use or how you should fill/drain it.

Sep 16 '24 20:09 jeffutter

What does "the inverse of what you have for 'Resampling a stream'" mean? The type of output buffer that CPAL requests from you is provided by CPAL. https://github.com/RustAudio/cpal/blob/master/examples/synth_tones.rs#L98

The fact that the size of CPAL's output buffer changes is not important. Isn't it already well explained in its documentation how to fill it? It says to use Arc<Mutex<Vec>> or RingBuffer.

+ From a third-party perspective, I think asking HEnquist to explain all of this in detail would be too bothersome and complex for him. It would be like asking me to rewrite my example (which you consider complex) above in English. I believe that would be asking too much from a resampling crate like Rubato.

Sep 17 '24 07:09 namse

@namse by inverse, I was referring to how the examples HEnquist added were reading from the cpal callback, buffering, resampling, and writing to something that isn't fixed-size. Still, I would love to understand the process where you have a stream of audio (in my case from the network) and you need to resample it into the variable-sized data of a cpal write callback.

It would be like asking me to rewrite my example (which you consider complex) above in English

I don't think I made any reference to your example.

I'm certainly not expecting HEnquist to write a third example if they don't feel like/want to. However, the two examples they illustrate are very thorough and helpful. This seems like a third case that isn't covered, if they had time and felt like adding a third case I'm sure others would find it helpful, if not that's fine too.

Sep 17 '24 13:09 jeffutter

I added some more details on choice of chunk size in that documentation PR.

I'm wondering if you could add a section on the inverse of what you have for "Resampling a stream". Where libraries like cpal call a callback with a buffer of some size that you are expected to fill.

I think the overall processes for capture and playback are similar enough that it is sufficient to describe one of them, especially since this readme is for a pure resampling library (and one could argue that the whole thing is out of scope).

If the buffer size varies, the only choice is to use an intermediate buffer to adapt the resampler and cpal chunk sizes. The shared buffer should work fine for this, just read as many frames from it as the api requests.
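A minimal sketch of that last step, using only the standard library: the playback callback takes as many samples as the API asks for from the intermediate buffer and pads with silence if it runs dry. The function name fill_output is hypothetical, f32 samples are assumed, and the locking around the shared buffer is left out. The resampling thread would keep pushing resampled chunks into the same queue.

```rust
use std::collections::VecDeque;

/// Fill an output buffer of any length from the queue of resampled samples.
/// Returns how many samples came from the queue; the rest of the output
/// is set to silence (0.0).
fn fill_output(queue: &mut VecDeque<f32>, output: &mut [f32]) -> usize {
    let mut filled = 0;
    for slot in output.iter_mut() {
        match queue.pop_front() {
            Some(sample) => {
                *slot = sample;
                filled += 1;
            }
            // Underrun: nothing left in the queue, so output silence.
            None => *slot = 0.0,
        }
    }
    filled
}
```

Because the function only looks at the length of the output slice, it does not matter if the requested size changes from one callback to the next.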

Sep 17 '24 19:09 HEnquist

Sorry for the delayed response. That updated documentation helps significantly!

Sep 25 '24 14:09 gluax

The doc updates are now included in the v0.16.0 release.

Sep 28 '24 09:09 HEnquist