
Realtime CLI spectrogram example

Open MatiasHiltunen opened this issue 6 months ago • 7 comments

The primary purpose of this PR is to add an example that demonstrates another way to use an audio input stream, while also offering a tool for visualizing that stream. Although the spectrogram example contains a fair amount of code and features, it may be useful for someone like me who is just getting into lower-level audio. The code currently uses the OS's default input device.
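For readers who are equally new to this: one column of a spectrogram is the magnitude spectrum of a short frame of samples. The sketch below is not the code from this PR; it is a deliberately naive O(N^2) DFT with a hypothetical function name, meant only to show what gets computed per frame.

```rust
use std::f32::consts::PI;

/// Magnitudes of DFT bins 0..=N/2 for one frame of mono samples.
/// Illustrative only: naive O(N^2) transform, no windowing, no FFT.
fn magnitude_spectrum(frame: &[f32]) -> Vec<f32> {
    let n = frame.len();
    if n == 0 {
        return Vec::new();
    }
    (0..=n / 2)
        .map(|k| {
            let (mut re, mut im) = (0.0f32, 0.0f32);
            for (i, &x) in frame.iter().enumerate() {
                // Correlate the frame with a complex sinusoid at bin k.
                let phase = -2.0 * PI * (k * i) as f32 / n as f32;
                re += x * phase.cos();
                im += x * phase.sin();
            }
            (re * re + im * im).sqrt()
        })
        .collect()
}
```

Calling this on successive frames from the input callback and mapping each magnitude to a character or color gives the scrolling picture a terminal spectrogram shows.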

While developing this example I noticed that on Windows, with the WASAPI host, AGC or noise suppression quickly kicks in when visualizing realtime audio input, even though build_input_stream_raw_inner should supposedly give a raw audio stream if the OS or driver agrees. To get truly raw input audio on Windows with WASAPI, I added the ability to request a raw audio stream, gated behind a new environment variable (cached in a global OnceLock). With that in place I was able to disable AGC/noise suppression on Windows 11 and get a truly raw stream, which let the spectrogram run indefinitely without disturbance from OS/driver-level filters. I did not face similar challenges on macOS, where the audio stream was seemingly untouched, or at least nothing affected it at runtime. On Windows, possible use cases include longer-running audio recordings where the volume and quality should stay constant, or applications that want to handle gain and noise processing themselves. Windows seems to start lowering the input volume after a certain period of inactivity.
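As a rough illustration of the "environment variable cached in a global OnceLock" idea, here is a minimal std-only sketch. The variable name `CPAL_EXAMPLE_WASAPI_RAW` and both function names are hypothetical and chosen for this illustration; they are not taken from the PR, and the WASAPI call that actually requests raw mode is not shown.

```rust
use std::env;
use std::sync::OnceLock;

// Hypothetical variable name, for illustration only.
const RAW_INPUT_ENV: &str = "CPAL_EXAMPLE_WASAPI_RAW";

// Process-wide cache so the environment is inspected at most once.
static RAW_INPUT_REQUESTED: OnceLock<bool> = OnceLock::new();

/// Interprets the variable's value; unset or unrecognized values count as "off".
fn parse_flag(value: Option<&str>) -> bool {
    matches!(
        value.map(|v| v.trim().to_ascii_lowercase()).as_deref(),
        Some("1") | Some("true") | Some("yes")
    )
}

/// Returns whether raw capture was requested, reading the environment on first use.
fn raw_input_requested() -> bool {
    *RAW_INPUT_REQUESTED.get_or_init(|| parse_flag(env::var(RAW_INPUT_ENV).ok().as_deref()))
}
```

A stream-building function could then branch on `raw_input_requested()`. Because the value is cached, changing the variable after the first call has no effect for the rest of the process.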

The spectrogram example has been briefly tested on real devices: Mac Mini M4 (macOS 15.5 Sequoia), Linux, and Windows 11.

The example is built with existing dependencies; the only change to Cargo.toml so far is the addition of libc to the macOS dev-dependencies, which allows building the TUI app with minimal dependencies.

Check the comments in the code for additional information. This example has been reviewed over multiple runs with a number of different LLMs, such as Claude 4 Opus.

Br. Matias

MatiasHiltunen avatar Jun 20 '25 14:06 MatiasHiltunen

[Screenshot: Näyttökuva 2025-06-22 210145] Reference image of the zoomed-out terminal on Windows while running this example.

MatiasHiltunen avatar Jun 24 '25 13:06 MatiasHiltunen

This is really cool and I can definitely see it landing somewhere in the Rust audio ecosystem. I'm wondering if Rodio would be the best place to get it landed?

Normally I'd think so, but this PR also adds the point of raw access on WASAPI. That's something Rodio does not have access to, and actually, cpal today does not have access to it either.

So offering additional WASAPI knobs is interesting, though I'm not a fan of the approach with the environment variable. What else can we think of that's more idiomatic - in the sense of a builder pattern, host/stream configuration options, or the like? I can imagine that other hosts also could have knobs that are worthwhile exposing, so I'd be interested to see what we could conjure up.

Then maybe split the PR into a spectrograph for Rodio and host options in cpal?
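To make the builder-pattern idea a bit more concrete, here is one possible shape for such options. Every type, field, and method name below is hypothetical; none of it is part of cpal's API, and how the options would be passed to a host when building a stream is left open.

```rust
/// Hypothetical WASAPI-specific input options; not part of cpal.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct WasapiInputOptions {
    /// Ask the OS to bypass processing such as AGC or noise suppression.
    /// Defaults to `false`, so the behavior is opt-in.
    pub raw_capture: bool,
}

/// Hypothetical container for per-host knobs; other hosts could add fields here.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct InputStreamOptions {
    pub wasapi: WasapiInputOptions,
}

impl InputStreamOptions {
    /// Starts from the defaults, where every host-specific knob is off.
    pub fn new() -> Self {
        Self::default()
    }

    /// Builder-style setter; hosts other than WASAPI would simply ignore it.
    pub fn wasapi_raw_capture(mut self, enabled: bool) -> Self {
        self.wasapi.raw_capture = enabled;
        self
    }
}
```

A caller would write something like `InputStreamOptions::new().wasapi_raw_capture(true)`. Compared with an environment variable, the choice is then visible in code, set per stream rather than per process, and checked by the compiler.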

roderickvd avatar Jul 29 '25 20:07 roderickvd

Forgive me if I'm misunderstanding, but is this implementing a way to create a spectrogram, built into cpal?

I like the idea of a fast way to generate a spectrogram, and I would 100% use that in the near future, but I feel like that dilutes the goal of cpal, which I understood to be as low level as possible for directly creating audio I/O; cpal is what OpenGL is to SwiftUI.

If this is just for an example though, that, I believe, is a really good feature to introduce, and showcases a little bit more of what CPAL can do for non-input-to-output features.

wgibbs-rs avatar Jul 31 '25 19:07 wgibbs-rs

@roderickvd Thanks for the ideas! I'll split the PR in the near future. I agree that an environment variable is quite a clumsy way to force raw input with WASAPI; would a feature flag, for example, be a much better option in this case? I'll look into the Rodio option and might create a PR there later!

@wgibbs-rs Spectrogram would be just an example, no integration to cpal :)

MatiasHiltunen avatar Aug 02 '25 16:08 MatiasHiltunen

Cool. Yes, a feature flag could also work.

Thinking out loud, is there any reason why a user would not want it? Or would this be going into too much of opinionated territory?
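For comparison, the feature-flag variant discussed above could be sketched like this. The feature name `wasapi-raw-capture` and both function names are hypothetical; the feature would have to be declared under `[features]` in Cargo.toml, and nothing like it is confirmed to exist in cpal.

```rust
/// True only when the crate is compiled with the (hypothetical)
/// `wasapi-raw-capture` Cargo feature enabled.
pub const fn raw_capture_enabled() -> bool {
    cfg!(feature = "wasapi-raw-capture")
}

/// Describes which capture path a stream builder would take.
pub fn capture_mode() -> &'static str {
    if raw_capture_enabled() {
        "raw"
    } else {
        "default"
    }
}
```

One thing to weigh: a Cargo feature is a compile-time, crate-wide switch, and features are unified across the dependency graph, so any dependency that enables it would enable it for every stream in the final binary. A runtime option can be chosen per stream.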

roderickvd avatar Aug 03 '25 21:08 roderickvd

If you are referring to the changes in WASAPI's build_input_stream_raw_inner function, it could well be what users actually expect, but I would not add this without a way to explicitly enable it just yet.

MatiasHiltunen avatar Aug 06 '25 14:08 MatiasHiltunen

I share the feeling that we could make it opt-in for now and consider transitioning to making it the default later.

Just to rationalize it though, what would be pros/cons?

roderickvd avatar Aug 06 '25 16:08 roderickvd