
Added Audio Input Support

Open bushrat011899 opened this issue 1 year ago • 7 comments

Objective

  • Fixes #9954

Solution

  • Created a new feature in bevy_audio, input, which listens to the default audio input device (if any) and publishes samples into the ECS using Events.
  • Added an example of a simple oscilloscope, which draws the waveform of every channel in the recorded audio input using Gizmos.

Changelog

  • Added an input feature to bevy_audio, which is activated by the audio-input feature in bevy/bevy_internal. The feature is optional: with it disabled, everything added in this PR is compiled out, leaving no impact.
  • Added a NonSend Resource AudioInputStream, which handles the cpal Stream. If this resource is dropped, then the Stream is also dropped with no memory leaked.
  • Added an Event AudioInputEvent, which is dispatched whenever an audio input is recorded by AudioInputStream.
  • Added a simple system handle_input_stream which marshals AudioInputEvent events out of AudioInputStream and into the ECS.
  • Updated AudioPlugin to set up everything required to start recording from the default audio input device.
  • Added an example audio/audio_input, which is a simple oscilloscope rendering recorded audio samples using Gizmos. See the animated GIF below for a low-framerate representation of the example.
  • Added commands to control starting and stopping the audio input stream.
  • Added AudioInput and AudioInputOptions to allow for the selection of a non-default audio input device and configuration (if desired). Alternatively, a cpal Device can be turned into an AudioInput, which can then be used to create the AudioInputStream.
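The changelog describes a hand-off from cpal's audio callback thread into the ECS. Below is a minimal std-only sketch of that pattern, assuming a bounded channel (std::sync::mpsc::sync_channel) plays the role of the bounded MPSC channel mentioned in the notes. `on_audio_callback` and `drain_pending` are hypothetical names, and this `AudioInputEvent` is a simplified stand-in rather than the PR's definition. For simplicity, this version allocates a new Vec on every callback.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};
use std::thread;

/// Simplified, hypothetical stand-in for the PR's `AudioInputEvent`:
/// interleaved samples plus the metadata needed to interpret them.
#[derive(Debug, Clone)]
pub struct AudioInputEvent {
    pub samples: Vec<f32>,
    pub sample_rate: u32,
    pub channels: u16,
}

/// Runs on the audio thread. Never blocks: if the ECS side has fallen
/// behind and the bounded channel is full, the chunk is dropped instead.
fn on_audio_callback(
    tx: &SyncSender<AudioInputEvent>,
    data: &[f32],
    sample_rate: u32,
    channels: u16,
) -> bool {
    let event = AudioInputEvent { samples: data.to_vec(), sample_rate, channels };
    match tx.try_send(event) {
        Ok(()) => true,
        // Full: consumer is behind. Disconnected: consumer was dropped.
        Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => false,
    }
}

/// Stand-in for a once-per-frame system such as `handle_input_stream`:
/// takes everything currently queued without waiting for more.
fn drain_pending(rx: &Receiver<AudioInputEvent>) -> Vec<AudioInputEvent> {
    rx.try_iter().collect()
}

fn main() {
    // Room for 4 chunks between frames.
    let (tx, rx) = sync_channel::<AudioInputEvent>(4);
    let producer = thread::spawn(move || {
        let chunk = [0.0f32; 8];
        let mut accepted = 0;
        for _ in 0..6 {
            if on_audio_callback(&tx, &chunk, 48_000, 2) {
                accepted += 1;
            }
        }
        accepted
    });
    let accepted = producer.join().unwrap();
    let events = drain_pending(&rx);
    println!("accepted {accepted}, drained {}", events.len());
}
```

Dropping on a full channel, rather than blocking, keeps the real-time audio thread from ever waiting on a slow frame, at the cost of losing audio when the game stalls.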

Notes

With the audio-input feature enabled, all a user has to do to get access to the microphone is create a system which listens to AudioInputEvent events:

use bevy::prelude::*;

fn main() {
    App::new()
        .add_plugins(DefaultPlugins)
        .add_systems(Update, (system, push_to_listen))
        .run();
}

/// Only record audio while the user has held the space-bar key
fn push_to_listen(mut commands: Commands, keyboard_input: Res<Input<KeyCode>>) {
    if keyboard_input.just_pressed(KeyCode::Space) {
        commands.start_recording_audio();
    } else if keyboard_input.just_released(KeyCode::Space) {
        commands.stop_recording_audio();
    }
}

fn system(mut inputs: EventReader<AudioInputEvent>) {
    for input in inputs.read() {
        // You now have access to the sampled input waveform and metadata
    }
}

AudioInputEvent has been designed to include all the contextual information a consumer might need to work with the recording (sample rate, channels, timing). This metadata makes the struct larger than strictly necessary, since the information could instead live in a separate resource. However, I think this is more ergonomic, and given the size of the audio samples themselves (usually 900 f32 values in a vector on my machine) compared to the metadata (less than 100 bytes), it isn't substantially more wasteful.
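As an illustration of why per-event metadata is convenient, here is a sketch of how a consumer could use the channel count and sample rate carried with each chunk: splitting interleaved samples into per-channel traces (as an oscilloscope would) and computing how much time the chunk covers. It assumes interleaved samples (L0 R0 L1 R1 ...) and at least one channel. The struct and method names are hypothetical, not the PR's API.

```rust
/// Hypothetical event shape: interleaved samples plus metadata.
struct AudioInputEvent {
    samples: Vec<f32>,
    sample_rate: u32,
    channels: u16,
}

impl AudioInputEvent {
    /// Number of frames, i.e. one sample per channel.
    fn frames(&self) -> usize {
        self.samples.len() / self.channels as usize
    }

    /// Wall-clock length of this chunk in seconds.
    fn duration_secs(&self) -> f64 {
        self.frames() as f64 / self.sample_rate as f64
    }

    /// Iterates one channel's samples out of the interleaved buffer,
    /// e.g. to draw one trace per channel.
    fn channel(&self, index: u16) -> impl Iterator<Item = f32> + '_ {
        assert!(index < self.channels, "channel index out of range");
        self.samples
            .iter()
            .skip(index as usize)
            .step_by(self.channels as usize)
            .copied()
    }
}

fn main() {
    let event = AudioInputEvent {
        samples: vec![0.1, -0.1, 0.2, -0.2, 0.3, -0.3],
        sample_rate: 48_000,
        channels: 2,
    };
    let left: Vec<f32> = event.channel(0).collect();
    let right: Vec<f32> = event.channel(1).collect();
    println!("frames = {}, left = {:?}, right = {:?}", event.frames(), left, right);
    println!("duration = {:.6} s", event.duration_secs());
}
```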

I haven't tested the latency or performance impact of this running alongside other Bevy modules. From my rough experimentation with the example, it appears to be fairly low latency, but someone more qualified would be a better judge than me (I was a bad drummer). I have a short slow-motion video where I tap on my microphone to create an impulse; the recording was made on an Xperia 1 IV pointed at a Windows 10 PC. The video appears to show a latency of less than 10 ms (the smallest interval measurable with this setup), which should be sufficient for communication applications (e.g., voice chat within a multiplayer game). Further testing would be required to assess more demanding applications (e.g., vocal effects for a live performance).
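One component any end-to-end latency figure must include is the time it takes to fill a single capture buffer, frames / sample_rate. Below is a sketch of that arithmetic in both directions. The buffer sizes in `main` are illustrative assumptions, not values from this PR or from the author's machine. Processing, rendering, and display add further delay on top, which this sketch does not model.

```rust
/// Latency contributed by buffering one chunk: frames / sample_rate, in ms.
fn buffer_latency_ms(frames: u32, sample_rate: u32) -> f64 {
    frames as f64 * 1000.0 / sample_rate as f64
}

/// Largest buffer (in frames) whose fill time fits within a latency budget.
fn max_frames_for_budget(budget_ms: f64, sample_rate: u32) -> u32 {
    (budget_ms * sample_rate as f64 / 1000.0).floor() as u32
}

fn main() {
    // Illustrative buffer sizes and rates only.
    for &(frames, rate) in &[(256u32, 48_000u32), (480, 48_000), (1024, 44_100)] {
        println!("{frames} frames @ {rate} Hz -> {:.2} ms", buffer_latency_ms(frames, rate));
    }
    println!(
        "10 ms budget @ 48 kHz -> at most {} frames",
        max_frames_for_budget(10.0, 48_000)
    );
}
```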

I have attempted to minimise/eliminate any allocation in the hot-path beyond Bevy's ECS operations using bounded MPSC channels and pre-allocated vectors, but I'm still new to allocation-free Rust programming, so there may be room for further optimisation going forward.
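A common way to keep the audio callback allocation-free is a fixed pool of pre-allocated buffers that circulate through two bounded channels: one carrying filled buffers to the consumer and one returning emptied buffers for reuse. The following is a minimal std-only sketch of that technique. It is not necessarily how this PR is implemented, and `buffer_pool`, `Producer`, and `Consumer` are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

/// Audio-thread side: sends filled buffers, receives recycled ones.
struct Producer {
    filled: SyncSender<Vec<f32>>,
    recycled: Receiver<Vec<f32>>,
}

/// ECS side: receives filled buffers, sends them back for reuse.
struct Consumer {
    filled: Receiver<Vec<f32>>,
    recycled: SyncSender<Vec<f32>>,
}

/// Allocates every buffer up front; nothing allocates after this call
/// as long as chunks fit within `capacity`.
fn buffer_pool(pool_size: usize, capacity: usize) -> (Producer, Consumer) {
    let (filled_tx, filled_rx) = sync_channel(pool_size);
    let (recycled_tx, recycled_rx) = sync_channel(pool_size);
    for _ in 0..pool_size {
        // Fits exactly in the channel's capacity, so this never blocks.
        recycled_tx.send(Vec::with_capacity(capacity)).expect("receiver alive");
    }
    (
        Producer { filled: filled_tx, recycled: recycled_rx },
        Consumer { filled: filled_rx, recycled: recycled_tx },
    )
}

impl Producer {
    /// Copies one callback's samples into a recycled buffer. Returns false
    /// (dropping the chunk) when no buffer is free, instead of allocating.
    fn push(&self, data: &[f32]) -> bool {
        match self.recycled.try_recv() {
            Ok(mut buf) => {
                buf.clear();
                // No reallocation while data.len() <= buf.capacity().
                buf.extend_from_slice(data);
                // Cannot be Full: only pool_size buffers exist in total.
                self.filled.try_send(buf).is_ok()
            }
            Err(_) => false,
        }
    }
}

impl Consumer {
    /// Hands each filled buffer to `f`, then returns it to the pool.
    fn drain(&self, mut f: impl FnMut(&[f32])) -> usize {
        let mut count = 0;
        while let Ok(buf) = self.filled.try_recv() {
            f(&buf);
            // Same buffer-count argument: the recycled channel has room.
            let _ = self.recycled.try_send(buf);
            count += 1;
        }
        count
    }
}

fn main() {
    let (producer, consumer) = buffer_pool(2, 1024);
    assert!(producer.push(&[0.1, 0.2]));
    assert!(producer.push(&[0.3]));
    // Pool exhausted: the third chunk is dropped rather than allocated.
    assert!(!producer.push(&[0.4]));
    let mut total = 0;
    let n = consumer.drain(|samples| total += samples.len());
    println!("drained {n} buffers, {total} samples");
    // Buffers are back in the pool, so pushing works again.
    assert!(producer.push(&[0.5]));
}
```

The pool size bounds memory and the number of chunks that can queue between frames. Choosing `capacity` at least as large as the largest callback chunk avoids reallocation inside `extend_from_slice`.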

Finally, I'm submitting this PR in a ready-to-go state, since it works and passes all my local CI checks, but I'm open to suggestions on adding or removing parts, since this is a mostly brand-new feature.

[Animated GIF: audio_input example]

bushrat011899, Oct 10 '23 01:10

The generated examples/README.md is out of sync with the example metadata in Cargo.toml or the example readme template. Please run cargo run -p build-templated-pages -- update examples to update it, and commit the file change.

github-actions[bot], Oct 10 '23 02:10

You added a new feature but didn't update the readme. Please run cargo run -p build-templated-pages -- update features to update it, and commit the file change.

github-actions[bot], Oct 10 '23 02:10

Another area of concern is that this will start piping data from the audio input source into the ECS straight away and continue to do so for the entire runtime of the application. This can be controlled by calling pause/play on the Stream contained within AudioInputStream, but it is worth noting here.

I think we shouldn't start recording at start, but only after it's asked for by the game

mockersf, Oct 13 '23 08:10

I think we shouldn't start recording at start, but only after it's asked for by the game

Totally fair. I've updated this PR to include some Commands that control whether the stream is collecting input data.

fn push_to_listen(mut commands: Commands, keyboard_input: Res<Input<KeyCode>>) {
    if keyboard_input.just_pressed(KeyCode::Space) {
        commands.start_recording_audio();
    } else if keyboard_input.just_released(KeyCode::Space) {
        // Stop only on the release edge, rather than every frame the key is up
        commands.stop_recording_audio();
    }
}

With this, I've removed the setup code I had before, so with the audio-input feature enabled, the input device is found and a stream is set up, but the stream is no longer started automatically.
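The thread controls recording by pausing and playing the cpal Stream. An alternative design keeps the stream running and gates forwarding with a shared atomic flag. Pausing stops the device from delivering callbacks at all, while a flag keeps callbacks running but turns start/stop into a single atomic store that any thread can perform. This is a std-only sketch of the flag approach. `RecordingGate` and `on_audio_callback` are hypothetical names, not part of this PR.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical switch shared between game code and the audio callback.
#[derive(Clone, Default)]
struct RecordingGate(Arc<AtomicBool>);

impl RecordingGate {
    // Relaxed suffices: the flag does not guard any other shared data.
    fn start(&self) {
        self.0.store(true, Ordering::Relaxed);
    }
    fn stop(&self) {
        self.0.store(false, Ordering::Relaxed);
    }
    fn is_recording(&self) -> bool {
        self.0.load(Ordering::Relaxed)
    }
}

/// Audio-callback side: forwards the chunk only while recording is on.
fn on_audio_callback(gate: &RecordingGate, data: &[f32], sink: &mut Vec<f32>) {
    if gate.is_recording() {
        sink.extend_from_slice(data);
    }
}

fn main() {
    let gate = RecordingGate::default();
    let mut recorded = Vec::new();
    on_audio_callback(&gate, &[0.1, 0.2], &mut recorded); // ignored: not recording
    gate.start();
    on_audio_callback(&gate, &[0.3, 0.4], &mut recorded);
    gate.stop();
    on_audio_callback(&gate, &[0.5], &mut recorded); // ignored again
    println!("{:?}", recorded);
}
```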

bushrat011899, Oct 13 '23 10:10

I've updated this PR to be as close to zero allocation (within the hot-path) as I think I can whilst making minimal assumptions. In addition, I've completely decoupled from rodio internally and for the public API, and only offer optional cpal APIs. For the vast majority of use cases with this API, only the Bevy controlled types will ever be interacted with, allowing for a zero-break changeover to Kira (if desired), and a minimal break if we ever moved off of cpal (I don't see any indication this would happen).

In addition, I have done some more latency testing and updated the description to include the details. With my limited setup, I was not able to detect a latency larger than 10 ms (the floor of my measurement precision) for the entire pipeline of my example (physical sound through to visual display on my monitor). I would still encourage anyone who can measure this more precisely to do so. I doubt this implementation would be low enough latency for musical performance (e.g., using Bevy for custom audio effects on a MIDI instrument), but I am confident it is low enough for communication.

bushrat011899, Mar 31 '24 01:03

It has been close to 4 months; is there any update on the status of this?

rudrabhoj, Jul 07 '24 08:07

@rudrabhoj We're in the process of reworking the audio engine entirely, as we've reached the limits of the current solution. As such this may take a while to get re-implemented, but audio input is definitely planned in the future.

SolarLiner, Jul 08 '24 06:07