Proposal: Real Time Streaming MIDI Output Support

Open • Anipaleja opened this issue 8 months ago • 1 comment

Feature Proposal: Real Time Streaming MIDI Output Support

Summary

This feature request proposes the addition of real-time streaming MIDI output to basic-pitch, allowing the system to process and output MIDI data concurrently with audio input. This would significantly expand the usability of basic-pitch in live performance, educational, and DAW integration contexts.

Motivation

Currently, basic-pitch operates as a batch audio-to-MIDI converter, requiring the full audio file to be processed before producing MIDI output. While effective for offline applications, this architecture limits the tool’s applicability in live scenarios. Real-time audio-to-MIDI conversion has growing demand in:

  • Live instrument-to-MIDI conversion for digital audio workstations (DAWs)
  • Music education platforms requiring instant feedback
  • Interactive composition and improvisation tools
  • Low-latency MIDI controllers for experimental performance setups

Several commercial and research-grade tools provide real-time capabilities (e.g., JamOrigin MIDI Guitar, AIO MIDINet, and various ONNX-based pipelines), but few offer open-source solutions with the transcription accuracy that basic-pitch provides.

Proposed Implementation

A modular, low-latency real-time streaming pipeline could be introduced as an extension of the existing model. Suggested steps include:

Input Handling

  • Use pyaudio, sounddevice, or other low-latency libraries to stream audio input directly from a microphone or system source.
  • Implement windowed audio buffering with overlap to allow continuous model inference.
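
The overlapped buffering described above can be sketched as follows. This is a minimal illustration, not part of basic-pitch: the class name `OverlapBuffer` and its parameters are hypothetical, and in practice the chunks would come from an audio callback (for example one registered with sounddevice or pyaudio) rather than from plain lists.

```python
from collections import deque
from itertools import islice
from typing import Deque, List, Sequence


class OverlapBuffer:
    """Hypothetical helper: collects streamed samples into overlapping windows."""

    def __init__(self, window_size: int, hop_size: int) -> None:
        if not 0 < hop_size <= window_size:
            raise ValueError("hop_size must be in the range 1..window_size")
        self.window_size = window_size
        self.hop_size = hop_size
        self._samples: Deque[float] = deque()

    def push(self, chunk: Sequence[float]) -> List[List[float]]:
        """Append a chunk of any length and return every window now complete."""
        self._samples.extend(chunk)
        windows: List[List[float]] = []
        while len(self._samples) >= self.window_size:
            # Copy the oldest window_size samples as one inference window.
            windows.append(list(islice(self._samples, self.window_size)))
            # Advance by one hop, so consecutive windows share
            # window_size - hop_size samples.
            for _ in range(self.hop_size):
                self._samples.popleft()
        return windows


buf = OverlapBuffer(window_size=4, hop_size=2)
first = buf.push([0, 1, 2, 3, 4, 5])   # two complete windows
second = buf.push([6])                 # not enough new samples yet
third = buf.push([7])                  # one more window
```

Each returned window would then be handed to the model. A smaller hop gives more frequent updates at the cost of more inference calls per second.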

Inference Adaptation

  • Adapt the inference loop to process fixed-size frames (e.g., 2048 or 4096 samples) in real time.
  • Introduce incremental model state management that carries context from one audio frame to the next, so that accuracy does not degrade at frame boundaries.

Streaming Output

  • Emit MIDI note events incrementally using a ring buffer or FIFO stream.
  • Optionally expose a MIDI output via mido, rtmidi, or similar libraries for live routing to DAWs or synthesizers.
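
One way to emit events incrementally is to compare the set of pitches active in each frame with the previous frame and report only the differences. The sketch below assumes the model output has already been reduced to a set of active MIDI pitch numbers per frame; `NoteEventEmitter` and the event tuple layout are hypothetical names chosen for illustration.

```python
from typing import Iterable, List, Set, Tuple

# (kind, midi_pitch, frame_index); this layout is an assumption of the sketch.
NoteEvent = Tuple[str, int, int]


class NoteEventEmitter:
    """Hypothetical helper: turns per-frame active pitches into on/off events."""

    def __init__(self) -> None:
        self._active: Set[int] = set()
        self._frame = 0

    def update(self, active_pitches: Iterable[int]) -> List[NoteEvent]:
        """Consume one frame and return the events it triggers."""
        current = set(active_pitches)
        # Pitches that were sounding and no longer are: note_off first.
        events: List[NoteEvent] = [
            ("note_off", p, self._frame) for p in sorted(self._active - current)
        ]
        # Pitches that just appeared: note_on.
        events += [
            ("note_on", p, self._frame) for p in sorted(current - self._active)
        ]
        self._active = current
        self._frame += 1
        return events


emitter = NoteEventEmitter()
e0 = emitter.update([60])        # C4 starts
e1 = emitter.update([60, 64])    # E4 joins, C4 is held
e2 = emitter.update([64])        # C4 ends
e3 = emitter.update([])          # E4 ends
```

For live routing, each event could be converted to a `mido.Message` (for example `mido.Message("note_on", note=pitch, velocity=64)`, where the fixed velocity is a placeholder) and sent through a port obtained from `mido.open_output()`.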

Latency and Performance Tuning

  • Introduce a tunable latency buffer to trade transcription accuracy against real-time responsiveness.
  • Profile model inference to determine optimal window sizes and overlaps under typical hardware constraints.
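
The trade-off can be estimated with simple arithmetic before any profiling: a window must be fully buffered before inference can start, and inference must finish within one hop for the pipeline to keep up. The function names below are hypothetical, and the 44100 Hz sample rate and 10 ms inference time are assumed values for illustration, not measurements of basic-pitch.

```python
def latency_budget_ms(
    window_samples: int,
    sample_rate_hz: float,
    inference_ms: float,
    output_buffer_ms: float = 0.0,
) -> float:
    """Rough lower bound on input-to-MIDI latency for one window."""
    buffering_ms = 1000.0 * window_samples / sample_rate_hz
    return buffering_ms + inference_ms + output_buffer_ms


def keeps_up(hop_samples: int, sample_rate_hz: float, inference_ms: float) -> bool:
    """True if one inference call finishes before the next hop of audio arrives."""
    hop_ms = 1000.0 * hop_samples / sample_rate_hz
    return inference_ms < hop_ms


SAMPLE_RATE_HZ = 44100.0   # assumed input rate
INFERENCE_MS = 10.0        # assumed per-window inference time

small = latency_budget_ms(2048, SAMPLE_RATE_HZ, INFERENCE_MS)  # about 56.4 ms
large = latency_budget_ms(4096, SAMPLE_RATE_HZ, INFERENCE_MS)  # about 102.9 ms
ok = keeps_up(1024, SAMPLE_RATE_HZ, INFERENCE_MS)              # hop is about 23.2 ms
```

Under these assumptions, doubling the window from 2048 to 4096 samples adds roughly 46 ms of buffering delay, which is the cost to weigh against any accuracy gain from the longer context.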

Optional Network Interface

  • For advanced use cases, expose the real-time inference through a lightweight WebSocket or gRPC API, enabling remote control and cloud deployment.

Anticipated Challenges

  • Model Adaptability: Ensuring the model performs well on partial inputs without full temporal context.
  • Latency Minimization: Achieving real-time responsiveness while maintaining accuracy will require careful tuning.
  • False Positives: Very short spurious notes may introduce noise in real-time environments, so adaptive thresholding or smoothing may be necessary.
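
As one possible form of smoothing, a pitch can be suppressed until it has been detected in several consecutive frames. The sketch below shows that idea only; `MinDurationFilter` is a hypothetical name, and a fixed frame count stands in for the adaptive thresholding mentioned above.

```python
from typing import Dict, Iterable, Set


class MinDurationFilter:
    """Hypothetical helper: passes a pitch only after min_frames consecutive hits."""

    def __init__(self, min_frames: int) -> None:
        if min_frames < 1:
            raise ValueError("min_frames must be at least 1")
        self.min_frames = min_frames
        self._run_length: Dict[int, int] = {}

    def update(self, raw_active: Iterable[int]) -> Set[int]:
        """Consume one frame of raw detections and return the confirmed pitches."""
        raw = set(raw_active)
        # Extend the run for pitches still present; pitches that vanished
        # are dropped, so their run restarts from zero if they reappear.
        self._run_length = {p: self._run_length.get(p, 0) + 1 for p in raw}
        return {p for p, n in self._run_length.items() if n >= self.min_frames}


flt = MinDurationFilter(min_frames=2)
f0 = flt.update([60, 72])   # nothing confirmed yet
f1 = flt.update([60])       # 60 confirmed; 72 was a one-frame blip
f2 = flt.update([])         # 60 released
f3 = flt.update([72])       # 72 starts a fresh run, still unconfirmed
```

The cost is added onset latency of `min_frames - 1` frames, so this setting interacts directly with the latency budget.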

Benefits to the Ecosystem

  • Adds live performance capabilities to the basic-pitch ecosystem
  • Opens opportunities for integration with VSTs, DAWs, and educational tools
  • Fills a notable gap in the open-source music transcription landscape

Conclusion

Adding real-time streaming MIDI output to basic-pitch would make the tool significantly more versatile and competitive with proprietary solutions. Given its high transcription accuracy and open architecture, basic-pitch is well-positioned to lead in this space. This feature would serve both the open-source community and professional musicians seeking reliable, low-latency audio-to-MIDI conversion.

I’d be happy to contribute or assist with prototyping this functionality.

Anipaleja • Jul 05 '25 03:07

Yeah, this has been done already. You would want to use C++ and ONNX Runtime; Python will not cut it. If you want integration with VST or other plugin formats, that would also need to be C++, as the various format SDKs all use it.

joeloftusdev • Jul 13 '25 16:07