
Resampling fixes and tests

Open keleftheriou opened this issue 5 months ago • 0 comments

This is meant to demonstrate some problems with the current resampling code & usage, and propose a new implementation.

The current resampling code and its usage have the following issues:

  • Fails with loadAudioFailed("Failed to process audio buffer") on certain inputs, e.g. 44.1 kHz files with frame counts of 12289 + 1024*N.
  • Uses roughly twice as much memory as needed, since both the AVAudioPCMBuffer returned from AudioProcessor.loadAudio() and the [Float] returned from AudioProcessor.convertBufferToArray() are retained for the entire duration of the inner transcribe(audioArray: ...) call.
  • The code seems unnecessarily complex, with an error-prone structure.
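To make the failing pattern from the first point concrete, here is a minimal sketch that enumerates frame counts of the form 12289 + 1024*N. The function names and the `count` parameter are hypothetical and not part of WhisperKit; the sketch only restates the arithmetic and does not reproduce the failure itself.

```swift
/// Frame counts of the form 12289 + 1024 * n, the pattern reported as failing
/// for 44.1 kHz input. `count` (hypothetical) is how many values to list.
func reportedFailingFrameCounts(count: Int) -> [Int] {
    (0..<count).map { 12289 + 1024 * $0 }
}

/// Returns true when `frames` matches the reported pattern.
func matchesReportedPattern(_ frames: Int) -> Bool {
    frames >= 12289 && (frames - 12289) % 1024 == 0
}
```

A list like this could serve as a set of regression inputs for whichever resampling implementation is under test.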

The included tests attempt to demonstrate the problem and stress-test a proposed implementation using dynamically created silent files of arbitrary lengths and sample rates.
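As a rough illustration of that testing idea, the sketch below builds silent input in memory and computes the frame count one might expect after resampling. It is a stand-in, not the included tests: `SilentClip`, `makeSilentClip`, and `expectedFrameCount` are hypothetical names, it uses plain arrays rather than audio files, the 16 kHz default target rate is an assumption based on what Whisper models expect, and the rounding mode is a guess (a real converter may differ by a frame).

```swift
/// Hypothetical in-memory stand-in for a silent audio file.
struct SilentClip {
    let sampleRate: Double
    let samples: [Float]
}

/// Creates `frameCount` zero-valued samples at the given sample rate.
func makeSilentClip(sampleRate: Double, frameCount: Int) -> SilentClip {
    SilentClip(sampleRate: sampleRate,
               samples: [Float](repeating: 0, count: frameCount))
}

/// Approximate number of output frames after resampling.
/// Assumption: a 16 kHz target and round-to-nearest; neither is confirmed by the issue.
func expectedFrameCount(inputFrames: Int,
                        inputRate: Double,
                        outputRate: Double = 16_000) -> Int {
    Int((Double(inputFrames) * outputRate / inputRate).rounded())
}
```

A stress test could then loop over many sample rates and frame counts, resample each clip, and check that the call succeeds and the output length is close to `expectedFrameCount`.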

keleftheriou, Sep 29 '24 04:09