whisper.cpp
AVFoundation AVAudioNode installTap audio buffer format for whisper.cpp
Any hint on how I need to convert the buffer from AVAudioNode's installTap so that it is compatible with whisper.cpp?
https://developer.apple.com/documentation/avfaudio/avaudionode/1387122-installtap
First, I simply tried to convert it to an array:
Array(UnsafeBufferPointer(start: buffer.floatChannelData![0], count: Int(buffer.frameLength)))
I also tried using an AVAudioConverter:
let recordingFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32, sampleRate: 16000.0, channels: 1, interleaved: true)
let formatConverter = AVAudioConverter(from: inputFormat, to: recordingFormat!)
...
formatConverter!.convert(to: pcmBuffer!, error: &error, withInputFrom: inputBlock)
if let channelData = pcmBuffer!.floatChannelData {
    let channelDataValue = channelData.pointee
    let channelDataValueArray = stride(from: 0,
                                       to: Int(pcmBuffer!.frameLength),
                                       by: pcmBuffer!.stride).map { channelDataValue[$0] }
}
But both of those approaches just produced transcriptions like [BLANK_AUDIO], (sighs), or (wind_noise).
I am trying to create a stream/realtime implementation with Swift.
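For reference, whisper.cpp expects 16 kHz, mono, Float32 samples in the range [-1.0, 1.0]. Here is a minimal sketch of the full converter flow I was going for (the function name and setup are just illustrative, not taken from any demo):

import AVFoundation

// Minimal sketch (illustrative names): convert a single tap buffer to the
// 16 kHz / mono / Float32 format that whisper.cpp expects.
func convertTo16kMono(_ buffer: AVAudioPCMBuffer) -> AVAudioPCMBuffer? {
    guard let targetFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                           sampleRate: 16000,
                                           channels: 1,
                                           interleaved: false),
          let converter = AVAudioConverter(from: buffer.format, to: targetFormat)
    else { return nil }

    // Size the output buffer for the sample-rate ratio, with a little headroom.
    let ratio = targetFormat.sampleRate / buffer.format.sampleRate
    let capacity = AVAudioFrameCount(Double(buffer.frameLength) * ratio) + 16
    guard let output = AVAudioPCMBuffer(pcmFormat: targetFormat,
                                        frameCapacity: capacity) else { return nil }

    // Feed the single input buffer exactly once, then report "no data".
    var consumed = false
    let inputBlock: AVAudioConverterInputBlock = { _, outStatus in
        if consumed {
            outStatus.pointee = .noDataNow
            return nil
        }
        consumed = true
        outStatus.pointee = .haveData
        return buffer
    }

    var error: NSError?
    converter.convert(to: output, error: &error, withInputFrom: inputBlock)
    return error == nil ? output : nil
}

In a real streaming setup the converter should probably be created once and reused across buffers, since recreating it per buffer throws away the resampler's internal state.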
I had the same problem when I built the whisper.swiftui demo, which is a real-time demo. The original pcmBuffer cannot be recognized by whisper.cpp and needs to be converted. You can take a look at my implementation:
func decodePCMBuffer(_ buffer: AVAudioPCMBuffer) throws -> [Float] {
    guard let floatChannelData = buffer.floatChannelData else {
        throw NSError(domain: "Invalid PCM Buffer", code: 0, userInfo: nil)
    }
    let channelCount = Int(buffer.format.channelCount)
    let frameLength = Int(buffer.frameLength)
    var floats = [Float]()
    floats.reserveCapacity(frameLength * channelCount)

    // Flatten the channel data, clamping every sample to [-1.0, 1.0],
    // the range whisper.cpp expects. For a mono buffer (channelCount == 1)
    // this reduces to a straight copy with clamping.
    for frame in 0..<frameLength {
        for channel in 0..<channelCount {
            let floatData = floatChannelData[channel]
            let index = frame * channelCount + channel
            let floatSample = floatData[index]
            floats.append(max(-1.0, min(floatSample, 1.0)))
        }
    }
    return floats
}
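For completeness, here is a rough sketch of how this could be wired into a streaming tap. It assumes a conversion helper like the convertTo16kMono sketch above (an illustrative name, not part of the demo), so that decodePCMBuffer receives 16 kHz mono data:

import AVFoundation

let engine = AVAudioEngine()
let inputNode = engine.inputNode
var samples = [Float]()  // accumulated 16 kHz mono samples for whisper.cpp

// Tap the microphone in its native format and convert each buffer on arrival.
let tapFormat = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 4096, format: tapFormat) { buffer, _ in
    guard let converted = convertTo16kMono(buffer),    // hypothetical helper, see sketch above
          let floats = try? decodePCMBuffer(converted) else { return }
    samples.append(contentsOf: floats)                 // feed to whisper.cpp in chunks
}

engine.prepare()
do {
    try engine.start()
} catch {
    print("Failed to start AVAudioEngine: \(error)")
}

Keep in mind that the tap block runs on a background audio thread, so in real code access to samples should be synchronized before handing chunks to whisper.cpp.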