resampleBuffer may fail because the calculated capacity is less than 1

Open Josscii opened this issue 1 year ago • 2 comments

https://github.com/argmaxinc/WhisperKit/blob/3ebfa142a0e181668882e8e1c54088a528e2907b/Sources/WhisperKit/Core/Audio/AudioProcessor.swift#L416-L426

Josscii avatar Oct 15 '24 18:10 Josscii

@Josscii thanks for the report, did you have any error logs from a crash you experienced?

ZachNagengast avatar Oct 15 '24 19:10 ZachNagengast

Rounding buffer frame capacity from 0.36281179138321995 to 0.0 to better fit new sample rate
buffer 0 ptr 0x0 size 0
AudioConverter -> 0x303de2490: FillComplexBuffer in-process render returned -50
Failed to resample buffer: Error converting audio: Error Domain=NSOSStatusErrorDomain Code=-50 "(null)"

Josscii avatar Oct 16 '24 02:10 Josscii

I just fixed this by checking whether capacity == 0 and, if so, setting it to 1.

if capacity == 0 {
  capacity = 1
}
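
For context, a minimal self-contained sketch of that clamp. The function name and parameters here are hypothetical and not WhisperKit's actual API, and the rounding mode is an assumption based on the "Rounding buffer frame capacity" log line above:

```swift
// Hypothetical helper (not part of WhisperKit): computes the output frame
// capacity for a resample and never returns less than 1.
func resampledCapacity(inputFrames: Int, inputRate: Double, outputRate: Double) -> Int {
    // Exact (fractional) number of output frames for the given input.
    let exact = Double(inputFrames) * outputRate / inputRate
    // Assumption: capacity is rounded to the nearest whole frame.
    let rounded = Int(exact.rounded())
    // Clamp so a tiny input never yields a zero-capacity buffer.
    return max(rounded, 1)
}
```

With a single input frame at 44100 Hz and a 16000 Hz target, the exact value is about 0.36, which rounds to 0 and is then clamped to 1.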

Josscii avatar Nov 02 '24 13:11 Josscii

Seeing same error, so apparently not that rare. Will see if I can put together a PR

drewmccormack avatar Jan 21 '25 13:01 drewmccormack

Do either of you have any sample audio files that reliably reproduce this? I'm curious about which cases it comes up in. If capacity is less than 1 before rounding, we could skip the call to resampleBuffer entirely.
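
Roughly, the skip could look like the sketch below. This is only an illustration: `shouldResample` and its parameters are hypothetical names, and where such a check would sit relative to resampleBuffer is not decided here.

```swift
// Hypothetical pre-check (not WhisperKit's API): returns false when the
// input would produce less than one whole output frame, so the caller
// can skip resampling that buffer.
func shouldResample(inputFrames: Int, inputRate: Double, outputRate: Double) -> Bool {
    // Fractional output frame count before any rounding.
    let exact = Double(inputFrames) * outputRate / inputRate
    return exact >= 1
}
```

At 44100 Hz to 16000 Hz, one or two input frames would be skipped (about 0.36 and 0.73 output frames), while three input frames (about 1.09) would go through.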

ZachNagengast avatar Jan 21 '25 22:01 ZachNagengast

Ugh, I tried to reproduce the error with the existing main just now, and for some reason it is not happening. Must be some subtlety in the rounding. Perhaps it depends on the state of the audio system or something.

The capacity was definitely less than 1. It was something like 0.36. And indeed, I considered avoiding the call to rebuffer, but that required moving the capacity calculation up a few levels, and I figured what I ended up doing was fine too, and simpler.

I wish I had kept a snapshot of the debugger, but in essence I had some audio at 44100, and I guess that gets downsampled to 16000.

I remember the audio was 30.0s long, but the duration was determined to be 30.00002 or something like that, which was one frame longer. It was probably 1 frame longer at 44.1, but that is then less than a frame at 16. Something like that.

I think maxReadFrameSize was what you would expect, but some other quantities (e.g. frameCount?) were one more.
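
The arithmetic is at least consistent with that guess. One extra frame at 44100 Hz is 1/44100 s, about 0.00002 s, and it maps to 16000/44100, about 0.3628 frames at 16000 Hz, which matches the value in the log above. A small sketch, assuming (hypothetically) that the audio is read in 30 s chunks and the final chunk holds only the leftover frame; the function name and the chunking model are illustrative, not taken from WhisperKit:

```swift
// Hypothetical model of the scenario: the audio is read in fixed-size
// chunks, and the leftover frames in the final chunk are resampled alone.
func leftoverOutputFrames(totalFrames: Int, chunkFrames: Int,
                          inputRate: Double, outputRate: Double) -> Double {
    // Frames left over after the last full chunk.
    let leftover = totalFrames % chunkFrames
    // Fractional number of output frames those leftover frames map to.
    return Double(leftover) * outputRate / inputRate
}

// 30 s at 44100 Hz is 1_323_000 frames; assume one extra frame was counted.
let tail = leftoverOutputFrames(totalFrames: 1_323_001, chunkFrames: 1_323_000,
                                inputRate: 44_100, outputRate: 16_000)
// tail is about 0.3628, which rounds down to a capacity of 0.
```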

Sorry I can't reproduce it. I should have captured that audio when it happened. Even so, it might not have been reproducible if it is in some way dependent on state in the audio system.

drewmccormack avatar Jan 22 '25 07:01 drewmccormack

Got it, was seeing that too. Well no worries, your PR looks fairly harmless and the tests are passing so I think it's good to go and should catch this edge case 👍

ZachNagengast avatar Jan 22 '25 07:01 ZachNagengast