WhisperKit
resampleBuffer may fail because the calculated capacity is less than 1
https://github.com/argmaxinc/WhisperKit/blob/3ebfa142a0e181668882e8e1c54088a528e2907b/Sources/WhisperKit/Core/Audio/AudioProcessor.swift#L416-L426
@Josscii thanks for the report, did you have any error logs from a crash you experienced?
Rounding buffer frame capacity from 0.36281179138321995 to 0.0 to better fit new sample rate
buffer 0 ptr 0x0 size 0
AudioConverter -> 0x303de2490: FillComplexBuffer in-process render returned -50
Failed to resample buffer: Error converting audio: Error Domain=NSOSStatusErrorDomain Code=-50 "(null)"
I fixed this by checking whether capacity == 0 and, if so, setting it to 1.
if capacity == 0 {
    capacity = 1
}
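For illustration, here is a minimal sketch of that clamp as a standalone function. `outputFrameCapacity` and its parameters are hypothetical names, not WhisperKit's API, and the rounding step is only assumed from the "Rounding buffer frame capacity" log line above.

```swift
import Foundation

/// Hypothetical helper (not part of WhisperKit): how many output frames are
/// needed to hold `inputFrames` after converting between sample rates.
func outputFrameCapacity(inputFrames: Int, inputRate: Double, outputRate: Double) -> Int {
    // Exact, possibly fractional, number of output frames.
    let exact = Double(inputFrames) * outputRate / inputRate
    // Assumed to mirror the rounding reported in the log line above.
    let rounded = Int(exact.rounded())
    // A capacity of 0 appears to be what leads to the -50 error, so clamp to at least 1.
    return max(rounded, 1)
}

// One input frame at 44.1 kHz is about 0.36 frames at 16 kHz, which rounds to 0.
let capacity = outputFrameCapacity(inputFrames: 1, inputRate: 44_100, outputRate: 16_000)
print(capacity)
```

The clamp keeps the change local to the place where the capacity is computed, at the cost of converting a buffer that carries almost no audio.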
Seeing the same error, so apparently it's not that rare. Will see if I can put together a PR.
Do either of you have any sample audio files that reliably reproduce this? I'm curious which cases it comes up in. If capacity is less than 1 before rounding, we could skip the call to resampleBuffer entirely.
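A rough sketch of that alternative, assuming the caller can compute the capacity before deciding whether to resample. `resampledFrameCount` is a hypothetical helper, not an existing WhisperKit function.

```swift
import Foundation

/// Hypothetical helper (not part of WhisperKit): returns nil when the input is
/// too short to produce even one output frame, so the caller can skip resampling.
func resampledFrameCount(inputFrames: Int, inputRate: Double, outputRate: Double) -> Int? {
    // Check the capacity before rounding, as suggested above.
    let exact = Double(inputFrames) * outputRate / inputRate
    guard exact >= 1 else { return nil }
    return Int(exact.rounded())
}

if let frames = resampledFrameCount(inputFrames: 1, inputRate: 44_100, outputRate: 16_000) {
    print("resample into \(frames) frames")
} else {
    print("skip: less than one output frame")
}
```

Compared with clamping to 1, this drops the sub-frame remainder instead of converting it, so the caller has to handle the nil case.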
Ugh, I tried to reproduce the error with the existing main just now, and for some reason it is not happening. Must be some subtlety in the rounding. Perhaps it depends on the state of the audio system or something.
The capacity was definitely less than 1. It was something like 0.36. And indeed, I considered avoiding the call to rebuffer, but that required moving the capacity calculation up a few levels, and I figured what I ended up doing was fine too, and simpler.
I wish I had kept a snapshot of the debugger, but in essence I had some audio at 44100, and I guess that gets downsampled to 16000.
I remember the audio was 30.0s long, but the duration was determined to be 30.00002 or something like that, which was one frame longer. It was probably 1 frame longer at 44.1, but that is then less than a frame at 16. Something like that.
I think maxReadFrameSize was what you would expect, but some other quantities (e.g., frameCount?) were one more.
Sorry I can't reproduce it. I should have captured that audio when it happened. Even so, it might not have been reproducible if it is in some way dependent on state in the audio system.
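The arithmetic behind that recollection can be checked with a short sketch. It assumes a 44.1 kHz file that is exactly one frame longer than 30 s, and that the extra frame ends up being resampled on its own. Both are assumptions taken from the description above, not something confirmed from the WhisperKit source.

```swift
import Foundation

let inputRate = 44_100.0
let outputRate = 16_000.0

// Frames in exactly 30 s of 44.1 kHz audio.
let nominalFrames = Int(30.0 * inputRate)
// Assumed: the file holds one frame more than that.
let actualFrames = nominalFrames + 1
// Duration in seconds, slightly over 30 (roughly 30.00002).
let duration = Double(actualFrames) / inputRate

// Assumed: the single leftover frame is converted by itself.
let leftoverInput = actualFrames - nominalFrames
// About 0.36 output frames, in line with the value in the log above.
let leftoverOutput = Double(leftoverInput) * outputRate / inputRate

print(duration)
print(leftoverOutput)
print(leftoverOutput.rounded())
```

Under these assumptions, the leftover frame rounds to a capacity of 0, which is consistent with the logged capacity of roughly 0.3628.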
Got it, was seeing that too. Well no worries, your PR looks fairly harmless and the tests are passing so I think it's good to go and should catch this edge case 👍