whisper.cpp
Support for other sample rates
It seems just adjusting WHISPER_SAMPLE_RATE doesn't work :)
@luke-jr I’ve implemented resampling. I’m on Windows and using Media Foundation, so the OS already includes that code. Here’s an example of how to create a media type compatible with this library:
// CHECK is assumed to be a macro that returns from the function when the HRESULT indicates failure.
HRESULT createOutputMt( bool diarize, CComPtr<IMFMediaType>& mt )
{
    CHECK( MFCreateMediaType( &mt ) );
    // Uncompressed 32-bit float PCM at the sample rate the model expects
    CHECK( mt->SetGUID( MF_MT_MAJOR_TYPE, MFMediaType_Audio ) );
    CHECK( mt->SetGUID( MF_MT_SUBTYPE, MFAudioFormat_Float ) );
    CHECK( mt->SetUINT32( MF_MT_AUDIO_SAMPLES_PER_SECOND, WHISPER_SAMPLE_RATE ) );
    // Stereo when diarizing, mono otherwise
    const uint32_t channels = diarize ? 2 : 1;
    CHECK( mt->SetUINT32( MF_MT_AUDIO_NUM_CHANNELS, channels ) );
    // 4 bytes per float sample, per channel
    CHECK( mt->SetUINT32( MF_MT_AUDIO_BLOCK_ALIGNMENT, channels * 4 ) );
    CHECK( mt->SetUINT32( MF_MT_AUDIO_AVG_BYTES_PER_SECOND, channels * 4 * WHISPER_SAMPLE_RATE ) );
    CHECK( mt->SetUINT32( MF_MT_AUDIO_BITS_PER_SAMPLE, 32 ) );
    CHECK( mt->SetUINT32( MF_MT_ALL_SAMPLES_INDEPENDENT, TRUE ) );
    return S_OK;
}
If you’re on Linux, use the libsoxr library. I have not used it with Whisper, but I have used it for other things and it was pretty good. Most Linux distributions include it in their package repositories, e.g. Debian.
Is resampling the only way? It can't work in the original sample rate?
The constants in whisper.h are parameters of the model and cannot be modified. The only way is to resample the data to 16 kHz.
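To make the "resample to 16 kHz" step concrete, here is a minimal, dependency-free sketch for mono float samples. It uses plain linear interpolation, which is an assumption made for brevity and not what Media Foundation or libsoxr do: there is no low-pass filter, so downsampling will alias and quality will be noticeably worse than a real resampler. The names `resampleLinear` and `kTargetRate` are hypothetical and not part of whisper.cpp.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical constant; assumed to match WHISPER_SAMPLE_RATE (16 kHz) in whisper.h.
constexpr int kTargetRate = 16000;

// Naive linear-interpolation resampler for mono float PCM.
// Illustration only: no anti-aliasing filter is applied.
std::vector<float> resampleLinear( const std::vector<float>& in, int inRate, int outRate = kTargetRate )
{
    if( in.empty() || inRate <= 0 || outRate <= 0 )
        return {};
    if( inRate == outRate )
        return in;

    // Number of output samples covering the same duration as the input
    const size_t outLen = (size_t)( (uint64_t)in.size() * (uint64_t)outRate / (uint64_t)inRate );
    std::vector<float> out( outLen );

    // Distance between consecutive output samples, measured in input samples
    const double step = (double)inRate / (double)outRate;
    const size_t last = in.size() - 1;

    for( size_t i = 0; i < outLen; i++ )
    {
        const double pos = (double)i * step;
        const size_t i0 = std::min( (size_t)pos, last );
        const size_t i1 = std::min( i0 + 1, last );
        const float frac = (float)( pos - (double)i0 );
        // Blend the two neighbouring input samples
        out[ i ] = in[ i0 ] + ( in[ i1 ] - in[ i0 ] ) * frac;
    }
    return out;
}
```

For example, `resampleLinear( samples, 48000 )` turns one second of 48 kHz audio (48000 samples) into 16000 samples. For anything beyond a quick experiment, a proper resampler such as the Media Foundation one above or libsoxr is the better choice.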