
Support for other sample rates

Open luke-jr opened this issue 2 years ago • 2 comments

It seems just adjusting WHISPER_SAMPLE_RATE doesn't work :)

luke-jr avatar Dec 24 '22 03:12 luke-jr

@luke-jr I’ve implemented resampling. I’m on Windows and using Media Foundation, so the OS already includes the resampling code. Here’s an example of how to create a media type compatible with this library:

// CHECK is assumed to be a macro that returns early when the HRESULT indicates failure.
HRESULT createOutputMt( bool diarize, CComPtr<IMFMediaType>& mt )
{
	CHECK( MFCreateMediaType( &mt ) );
	// Request uncompressed 32-bit float PCM at the sample rate the model expects
	CHECK( mt->SetGUID( MF_MT_MAJOR_TYPE, MFMediaType_Audio ) );
	CHECK( mt->SetGUID( MF_MT_SUBTYPE, MFAudioFormat_Float ) );
	CHECK( mt->SetUINT32( MF_MT_AUDIO_SAMPLES_PER_SECOND, WHISPER_SAMPLE_RATE ) );

	// Stereo when diarizing, mono otherwise; each sample is 4 bytes (32-bit float)
	const uint32_t channels = diarize ? 2 : 1;
	CHECK( mt->SetUINT32( MF_MT_AUDIO_NUM_CHANNELS, channels ) );
	CHECK( mt->SetUINT32( MF_MT_AUDIO_BLOCK_ALIGNMENT, channels * 4 ) );
	CHECK( mt->SetUINT32( MF_MT_AUDIO_AVG_BYTES_PER_SECOND, channels * 4 * WHISPER_SAMPLE_RATE ) );
	CHECK( mt->SetUINT32( MF_MT_AUDIO_BITS_PER_SAMPLE, 32 ) );
	CHECK( mt->SetUINT32( MF_MT_ALL_SAMPLES_INDEPENDENT, TRUE ) );
	return S_OK;
}

If you’re on Linux, use the libsoxr library. I have not used it with Whisper, but I have used it for other things and it worked well. Most Linux distributions include that library in their repositories, e.g. Debian.

Const-me avatar Dec 24 '22 17:12 Const-me

Is resampling the only way? Can't it work at the original sample rate?

luke-jr avatar Dec 25 '22 06:12 luke-jr

The constants in whisper.h are parameters of the model and cannot be modified. The only way is to resample the data to 16 kHz.

ggerganov avatar Dec 29 '22 11:12 ggerganov
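
For anyone who wants to see what "resample the data to 16 kHz" amounts to without pulling in Media Foundation or libsoxr, here is a minimal, dependency-free C++ sketch. It is not code from whisper.cpp: `downmixToMono`, `resampleLinear` and `kTargetRate` are hypothetical names made up for this illustration, and `kTargetRate` is assumed to match `WHISPER_SAMPLE_RATE` (16000). It uses plain linear interpolation with no low-pass filter, so downsampling can alias; a proper resampler such as the ones mentioned above is the better choice for real audio.

```cpp
#include <cstddef>
#include <vector>

// Assumed to equal WHISPER_SAMPLE_RATE from whisper.h (16 kHz).
constexpr int kTargetRate = 16000;

// Hypothetical helper: average interleaved channels into a single mono channel.
std::vector<float> downmixToMono( const std::vector<float>& interleaved, int channels )
{
	if( channels <= 1 )
		return interleaved;
	const std::size_t frames = interleaved.size() / channels;
	std::vector<float> mono( frames );
	for( std::size_t i = 0; i < frames; i++ )
	{
		float sum = 0.0f;
		for( int c = 0; c < channels; c++ )
			sum += interleaved[ i * channels + c ];
		mono[ i ] = sum / channels;
	}
	return mono;
}

// Hypothetical helper: naive linear-interpolation resampler for mono float PCM.
// There is no anti-aliasing filter, so this is only a sketch of the idea.
std::vector<float> resampleLinear( const std::vector<float>& in, int inRate, int outRate = kTargetRate )
{
	if( in.empty() || inRate <= 0 || outRate <= 0 )
		return {};
	if( inRate == outRate )
		return in;

	// Output length scales with the ratio of the two rates
	const std::size_t outLen = static_cast<std::size_t>(
		static_cast<double>( in.size() ) * outRate / inRate );
	std::vector<float> out( outLen );

	// Distance between consecutive output samples, measured in input samples
	const double step = static_cast<double>( inRate ) / outRate;
	for( std::size_t i = 0; i < outLen; i++ )
	{
		const double pos = i * step;
		std::size_t i0 = static_cast<std::size_t>( pos );
		if( i0 >= in.size() )
			i0 = in.size() - 1;
		const std::size_t i1 = ( i0 + 1 < in.size() ) ? i0 + 1 : in.size() - 1;
		const float frac = static_cast<float>( pos - static_cast<double>( i0 ) );
		// Blend the two neighbouring input samples
		out[ i ] = in[ i0 ] + ( in[ i1 ] - in[ i0 ] ) * frac;
	}
	return out;
}
```

With something like this, one second of 48 kHz stereo audio would first go through `downmixToMono( samples, 2 )` and then `resampleLinear( mono, 48000 )`, producing a mono float buffer at 16 kHz, which is the kind of input the model expects.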