WhisperAPI
WhisperAPI is a fast and reliable API that transcribes video and audio files into text with support for all models and languages. It offers time-stamped results and translation to English.
WhisperAPI
WhisperAPI is a wrapper for Whisper.cpp, a C++ implementation of the original OpenAI Whisper that greatly improves performance and speed.
AppSettings
You will need to edit the appsettings.json file to contain a full path to where you want to store models and audio files.
{
  "WhisperSettings": {
    "Folder": "/path/to/whisper/folder"
  }
}
Set the Folder property to the full path of the directory where models and audio files should be stored.
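As a quick sanity check before starting the service, the setting can be validated with a few lines of Python. This is only a sketch: the helper name `read_whisper_folder` is hypothetical and not part of WhisperAPI, and it assumes nothing beyond the `WhisperSettings` / `Folder` layout shown above.

```python
import json
import os


def read_whisper_folder(settings_text: str) -> str:
    """Return the Folder value from appsettings.json text, or raise ValueError."""
    settings = json.loads(settings_text)
    folder = settings.get("WhisperSettings", {}).get("Folder", "")
    # The README asks for a full path, so reject empty or relative values.
    if not folder or not os.path.isabs(folder):
        raise ValueError(f"Folder must be a full path, got: {folder!r}")
    return folder


if __name__ == "__main__":
    with open("appsettings.json", encoding="utf-8") as handle:
        print(read_whisper_folder(handle.read()))
```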
Note
Translation increases the processing time, sometimes doubling it, so avoid translation for long video or audio files.
Features
- Transcribe video and audio files into text
- Supports all models
- Easy to use and integrate into your own projects
- Fast and reliable transcription results
- Supports every language supported by OpenAI Whisper
- Ability to translate transcribed text to English
Notes
- You can use any language code supported by OpenAI Whisper.
- If you don't know ahead of time which language code you need, you can omit the lang property.
- Supported models are: Tiny, Base, Medium and Large.
Usage
Before making a request to transcribe a file, you should query the /models endpoint to get a list of all available models.
curl --location --request GET 'https://localhost:5001/models'
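The same query can be issued from a script. The sketch below uses only the Python standard library; the helper name `build_models_request` is hypothetical, the host and port are copied from the curl example, and the response is assumed to be JSON because the README does not show its shape.

```python
import json
import urllib.request

BASE_URL = "https://localhost:5001"  # same host and port as the curl example


def build_models_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request for the /models endpoint."""
    return urllib.request.Request(f"{base_url}/models", method="GET")


if __name__ == "__main__":
    # Requires a running WhisperAPI instance; a self-signed local
    # certificate may need extra TLS configuration.
    with urllib.request.urlopen(build_models_request()) as response:
        # Assumption: the body is JSON; its exact structure is not documented here.
        print(json.load(response))
```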
To use WhisperAPI, you need to send a POST request to the /transcribe endpoint with the following form-data payload:
file: @/path/to/file/
model: String
translate: Boolean
Additionally, you can add headers to the request for language and response type preferences.
Accept: application/json
Accept-Language: en
The file should be provided as a multipart/form-data field named file. translate is an optional property.
- If the Accept-Language header is omitted, the API will automatically detect the language of the file.
- If the translate property is omitted, it defaults to false.
Here is an example of a request using curl:
curl --location --request POST 'https://localhost:5001/transcribe' \
--header 'Accept: application/json' \
--header 'Accept-Language: English' \
--form 'file=@"/path/to/file/"' \
--form 'model="base"' \
--form 'translate="true"'
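For callers that prefer a script over curl, an equivalent request can be assembled with the Python standard library. This is a minimal sketch, not an official client: `build_transcribe_request` is a hypothetical helper, and the field names (`file`, `model`, `translate`) and headers are taken from the description above.

```python
import urllib.request
import uuid


def build_transcribe_request(url, filename, audio, model, translate=False, language=None):
    """Build a multipart/form-data POST request for the /transcribe endpoint."""
    boundary = uuid.uuid4().hex
    fields = {"model": model, "translate": "true" if translate else "false"}

    body = b""
    for name, value in fields.items():
        body += (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode("utf-8")
    # The uploaded media goes in the form field named "file".
    body += (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    body += audio + b"\r\n" + f"--{boundary}--\r\n".encode("utf-8")

    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Accept": "application/json",
    }
    # Omitting Accept-Language leaves language detection to the API.
    if language:
        headers["Accept-Language"] = language
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

The returned object can be passed to `urllib.request.urlopen` once the service is running; reading the media file into `audio` as bytes is left to the caller.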
The response will be a JSON payload with the following format:
{
  "data": [
    {
      "start": 0,
      "end": 3,
      "text": "Hello!"
    },
    {
      "start": 3,
      "end": 6,
      "text": " World!"
    }
  ],
  "count": 2
}
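To show how the JSON payload might be consumed, the sketch below joins the segment texts into a single transcript. The function name `transcript_text` is hypothetical; only the `data`, `text` and `count` fields shown above are assumed.

```python
import json


def transcript_text(payload: str) -> str:
    """Concatenate the text of every segment in a /transcribe JSON response."""
    body = json.loads(payload)
    # Segments carry their own leading spaces, so plain concatenation is enough.
    return "".join(segment["text"] for segment in body["data"])


if __name__ == "__main__":
    sample = (
        '{"data": [{"start": 0, "end": 3, "text": "Hello!"},'
        ' {"start": 3, "end": 6, "text": " World!"}], "count": 2}'
    )
    print(transcript_text(sample))  # prints: Hello! World!
```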
If the Accept header is set to text/plain, the response will look like this:
Hello! World!
If the Accept header is set to application/xml, the response will look like this:
<JsonResponse>
  <Data>
    <ResponseData>
      <Start>0</Start>
      <End>3</End>
      <Text>Hello!</Text>
    </ResponseData>
    <ResponseData>
      <Start>3</Start>
      <End>6</End>
      <Text> World!</Text>
    </ResponseData>
  </Data>
  <Count>2</Count>
</JsonResponse>
If the Accept header is set to application/x-subrip, the response will look like this:
1
00:00:00,000 --> 00:00:05,000
Hello

2
00:00:05,000 --> 00:00:10,000
World
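The SubRip layout above (cue number, time range, text, blank line) can also be reproduced on the client from JSON segments. The sketch below is illustrative only: `srt_timestamp` and `to_srt` are hypothetical helpers, and it assumes that `start` and `end` are expressed in seconds, which the README does not state.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments) -> str:
    """Render segments (dicts with start, end, text) as SRT cues."""
    cues = []
    for index, segment in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(segment['start'])} --> {srt_timestamp(segment['end'])}"
        cues.append(f"{index}\n{time_range}\n{segment['text'].strip()}")
    # Cues are separated by a blank line.
    return "\n\n".join(cues) + "\n"


if __name__ == "__main__":
    print(to_srt([
        {"start": 0, "end": 5, "text": "Hello"},
        {"start": 5, "end": 10, "text": "World"},
    ]))
```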
On failure (e.g., an invalid file format), the response JSON will be:
{
  "error": "Error message"
}
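A client will usually want to tell the two JSON shapes apart. One possible approach, sketched here under the assumption that a failure body always carries the `error` key shown above, is to raise an exception when that key is present. `TranscriptionError` and `parse_response` are hypothetical names.

```python
import json


class TranscriptionError(Exception):
    """Raised when the API returns an error payload."""


def parse_response(payload: str) -> list:
    """Return the list of segments, or raise TranscriptionError on failure."""
    body = json.loads(payload)
    if "error" in body:
        raise TranscriptionError(body["error"])
    return body["data"]


if __name__ == "__main__":
    try:
        parse_response('{"error": "Error message"}')
    except TranscriptionError as exc:
        print(f"Transcription failed: {exc}")
```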
Contributing
We welcome contributions to WhisperAPI! If you would like to contribute, simply fork the repository and submit a pull request with your changes.
Support
If you need help with WhisperAPI, please create an issue on GitHub and I will respond as soon as possible.