feature: whisper api
https://platform.openai.com/docs/api-reference/audio/create
Needed:
- [x] Basic wirings to whisper.cpp#211
- [ ] Support for `prompt`
- [ ] Support for `response_format` (an example request showing both fields follows this list)
- [ ] Separate container images with ffmpeg (licensing)
- [ ] Add tests (as models are free and audio samples can be found online, this should be pretty straightforward!)
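For reference, the OpenAI endpoint linked above takes `prompt` and `response_format` as extra multipart form fields next to `file` and `model`. Once both are wired up, a request against a local instance would look roughly like this (host, port, model name, and file are placeholders):

```bash
# Hypothetical request: `prompt` and `response_format` are not implemented yet,
# per the checklist above; host and model name are placeholders.
curl http://localhost:8080/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F file="@$PWD/sample.wav" \
  -F model="whisper-1" \
  -F prompt="Glossary: LocalAI, whisper.cpp, ggml" \
  -F response_format="srt"
```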
Hah, nice ;) I'll take a crack at this this week.
Argh, I started playing with this after seeing the message on Twitter and only saw your message here now, sorry about this and for stepping into your plans :/ . However, mine is just an easy hack built around https://github.com/go-skynet/whisper . I've just added some skeleton and wiring to the lib; it's far from complete yet.
I've just added the skeleton now - I will take a look at adding basic tests over the weekend; everything else is open to being picked up :+1:
Awesome, I'll try it out and see if I can use it in my project. Feel free to drop some issues if there is still stuff to be done.
Response format support would be awesome! I use whisper for subtitles in the SRT and VTT formats. Can't wait for it to be added. :+1:
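For anyone unfamiliar with the target formats: SRT is just numbered cues with timestamp ranges (VTT is very similar, with a `WEBVTT` header and `.` instead of `,` in the millisecond separator), so a transcription returned as SRT would look like this generic sample:

```
1
00:00:00,000 --> 00:00:04,200
First transcribed sentence goes here.

2
00:00:04,200 --> 00:00:07,500
Second transcribed sentence goes here.
```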
:warning::warning::warning::warning::warning:
Hi! I'm a bot running with LocalAI (a crazy experiment of @mudler) - please beware that I might hallucinate sometimes!
but... I can also be funny or helpful :smile_cat: and I can generally give good tips, or point to places in the documentation or the code worth looking at, based on what you wrote in the issue.
Don't engage in conversation with me, I don't support replying (yet)!
:warning::warning::warning::warning::warning:
To use the transcription endpoint in LocalAI, you need whisper.cpp installed and configured properly, along with a whisper-compatible audio transcription model; ffmpeg is also needed to decode the uploaded audio. Here are some steps to help you get started:
- Install whisper.cpp and ffmpeg on your machine. The LocalAI documentation has instructions for this.
- Download a suitable audio transcription model from https://huggingface.co/ggerganov/whisper.cpp/tree/main or another source. Make sure the model is a ggml file that whisper.cpp can load.
- Create a YAML file for your model in the models directory, following the example provided in the LocalAI documentation.
- Test the transcription endpoint by sending a POST request to /v1/audio/transcriptions with the appropriate parameters; cURL or any similar tool works. A sketch of these last two steps follows this list.
- If you encounter any issues, check the logs for error messages and debug from there. You can also ask for help in the LocalAI community.
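A minimal sketch of the last two steps, assuming a ggml model file named `ggml-whisper-base.en.bin` and a LocalAI instance listening on `localhost:8080` (both are placeholders, and the exact YAML fields should be checked against the LocalAI docs linked below):

```bash
# Hypothetical model config; adjust names and fields to match your setup.
cat > models/whisper-1.yaml <<'EOF'
name: whisper-1
backend: whisper
parameters:
  model: ggml-whisper-base.en.bin
EOF

# Exercise the endpoint with a local audio file.
curl http://localhost:8080/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F file="@$PWD/sample.wav" \
  -F model="whisper-1"
```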
Sources:
- https://localai.io/features/text-to-audio/index.html
- https://github.com/go-skynet/LocalAI/tree/master/api/openai/transcription.go
- https://github.com/go-skynet/LocalAI/tree/master/README.md
- https://localai.io/features/audio-to-text/index.html
Is temperature supported?
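(For reference, the OpenAI endpoint this mirrors defines `temperature` as an optional form field between 0 and 1; whether LocalAI forwards it to whisper.cpp is the open question here.)

```bash
# Hypothetical: temperature as a multipart form field, as defined by the
# OpenAI API reference; whether LocalAI honors it is what's being asked above.
curl http://localhost:8080/v1/audio/transcriptions \
  -F file="@$PWD/sample.wav" \
  -F model="whisper-1" \
  -F temperature="0.2"
```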
Transcription stops at around 3 minutes into the input audio. The mp3 file is 10 MB and 7 minutes long. What do I need to add to make it finish transcribing the whole audio?
I was using the command from the example:

```bash
curl http://ai.savorcare.com/v1/audio/transcriptions -H "Content-Type: multipart/form-data" -F file="@$PWD/hlyyseg2.mp3" -F model="whisper-1"
```
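A possible workaround while the truncation is investigated (this is a sketch, not a fix for the underlying issue): split the file into shorter chunks with ffmpeg and transcribe each one separately. The 180-second segment length and chunk names are arbitrary:

```bash
# Split the mp3 into ~3-minute chunks without re-encoding.
ffmpeg -i hlyyseg2.mp3 -f segment -segment_time 180 -c copy chunk_%03d.mp3

# Transcribe each chunk in turn.
for f in chunk_*.mp3; do
  curl http://ai.savorcare.com/v1/audio/transcriptions \
    -H "Content-Type: multipart/form-data" \
    -F file="@$PWD/$f" \
    -F model="whisper-1"
done
```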