feature: whisper api
https://platform.openai.com/docs/api-reference/audio/create
Needed:
- [x] Basic wirings to whisper.cpp#211
- [ ] Support for `prompt`
- [ ] Support for `response_format` (an example request showing both fields follows this list)
- [ ] Separate container images with ffmpeg (licensing)
- [ ] Add tests (as models are free and audio samples can be found online, this should be pretty straightforward!)
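For reference, the OpenAI endpoint linked above takes `prompt` and `response_format` as extra multipart form fields next to `file` and `model`. Once both are wired up, a request against a local instance would look roughly like this (host, port, model name, and file are placeholders):

```bash
# Hypothetical request: `prompt` and `response_format` are not implemented yet,
# per the checklist above; host and model name are placeholders.
curl http://localhost:8080/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F file="@$PWD/sample.wav" \
  -F model="whisper-1" \
  -F prompt="Glossary: LocalAI, whisper.cpp, ggml" \
  -F response_format="srt"
```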
Hah, nice ;) I'll take a crack at this this week.
Argh, I started playing with this after seeing the message on Twitter and only saw your message here now, sorry about this and for stepping into your plans :/ . However, mine is just an easy hack built around https://github.com/go-skynet/whisper . I've just added some skeleton and wiring to the lib; it's far from complete yet.
I've just added the skeleton now - I will take a look at adding basic tests over the weekend; everything else is open to being picked up :+1:
Awesome, I'll try it out and see if I can use it in my project. Feel free to drop some issues if there is still stuff to be done.
Response format support would be awesome! I use whisper for subtitles in the SRT and VTT formats. Can't wait for it to be added. :+1:
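For anyone unfamiliar with the target formats: SRT is just numbered cues with timestamp ranges (VTT is very similar, with a `WEBVTT` header and `.` instead of `,` in the millisecond separator), so a transcription returned as SRT would look like this generic sample:

```
1
00:00:00,000 --> 00:00:04,200
First transcribed sentence goes here.

2
00:00:04,200 --> 00:00:07,500
Second transcribed sentence goes here.
```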
:warning::warning::warning::warning::warning:
Hi! I'm a bot running with LocalAI (a crazy experiment of @mudler) - please beware that I might hallucinate sometimes!
but... I can also be funny or helpful :smile_cat: and I can generally give good tips, or point to places in the documentation or the code worth looking at, based on what you wrote in the issue.
Don't engage in conversation with me, I don't support replying (yet)!
:warning::warning::warning::warning::warning:
To use the transcription endpoint in LocalAI, you need whisper.cpp installed and configured properly, along with a whisper-compatible audio transcription model; ffmpeg is also needed to decode the uploaded audio. Here are some steps to help you get started:
- Install whisper.cpp and ffmpeg on your machine. The LocalAI documentation has instructions for this.
- Download a suitable audio transcription model from https://huggingface.co/ggerganov/whisper.cpp/tree/main or another source. Make sure the model is a ggml file that whisper.cpp can load.
- Create a YAML file for your model in the models directory, following the example provided in the LocalAI documentation.
- Test the transcription endpoint by sending a POST request to /v1/audio/transcriptions with the appropriate parameters; cURL or any similar tool works. A sketch of these last two steps follows this list.
- If you encounter any issues, check the logs for error messages and debug from there. You can also ask for help in the LocalAI community.
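A minimal sketch of the last two steps, assuming a ggml model file named `ggml-whisper-base.en.bin` and a LocalAI instance listening on `localhost:8080` (both are placeholders, and the exact YAML fields should be checked against the LocalAI docs linked below):

```bash
# Hypothetical model config; adjust names and fields to match your setup.
cat > models/whisper-1.yaml <<'EOF'
name: whisper-1
backend: whisper
parameters:
  model: ggml-whisper-base.en.bin
EOF

# Exercise the endpoint with a local audio file.
curl http://localhost:8080/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F file="@$PWD/sample.wav" \
  -F model="whisper-1"
```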
Sources:
- https://localai.io/features/text-to-audio/index.html
- https://github.com/go-skynet/LocalAI/tree/master/api/openai/transcription.go
- https://github.com/go-skynet/LocalAI/tree/master/README.md
- https://localai.io/features/audio-to-text/index.html
Is temperature supported?
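(For reference, the OpenAI endpoint this mirrors defines `temperature` as an optional form field between 0 and 1; whether LocalAI forwards it to whisper.cpp is the open question here.)

```bash
# Hypothetical: temperature as a multipart form field, as defined by the
# OpenAI API reference; whether LocalAI honors it is what's being asked above.
curl http://localhost:8080/v1/audio/transcriptions \
  -F file="@$PWD/sample.wav" \
  -F model="whisper-1" \
  -F temperature="0.2"
```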
Transcription stops at around 3 minutes into the input audio. The mp3 file is 10 MB and 7 minutes long. What do I need to add to make it finish transcribing the whole audio?
I was using the command from the example:

```bash
curl http://ai.savorcare.com/v1/audio/transcriptions -H "Content-Type: multipart/form-data" -F file="@$PWD/hlyyseg2.mp3" -F model="whisper-1"
```
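A possible workaround while the truncation is investigated (this is a sketch, not a fix for the underlying issue): split the file into shorter chunks with ffmpeg and transcribe each one separately. The 180-second segment length and chunk names are arbitrary:

```bash
# Split the mp3 into ~3-minute chunks without re-encoding.
ffmpeg -i hlyyseg2.mp3 -f segment -segment_time 180 -c copy chunk_%03d.mp3

# Transcribe each chunk in turn.
for f in chunk_*.mp3; do
  curl http://ai.savorcare.com/v1/audio/transcriptions \
    -H "Content-Type: multipart/form-data" \
    -F file="@$PWD/$f" \
    -F model="whisper-1"
done
```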