LocalAI
feat: Realtime API support
Description
This PR fixes https://github.com/mudler/LocalAI/issues/3714
And also covers #191
Notes for Reviewers
Signed commits
- [ ] Yes, I signed my commits.
Deploy Preview for localai ready!
| Name | Link |
|---|---|
| Latest commit | f272605b950d35e4360d638a9b30fa7e343749e4 |
| Latest deploy log | https://app.netlify.com/sites/localai/deploys/67868d4d9141c90008d963f5 |
| Deploy Preview | https://deploy-preview-3722--localai.netlify.app |
yamllint Failed
::group::gallery/arch-function.yaml
::error file=gallery/arch-function.yaml,line=66,col=22::66:22 [new-line-at-end-of-file] no new line character at the end of file
::endgroup::
Workflow: Yamllint GitHub Actions, Action: __karancode_yamllint-github-action, Lint: gallery
Just for reference, openai-realtime-console seems quite nice for testing things out especially at this stage, I've opened up a PR upstream to include a Dockerfile and instructions on how to use it with a local server: https://github.com/openai/openai-realtime-console/pull/59
What's the best option here if we want to contribute? Just make forks of the branch and open PRs against this?
Yes, that would work just fine!
What is done:
- [x] API spec
- [x] Updating session, starting VAD server
- [x] Hooking server API specs to placeholder functions
- [x] Register ws server, and test client-side functionality
- [x] Created a wrapped model definition for emulating Audio-to-Audio models when the backend does not support it (via an STT->LLM->TTS pipeline)
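The wrapped-model idea above can be sketched roughly as follows. This is an illustrative Go sketch only, not the PR's actual types or interfaces: a wrapper presents an audio-in/audio-out interface while internally chaining three single-purpose backends (STT, LLM, TTS). The interface and struct names here are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical single-purpose backend interfaces (names are illustrative,
// not the PR's actual model interface).
type Transcriber interface {
	Transcribe(audio []byte) (string, error)
}
type ChatModel interface {
	Complete(prompt string) (string, error)
}
type Synthesizer interface {
	Synthesize(text string) ([]byte, error)
}

// WrappedAudioModel emulates an Audio-to-Audio model by running
// STT -> LLM -> TTS as a pipeline.
type WrappedAudioModel struct {
	stt Transcriber
	llm ChatModel
	tts Synthesizer
}

// Respond takes input audio and produces reply audio.
func (w *WrappedAudioModel) Respond(audio []byte) ([]byte, error) {
	text, err := w.stt.Transcribe(audio)
	if err != nil {
		return nil, fmt.Errorf("stt: %w", err)
	}
	reply, err := w.llm.Complete(text)
	if err != nil {
		return nil, fmt.Errorf("llm: %w", err)
	}
	return w.tts.Synthesize(reply)
}

// Toy stand-ins so the pipeline can be exercised without real backends.
type echoSTT struct{}

func (echoSTT) Transcribe(a []byte) (string, error) { return string(a), nil }

type upperLLM struct{}

func (upperLLM) Complete(p string) (string, error) { return strings.ToUpper(p), nil }

type byteTTS struct{}

func (byteTTS) Synthesize(t string) ([]byte, error) { return []byte(t), nil }

func main() {
	m := &WrappedAudioModel{stt: echoSTT{}, llm: upperLLM{}, tts: byteTTS{}}
	out, _ := m.Respond([]byte("hello"))
	fmt.Println(string(out)) // prints "HELLO"
}
```

The point of the wrapper is that callers see one audio interface regardless of whether the backend is natively Audio-to-Audio or emulated.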
Things left:
- [ ] handle conversation templating (like we do in chat.go; this is a good opportunity to do some code extraction)
- [ ] add a VAD backend, or embed VAD directly in the current Golang code. Having a backend would make it modular and re-use part of the existing code base
- [ ] add an Audio-to-Audio backend and define the gRPC APIs for it. Implement usage here
- [ ] hook the model interface to the various backend functions, and update the wrapped model so it works both when emulating an Audio-to-Audio model (by running things in a pipeline: STT -> LLM -> TTS) and with native Audio-to-Audio
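A modular VAD backend along the lines described above could have a shape like the following. This is a hedged sketch: the interface, the segment type, and the toy energy-based detector are all hypothetical stand-ins, not the PR's actual gRPC API; a real backend (e.g. silero) would implement the same contract behind gRPC.

```go
package main

import "fmt"

// VADSegment marks one detected speech span (hypothetical type).
type VADSegment struct {
	Start, End float64 // seconds from stream start
}

// VADBackend is an illustrative contract a pluggable VAD backend
// could satisfy.
type VADBackend interface {
	// DetectSegments returns speech segments found in mono PCM audio.
	DetectSegments(pcm []float32, sampleRate int) ([]VADSegment, error)
}

// energyVAD is a toy stand-in: it marks a segment wherever average
// frame energy exceeds a threshold, merging adjacent speech frames.
type energyVAD struct {
	frameSize int
	threshold float32
}

func (v energyVAD) DetectSegments(pcm []float32, sampleRate int) ([]VADSegment, error) {
	var segs []VADSegment
	var cur *VADSegment
	for off := 0; off+v.frameSize <= len(pcm); off += v.frameSize {
		var e float32
		for _, s := range pcm[off : off+v.frameSize] {
			e += s * s
		}
		e /= float32(v.frameSize)
		if e >= v.threshold {
			if cur == nil {
				cur = &VADSegment{Start: float64(off) / float64(sampleRate)}
			}
			cur.End = float64(off+v.frameSize) / float64(sampleRate)
		} else if cur != nil {
			// Silence frame closes the current speech segment.
			segs = append(segs, *cur)
			cur = nil
		}
	}
	if cur != nil {
		segs = append(segs, *cur)
	}
	return segs, nil
}

func main() {
	v := energyVAD{frameSize: 4, threshold: 0.01}
	pcm := []float32{0, 0, 0, 0, 0.5, 0.5, 0.5, 0.5, 0, 0, 0, 0}
	segs, _ := v.DetectSegments(pcm, 8)
	fmt.Printf("%+v\n", segs) // one segment covering the loud frames
}
```

Keeping the contract this small is what makes the backend swappable: the server only needs segments back, not the model internals.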
Currently creating the VAD backend with silero, attaching it to the compilation process and to the binary releases.
Mh, things are moving in the right direction, but the VAD still isn't right: it detects the start of the conversation, but can't detect the end of the segment yet.
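The missing end-of-segment detection usually comes down to a "hang-over" rule: once speech has started, only declare the utterance finished after several consecutive silent frames, so brief pauses don't cut it off. A minimal sketch of that logic, with illustrative names and thresholds (not the silero integration's actual code):

```go
package main

import "fmt"

// EndpointDetector is a hypothetical helper that turns per-frame
// speech/silence decisions into an end-of-utterance signal.
type EndpointDetector struct {
	threshold        float32 // per-frame average energy threshold
	minSilenceFrames int     // silent frames required to close a segment
	inSpeech         bool
	silentRun        int
}

func frameEnergy(frame []float32) float32 {
	var sum float32
	for _, s := range frame {
		sum += s * s
	}
	return sum / float32(len(frame))
}

// Push feeds one audio frame; it returns true exactly once, when the
// current utterance is considered finished.
func (d *EndpointDetector) Push(frame []float32) bool {
	if frameEnergy(frame) >= d.threshold {
		d.inSpeech = true
		d.silentRun = 0 // speech resets the silence counter
		return false
	}
	if !d.inSpeech {
		return false // silence before any speech is ignored
	}
	d.silentRun++
	if d.silentRun >= d.minSilenceFrames {
		d.inSpeech = false
		d.silentRun = 0
		return true // enough trailing silence: segment ended
	}
	return false
}

func main() {
	d := &EndpointDetector{threshold: 0.01, minSilenceFrames: 3}
	loud := []float32{0.5, 0.5, 0.5, 0.5}
	quiet := []float32{0, 0, 0, 0}
	fmt.Println(d.Push(loud))  // false: speech starts
	fmt.Println(d.Push(quiet)) // false: 1 silent frame
	fmt.Println(d.Push(quiet)) // false: 2 silent frames
	fmt.Println(d.Push(quiet)) // true: 3 silent frames, utterance ends
}
```

Tuning `minSilenceFrames` trades responsiveness against cutting the speaker off mid-sentence.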
Extracted the silero-vad bits over here: https://github.com/mudler/LocalAI/pull/4204 so it can be tackled separately.
Closing, as we merged it in https://github.com/mudler/LocalAI/pull/5392