
feat: Realtime API support

Open mudler opened this issue 1 year ago • 9 comments

Description

This PR fixes https://github.com/mudler/LocalAI/issues/3714

And also covers #191

Notes for Reviewers

Signed commits

  • [ ] Yes, I signed my commits.

mudler avatar Oct 03 '24 17:10 mudler

Deploy Preview for localai ready!

Name Link
Latest commit f272605b950d35e4360d638a9b30fa7e343749e4
Latest deploy log https://app.netlify.com/sites/localai/deploys/67868d4d9141c90008d963f5
Deploy Preview https://deploy-preview-3722--localai.netlify.app

netlify[bot] avatar Oct 03 '24 17:10 netlify[bot]

yamllint Failed

::group::gallery/arch-function.yaml
::error file=gallery/arch-function.yaml,line=66,col=22::66:22 [new-line-at-end-of-file] no new line character at the end of file
::endgroup::

Workflow: Yamllint GitHub Actions, Action: __karancode_yamllint-github-action, Lint: gallery

github-actions[bot] avatar Oct 09 '24 10:10 github-actions[bot]

Just for reference, openai-realtime-console seems quite nice for testing things out, especially at this stage. I've opened a PR upstream to include a Dockerfile and instructions on how to use it with a local server: https://github.com/openai/openai-realtime-console/pull/59

mudler avatar Oct 14 '24 21:10 mudler

What's the best option here if we want to contribute? Just fork the branch and open PRs against it?

mattkanwisher avatar Nov 07 '24 07:11 mattkanwisher

What's the best option here if we want to contribute? Just fork the branch and open PRs against it?

Yes, that would work just fine!

mudler avatar Nov 07 '24 08:11 mudler

What is done:

  • [x] API spec
  • [x] Updating session, starting VAD server
  • [x] Hooking server API specs to placeholder functions
  • [x] Register ws server, and test client side functionalities
  • [x] Created a wrapped model definition for emulating Audio-to-Audio models when the backend does not support them (via an STT -> LLM -> TTS pipeline)

Things left:

  • [ ] Handle conversation templating (like we do in chat.go; this is a good opportunity to do some code extraction)
  • [ ] Add a VAD backend, or embed VAD directly in the current Go code. Having a backend would make it modular and let us re-use part of the existing code base
  • [ ] Add Audio-to-Audio backend and define the gRPC APIs for it. Implement usage here
  • [ ] Hook the model interface to the various backend functions, and update the wrapped model so it works both when emulating an Audio-to-Audio model (by running things in a pipeline: STT -> LLM -> TTS) and when using a native Audio-to-Audio backend
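
The wrapped-model idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the `AudioToAudio` interface, the stage function types, and `newStubPipeline` are all hypothetical names, and the stages are stubbed with trivial functions so the example runs on its own.

```go
package main

import (
	"fmt"
	"strings"
)

// AudioToAudio is a hypothetical interface: audio in, audio out.
type AudioToAudio interface {
	Respond(audio []byte) ([]byte, error)
}

// Hypothetical signatures for the three pipeline stages.
type (
	Transcriber func(audio []byte) (string, error) // STT
	Generator   func(prompt string) (string, error) // LLM
	Synthesizer func(text string) ([]byte, error)   // TTS
)

// pipelineModel emulates an audio-to-audio model by chaining STT -> LLM -> TTS.
type pipelineModel struct {
	stt Transcriber
	llm Generator
	tts Synthesizer
}

func (p pipelineModel) Respond(audio []byte) ([]byte, error) {
	text, err := p.stt(audio)
	if err != nil {
		return nil, fmt.Errorf("stt: %w", err)
	}
	reply, err := p.llm(text)
	if err != nil {
		return nil, fmt.Errorf("llm: %w", err)
	}
	out, err := p.tts(reply)
	if err != nil {
		return nil, fmt.Errorf("tts: %w", err)
	}
	return out, nil
}

// nativeModel wraps a backend that handles audio-to-audio directly.
type nativeModel struct {
	run func(audio []byte) ([]byte, error)
}

func (n nativeModel) Respond(audio []byte) ([]byte, error) { return n.run(audio) }

// newStubPipeline wires toy stages: "transcribe" bytes as text,
// "generate" by upper-casing, and "synthesize" text back to bytes.
func newStubPipeline() AudioToAudio {
	return pipelineModel{
		stt: func(a []byte) (string, error) { return string(a), nil },
		llm: func(p string) (string, error) { return strings.ToUpper(p), nil },
		tts: func(t string) ([]byte, error) { return []byte(t), nil },
	}
}

func main() {
	var m AudioToAudio = newStubPipeline()
	out, err := m.Respond([]byte("hello"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // prints "HELLO"
}
```

Because both `pipelineModel` and `nativeModel` satisfy the same interface, the realtime handler would not need to know whether it is talking to an emulated or a native Audio-to-Audio model.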

mudler avatar Nov 12 '24 18:11 mudler

Currently working on creating the VAD backend with silero and attaching it to the compilation process and the binary releases.

mudler avatar Nov 13 '24 18:11 mudler

Hmm, things are heading in the right direction, but VAD still isn't right: it detects the start of the conversation, but can't detect the end of the segment yet.
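
To make the start/end problem concrete, here is a minimal sketch of segment detection over per-frame speech scores. It is not the silero-based implementation from this PR; it assumes a simple threshold with a "hangover" rule, where a segment only ends after `minSilence` consecutive quiet frames. The names `Segment` and `detectSegments` are hypothetical.

```go
package main

import "fmt"

// Segment holds frame indices [Start, End) of detected speech.
type Segment struct{ Start, End int }

// detectSegments scans per-frame scores (e.g. energy or speech probability).
// A segment starts at the first frame at or above threshold and ends once
// minSilence consecutive frames fall below it.
func detectSegments(scores []float64, threshold float64, minSilence int) []Segment {
	var segs []Segment
	inSpeech := false
	start, silent := 0, 0
	for i, s := range scores {
		switch {
		case s >= threshold:
			if !inSpeech {
				inSpeech, start = true, i
			}
			silent = 0 // any speech frame resets the silence counter
		case inSpeech:
			silent++
			if silent >= minSilence {
				// End is the first frame of the closing silence run.
				segs = append(segs, Segment{start, i - silent + 1})
				inSpeech, silent = false, 0
			}
		}
	}
	if inSpeech {
		// Input ended mid-segment: close it, trimming trailing quiet frames.
		segs = append(segs, Segment{start, len(scores) - silent})
	}
	return segs
}

func main() {
	scores := []float64{0, 0.9, 0.8, 0.1, 0.7, 0, 0, 0, 0.9}
	fmt.Println(detectSegments(scores, 0.5, 2)) // prints [{1 5} {8 9}]
}
```

In this sketch, the single quiet frame at index 3 does not end the first segment because it is shorter than `minSilence`; missing or mis-tuned logic of this kind is one plausible way to see starts detected but ends never reported.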

mudler avatar Nov 14 '24 18:11 mudler

Extracted the silero-vad bits over here: https://github.com/mudler/LocalAI/pull/4204 so they can be tackled separately.

mudler avatar Nov 20 '24 09:11 mudler

Closing, as we merged it in https://github.com/mudler/LocalAI/pull/5392

mudler avatar May 27 '25 06:05 mudler