MyButtermilk
@barinali any update?
This model is stronger: https://github.com/PaddlePaddle/PaddleOCR
Thank you very much for your relevant points. 1. I just tried it in Google AI Studio. This 19-minute video https://www.youtube.com/watch?v=Lj7bsHnhoD8 comes to a pretty massive 330,601 tokens with...
@nbonamy : So, what do you think?
Nice design! Maybe add a drop zone? https://www.dropzone.dev/
One solution would be to support pipecat: https://github.com/pipecat-ai/pipecat It supports natural streaming conversations with LLMs and is provider-agnostic. They even just added Cartesia as they introduced...
Cartesia Ink-Whisper is more accurate than Fireworks, and it is also faster. 5.5 hours of transcription per month are free. Above that you need a subscription of 5...
Fireworks just launched a turnkey solution: https://fireworks.ai/blog/voice-agents
Ultravox would be another alternative: https://www.ultravox.ai/blog/ultravox-v0-5-taking-the-lead-in-speech-understanding Weights are available; here is the 8B version: https://huggingface.co/fixie-ai/ultravox-v0_5-llama-3_1-8b
The implementation from Handy works well on Windows and Mac, and the author encourages people to fork it: https://github.com/cjpais/Handy