mlx-audio
STS roadmap
The roadmap covers both approaches you mentioned:
- End-to-End Speech-to-Speech Models: A direct approach using dedicated STS architectures like Moshi.
- Modular Voice Pipeline: A composable approach that chains Speech-to-Text, LLM processing, and Text-to-Speech as separate, swappable stages.
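A minimal sketch of how the modular pipeline composes, assuming each stage is exposed as a plain Python object with a single method. The names `SpeechToText`, `LanguageModel`, `TextToSpeech`, `VoicePipeline`, and the toy stand-in classes are hypothetical and are not part of mlx-audio's API. Audio is modeled as a list of float samples only for illustration.

```python
from dataclasses import dataclass
from typing import List, Protocol


# Hypothetical stage interfaces (not mlx-audio APIs).
class SpeechToText(Protocol):
    def transcribe(self, audio: List[float]) -> str: ...


class LanguageModel(Protocol):
    def respond(self, prompt: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> List[float]: ...


@dataclass
class VoicePipeline:
    """Hypothetical modular speech-to-speech pipeline: STT -> LLM -> TTS."""
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def __call__(self, audio: List[float]) -> List[float]:
        text = self.stt.transcribe(audio)   # speech -> text
        reply = self.llm.respond(text)      # text -> reply text
        return self.tts.synthesize(reply)   # reply text -> speech


# Toy stand-ins so the sketch runs without any model weights.
class EchoSTT:
    def transcribe(self, audio: List[float]) -> str:
        return f"{len(audio)} samples"


class UpperLLM:
    def respond(self, prompt: str) -> str:
        return prompt.upper()


class LengthTTS:
    def synthesize(self, text: str) -> List[float]:
        # One silent "sample" per character of the reply.
        return [0.0] * len(text)


if __name__ == "__main__":
    pipeline = VoicePipeline(EchoSTT(), UpperLLM(), LengthTTS())
    print(len(pipeline([0.1, 0.2, 0.3])))
```

Because each stage depends only on a narrow interface, any one of them (for example, the TTS model) can be swapped without touching the others. An end-to-end model such as Moshi would instead replace all three stages with a single audio-in, audio-out call.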