speech-language-model topic
xcodec
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
WavTokenizer
[ICLR 2025] State-of-the-art discrete acoustic codec models at 40 or 75 tokens per second for audio language modeling
SLED-TTS
Streamable Text-to-Speech model using a language modeling approach, without vector quantization
WavChat
A Survey of Spoken Dialogue Models (60 pages)
SoCodec
Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications