speech-language-model topics

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

ictnlp

large-language-models

multimodal-large-language-models

speech-interaction

speech-language-model

WavTokenizer

Stars: 1.2k, Forks: 102, Watchers: 1.2k

[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling

jishengpeng

acoustic

audio-representation

codec

dac

SLED-TTS

Stars: 104, Forks: 7, Watchers: 104

Streamable Text-to-Speech model using a language modeling approach, without vector quantization

ictnlp

speech-language-model

speech-synthesis

streaming-inference

text-to-speech

WavChat

Stars: 310, Forks: 17, Watchers: 310

A Survey of Spoken Dialogue Models (60 pages)

jishengpeng

duplex

encodec

gpt-4o

intreaction

SoCodec

Stars: 82, Forks: 7, Watchers: 82

Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications

hhguo

audio

speech

speech-codec

speech-language-model