audio-understanding topic
Awesome-Audio-LLM
Audio Large Language Models
Awesome-Omni-Large-Models-and-Datasets
🔥 Omni large models and datasets for multimodal understanding and generation.
Fun-ASR
Fun-ASR is an end-to-end large speech recognition model released by Tongyi Lab.
VideoAgent
"VideoAgent: All-in-One Agentic Framework for Video Understanding, Editing, and Remaking"
DIFFA
[AAAI 2026] DIFFA: Large Language Diffusion Models Can Listen and Understand
FlexSED
open-vocabulary sound event detection
Voxtral-AI-Demo-Local-Interface
Voxtral is a state-of-the-art model developed to handle both speech transcription and audio understanding with remarkable accuracy and efficiency. This demo interface lets you run the Voxtral model o...