Part of issue #3057

What does this PR do?

Implement SNAC features and integration.

Summary

Implement SNAC (Multi-Scale Neural Audio Codec) integration for Text-to-Speech applications
Add comprehensive TTS utilities and configuration presets for speech synthesis
Provide example demonstrating Qwen + SNAC TTS pipeline
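
To make the two-stage shape of such a pipeline concrete, here is a minimal sketch in plain Rust with no candle dependency. Every name in it (CodeGenerator, CodecDecoder, TtsPipelineSketch, ByteGenerator, ScaleDecoder) is hypothetical and is not taken from this PR; the stubs only stand in for a language model that emits codec tokens and a codec that turns tokens into samples.

```rust
/// Hypothetical stand-in for a language model that turns text into codec tokens.
pub trait CodeGenerator {
    fn generate_codes(&self, text: &str) -> Vec<u32>;
}

/// Hypothetical stand-in for an audio codec that turns tokens into PCM samples.
pub trait CodecDecoder {
    fn decode(&self, codes: &[u32]) -> Vec<f32>;
}

/// Illustrative pipeline: text -> tokens -> samples. Not the PR's SnacTtsPipeline.
pub struct TtsPipelineSketch<G, D> {
    generator: G,
    decoder: D,
}

impl<G: CodeGenerator, D: CodecDecoder> TtsPipelineSketch<G, D> {
    pub fn new(generator: G, decoder: D) -> Self {
        Self { generator, decoder }
    }

    /// Runs both stages in order and returns the decoded samples.
    pub fn synthesize(&self, text: &str) -> Vec<f32> {
        let codes = self.generator.generate_codes(text);
        self.decoder.decode(&codes)
    }
}

/// Stub generator: one "token" per input byte (purely for illustration).
pub struct ByteGenerator;

impl CodeGenerator for ByteGenerator {
    fn generate_codes(&self, text: &str) -> Vec<u32> {
        text.bytes().map(u32::from).collect()
    }
}

/// Stub decoder: maps each token in 0..=255 to a sample in 0.0..=1.0.
pub struct ScaleDecoder;

impl CodecDecoder for ScaleDecoder {
    fn decode(&self, codes: &[u32]) -> Vec<f32> {
        codes.iter().map(|&c| c as f32 / 255.0).collect()
    }
}
```

With these stubs, TtsPipelineSketch::new(ByteGenerator, ScaleDecoder).synthesize("hi") returns one sample per input byte. Keeping the two stages behind traits is one way to let a real model and codec be swapped in later without changing the calling code.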

Changes Made

New module: snac_tts_integration.rs with TTS-optimized SNAC codec wrapper
Enhanced SNAC model: Added TTS-specific methods (encode_for_tts, decode_from_tts_tokens, batch processing)
Config presets: Added default_tts(), high_quality_tts(), fast_tts() configurations
Utility functions: Memory estimation, token validation, voice embedding creation
Example implementation: qwen_snac_tts_example.rs showing complete TTS pipeline
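
As a rough illustration of what token validation and padding/truncation utilities of this kind might look like, the sketch below uses only the standard library. The function names, signatures, and the idea of a single codebook size are assumptions made for the example and are not the PR's actual API.

```rust
/// Hypothetical helper: checks that every token id is a valid index into a
/// codebook with `codebook_size` entries. Returns the first offending token.
pub fn validate_tokens(tokens: &[u32], codebook_size: u32) -> Result<(), String> {
    for (i, &t) in tokens.iter().enumerate() {
        if t >= codebook_size {
            return Err(format!(
                "token {} at index {} is outside a codebook of size {}",
                t, i, codebook_size
            ));
        }
    }
    Ok(())
}

/// Hypothetical helper: returns exactly `target_len` tokens, truncating a
/// longer input and right-padding a shorter one with `pad_id`.
pub fn pad_or_truncate(tokens: &[u32], target_len: usize, pad_id: u32) -> Vec<u32> {
    let mut out: Vec<u32> = tokens.iter().copied().take(target_len).collect();
    out.resize(target_len, pad_id);
    out
}
```

Fixed-length token sequences make it straightforward to stack several utterances into one batch, which is the usual reason for padding and truncating before decoding.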

Key Features

Multiple quality presets: 24kHz speech, 32kHz general, fast 16kHz options
TTS pipeline abstraction: SnacTtsPipeline for easy integration with language models
Batch processing support: Efficient handling of multiple audio streams
Memory optimization: Token padding, truncation, and memory estimation utilities
Voice cloning support: Reference audio embedding extraction
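
The presets and memory estimation could be sketched as follows. The preset function names come from the description above, but the struct, its fields, and the pairing of default_tts() with 24 kHz and high_quality_tts() with 32 kHz are assumptions made for this example; the estimate also only covers decoded mono f32 output, not model weights or activations.

```rust
/// Hypothetical preset type; the PR's real config has more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct PresetSketch {
    pub label: &'static str,
    pub sample_rate_hz: u32,
}

impl PresetSketch {
    /// Assumed mapping: the default preset targets 24 kHz speech.
    pub fn default_tts() -> Self {
        Self { label: "default", sample_rate_hz: 24_000 }
    }

    /// Assumed mapping: the high-quality preset targets 32 kHz audio.
    pub fn high_quality_tts() -> Self {
        Self { label: "high_quality", sample_rate_hz: 32_000 }
    }

    /// The fast preset targets 16 kHz audio.
    pub fn fast_tts() -> Self {
        Self { label: "fast", sample_rate_hz: 16_000 }
    }
}

/// Rough size in bytes of decoded mono f32 PCM for `batch` streams of
/// `seconds` length each. Ignores model weights and intermediate tensors.
pub fn estimate_pcm_bytes(preset: &PresetSketch, seconds: f64, batch: usize) -> usize {
    let samples_per_stream = (preset.sample_rate_hz as f64 * seconds).ceil() as usize;
    samples_per_stream * batch * std::mem::size_of::<f32>()
}
```

For example, two seconds of audio at the assumed 24 kHz default is 48,000 samples, or 192,000 bytes as f32, per stream.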

Sep 06 '25 13:09 maximizemaxwell

Hi, is this close to working? It seems we could support SparkTTS and VovyTTs once this is workable.

Sep 07 '25 05:09 lucasjinreal

@maximizemaxwell Hi, could you add some checkpoint conversion docs? I'd like to verify that the results are normal. Once that is done, we can consider merging SNAC support and enabling several SOTA TTS models that use SNAC.

Sep 18 '25 05:09 lucasjinreal

Snac support