candle
SNAC support
Part of issue #3057
What does this PR do?
Implement SNAC features and integration.
Summary
- Implement SNAC (Multi-Scale Neural Audio Codec) integration for Text-to-Speech applications
- Add comprehensive TTS utilities and configuration presets for speech synthesis
- Provide example demonstrating Qwen + SNAC TTS pipeline
Changes Made
- New module: snac_tts_integration.rs with TTS-optimized SNAC codec wrapper
- Enhanced SNAC model: Added TTS-specific methods (encode_for_tts, decode_from_tts_tokens, batch processing)
- Config presets: Added default_tts(), high_quality_tts(), fast_tts() configurations
- Utility functions: Memory estimation, token validation, voice embedding creation
- Example implementation: qwen_snac_tts_example.rs showing complete TTS pipeline
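To illustrate the shape of the config presets, here is a minimal sketch. The `TtsPreset` struct, its `sample_rate_hz` field, and the `samples_for` helper are hypothetical and are not taken from the PR. Only the three constructor names come from the list above, and pairing them with the 24 kHz, 32 kHz, and 16 kHz options from Key Features is an assumption.

```rust
/// Hypothetical preset type for illustration only; the config struct in the
/// PR presumably carries more fields than a sample rate.
#[derive(Debug, Clone, PartialEq)]
pub struct TtsPreset {
    /// Output sample rate in Hz.
    pub sample_rate_hz: u32,
}

impl TtsPreset {
    /// Assumed to correspond to the 24 kHz speech preset.
    pub fn default_tts() -> Self {
        Self { sample_rate_hz: 24_000 }
    }

    /// Assumed to correspond to the 32 kHz general preset.
    pub fn high_quality_tts() -> Self {
        Self { sample_rate_hz: 32_000 }
    }

    /// Assumed to correspond to the fast 16 kHz preset.
    pub fn fast_tts() -> Self {
        Self { sample_rate_hz: 16_000 }
    }

    /// Number of audio samples in a clip of the given duration (hypothetical helper).
    pub fn samples_for(&self, seconds: f32) -> usize {
        (self.sample_rate_hz as f32 * seconds).round() as usize
    }
}
```

A caller would pick a preset once, for example `TtsPreset::fast_tts()`, and derive buffer sizes from it rather than hard-coding sample rates.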
Key Features
- Multiple quality presets: 24kHz speech, 32kHz general, fast 16kHz options
- TTS pipeline abstraction: SnacTtsPipeline for easy integration with language models
- Batch processing support: Efficient handling of multiple audio streams
- Memory optimization: Token padding, truncation, and memory estimation utilities
- Voice cloning support: Reference audio embedding extraction
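The token padding, truncation, validation, and memory estimation utilities could look roughly like the sketch below. All function names and signatures here are hypothetical and do not reflect the PR's actual API. The sketch also assumes tokens are stored as `u32` and that valid token ids lie in `0..codebook_size`.

```rust
/// Pads with `pad_id` or truncates so the result has exactly `target_len` tokens.
/// (Hypothetical helper, not the PR's API.)
pub fn pad_or_truncate(tokens: &[u32], target_len: usize, pad_id: u32) -> Vec<u32> {
    // Keep at most `target_len` tokens, then fill any remainder with padding.
    let mut out: Vec<u32> = tokens.iter().copied().take(target_len).collect();
    out.resize(target_len, pad_id);
    out
}

/// Returns the index of the first token that falls outside the codebook, if any.
/// (Hypothetical helper; assumes valid ids are `0..codebook_size`.)
pub fn first_invalid_token(tokens: &[u32], codebook_size: u32) -> Option<usize> {
    tokens.iter().position(|&t| t >= codebook_size)
}

/// Bytes needed to hold a `batch x seq_len` buffer of `u32` tokens.
/// (Hypothetical helper; covers only the token buffer, not model activations.)
pub fn estimate_token_bytes(batch: usize, seq_len: usize) -> usize {
    batch * seq_len * std::mem::size_of::<u32>()
}
```

For example, `pad_or_truncate(&[1, 2, 3], 5, 0)` yields `[1, 2, 3, 0, 0]`, and `first_invalid_token` can be called on language-model output before decoding to reject ids the codec cannot map to audio.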
Hi, is this close to working? It seems we could support SparkTTS and VovyTTs once this is workable.
@maximizemaxwell Hi, would you like to add some checkpoint conversion docs? I'd like to verify whether its results are normal. Once that is done, we can consider merging SNAC support and enabling several SOTA TTS models that use SNAC.