Snac support

Open maximizemaxwell opened this issue 2 months ago • 2 comments

Part of issue

#3057

What does this PR do?

Implement SNAC features and integration.

Summary

  • Implement SNAC (Multi-Scale Neural Audio Codec) integration for Text-to-Speech applications
  • Add comprehensive TTS utilities and configuration presets for speech synthesis
  • Provide example demonstrating Qwen + SNAC TTS pipeline
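
To illustrate how a language model such as Qwen could consume and emit multi-scale codec tokens, here is a minimal sketch of flattening hierarchical codes into a single token stream and back. It is not taken from this PR. The `MultiScaleCodes` type, the `flatten` and `unflatten` functions, the three-level layout with a 1:2:4 length ratio, and the 7-token frame order are all assumptions made for illustration; the actual layout depends on the checkpoint and on how the PR arranges its tokens.

```rust
/// Hypothetical container for one utterance's multi-scale codes.
/// Assumption: three levels whose lengths are in a 1:2:4 ratio (coarse to fine).
#[derive(Debug, Clone, PartialEq)]
pub struct MultiScaleCodes {
    pub coarse: Vec<u32>,
    pub mid: Vec<u32>,
    pub fine: Vec<u32>,
}

/// Interleave the three levels into frames of 7 tokens:
/// [coarse, mid, mid, fine, fine, fine, fine].
/// Returns None if the level lengths are not in a 1:2:4 ratio.
pub fn flatten(codes: &MultiScaleCodes) -> Option<Vec<u32>> {
    let n = codes.coarse.len();
    if codes.mid.len() != 2 * n || codes.fine.len() != 4 * n {
        return None;
    }
    let mut out = Vec::with_capacity(7 * n);
    for i in 0..n {
        out.push(codes.coarse[i]);
        out.extend_from_slice(&codes.mid[2 * i..2 * i + 2]);
        out.extend_from_slice(&codes.fine[4 * i..4 * i + 4]);
    }
    Some(out)
}

/// Inverse of `flatten`. Returns None if the length is not a multiple of 7.
pub fn unflatten(tokens: &[u32]) -> Option<MultiScaleCodes> {
    if tokens.len() % 7 != 0 {
        return None;
    }
    let n = tokens.len() / 7;
    let mut codes = MultiScaleCodes {
        coarse: Vec::with_capacity(n),
        mid: Vec::with_capacity(2 * n),
        fine: Vec::with_capacity(4 * n),
    };
    for frame in tokens.chunks_exact(7) {
        codes.coarse.push(frame[0]);
        codes.mid.extend_from_slice(&frame[1..3]);
        codes.fine.extend_from_slice(&frame[3..7]);
    }
    Some(codes)
}
```

Returning `Option` instead of panicking keeps malformed model output (for example, a generation cut off mid-frame) from crashing the decoder side of the pipeline.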

Changes Made

  • New module: snac_tts_integration.rs with TTS-optimized SNAC codec wrapper
  • Enhanced SNAC model: Added TTS-specific methods (encode_for_tts, decode_from_tts_tokens, batch processing)
  • Config presets: Added default_tts(), high_quality_tts(), fast_tts() configurations
  • Utility functions: Memory estimation, token validation, voice embedding creation
  • Example implementation: qwen_snac_tts_example.rs showing complete TTS pipeline
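
The utility functions listed above (token validation and memory estimation) might look roughly like the following. This is a sketch under stated assumptions, not the code in `snac_tts_integration.rs`: the names `first_invalid_token` and `estimate_token_bytes`, their signatures, and the idea that validity means "below the codebook size" are all hypothetical.

```rust
/// Hypothetical validator: return the index and value of the first token
/// that falls outside the range `0..codebook_size`, or None if all are valid.
pub fn first_invalid_token(tokens: &[u32], codebook_size: u32) -> Option<(usize, u32)> {
    tokens
        .iter()
        .copied()
        .enumerate()
        .find(|&(_, t)| t >= codebook_size)
}

/// Hypothetical estimator: bytes needed to hold a batch of token ids.
/// This counts only the token buffers, not model weights or activations,
/// and returns None if the multiplication would overflow `usize`.
pub fn estimate_token_bytes(
    batch_size: usize,
    tokens_per_item: usize,
    bytes_per_token: usize,
) -> Option<usize> {
    batch_size
        .checked_mul(tokens_per_item)?
        .checked_mul(bytes_per_token)
}
```

For instance, a batch of 4 items with 700 tokens each, stored as 4-byte ids, would be estimated at 11,200 bytes of token storage.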

Key Features

  • Multiple quality presets: 24kHz speech, 32kHz general, fast 16kHz options
  • TTS pipeline abstraction: SnacTtsPipeline for easy integration with language models
  • Batch processing support: Efficient handling of multiple audio streams
  • Memory optimization: Token padding, truncation, and memory estimation utilities
  • Voice cloning support: Reference audio embedding extraction
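
As a rough illustration of the quality presets and the padding/truncation utilities, the sketch below uses the preset names from this PR description (`default_tts`, `high_quality_tts`, `fast_tts`). Everything else is an assumption: the `TtsPreset` struct, the mapping of each preset to a sample rate (24 kHz, 32 kHz, and 16 kHz respectively), and the `samples_for_millis` and `pad_or_truncate` helpers are hypothetical and may not match the actual implementation.

```rust
/// Hypothetical preset type; the real config in the PR likely has more fields.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TtsPreset {
    pub sample_rate_hz: u32,
}

impl TtsPreset {
    /// Assumed mapping: the default preset targets 24 kHz speech.
    pub fn default_tts() -> Self {
        Self { sample_rate_hz: 24_000 }
    }

    /// Assumed mapping: the high-quality preset targets 32 kHz audio.
    pub fn high_quality_tts() -> Self {
        Self { sample_rate_hz: 32_000 }
    }

    /// Assumed mapping: the fast preset targets 16 kHz audio.
    pub fn fast_tts() -> Self {
        Self { sample_rate_hz: 16_000 }
    }

    /// Number of audio samples covering `millis` milliseconds (rounded down).
    pub fn samples_for_millis(&self, millis: u64) -> u64 {
        self.sample_rate_hz as u64 * millis / 1000
    }
}

/// Force a token sequence to exactly `target_len` entries:
/// longer inputs are truncated, shorter ones are padded with `pad_id`.
pub fn pad_or_truncate(tokens: &[u32], target_len: usize, pad_id: u32) -> Vec<u32> {
    let mut out: Vec<u32> = tokens.iter().copied().take(target_len).collect();
    out.resize(target_len, pad_id);
    out
}
```

Fixed-length sequences of this kind are what make batch processing straightforward, since every item in a batch can then share one tensor shape.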

maximizemaxwell, Sep 06 '25 13:09

Hi, is this close to working? It seems we could support SparkTTS and VovyTTs once this is workable.

lucasjinreal, Sep 07 '25 05:09

@maximizemaxwell Hi, would you like to add some checkpoint conversion docs? I'd like to verify whether the results are normal. Once that is done, we can consider merging SNAC support and enabling several SOTA TTS models that use SNAC.

lucasjinreal, Sep 18 '25 05:09