Speech-Tokenization-Papers

This repository tracks papers and reports on discrete speech representation learning and speech tokenization methods for speech language modeling.

Papers

2023

  • [arXiv][demo][code] RepCodec: A Speech Representation Codec for Speech Tokenization

  • [arXiv][demo][code] SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models

  • [arXiv][demo][code] HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec

2022

  • [arXiv][demo][code] High Fidelity Neural Audio Compression

  • [arXiv][demo] Autoregressive Co-Training for Learning Discrete Speech Representations

2021

  • [ASRU][arXiv] W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training

  • [TASLP][arXiv][demo] SoundStream: An End-to-End Neural Audio Codec

  • [TASLP][arXiv][code] HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units

  • [arXiv] Variable-rate discrete representation learning

2020

  • [arXiv][code] wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

  • [arXiv][code] Vector-Quantized Autoregressive Predictive Coding

2019

  • [arXiv][demo] Towards Unsupervised Speech Recognition and Synthesis with Quantized Speech Representation Learning

  • [arXiv] vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations

  • [TASLP][arXiv] Unsupervised speech representation learning using WaveNet autoencoders