Speech-Tokenization-Papers
Published 20 hours ago

This repository follows papers and reports on discrete speech representation learning and speech tokenization methods for speech language modeling.

Papers

2023

[arXiv][demo][code] RepCodec: A Speech Representation Codec for Speech Tokenization
[arXiv][demo][code] SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
[arXiv][demo][code] HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec

2022

[arXiv][demo][code] High Fidelity Neural Audio Compression
[arXiv][demo] Autoregressive Co-Training for Learning Discrete Speech Representations

2021

[ASRU][arXiv] w2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training
[TASLP][arXiv][demo] SoundStream: An End-to-End Neural Audio Codec
[TASLP][arXiv][code] HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
[arXiv] Variable-rate discrete representation learning

2020

[arXiv][code] wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
[arXiv][code] Vector-Quantized Autoregressive Predictive Coding

2019

[arXiv][demo] Towards Unsupervised Speech Recognition and Synthesis with Quantized Speech Representation Learning
[arXiv] vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations
[TASLP][arXiv] Unsupervised speech representation learning using WaveNet autoencoders
