Scalable Factorized Hierarchical Variational Autoencoders
This repository contains (refactored) code to reproduce the core results from the following two papers:
- Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data
- Scalable Factorized Hierarchical Variational Autoencoder Training
The previous version of the code can be found here.

If you find the code useful, please cite
```
@inproceedings{hsu2017learning,
  title={Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data},
  author={Hsu, Wei-Ning and Zhang, Yu and Glass, James},
  booktitle={Advances in Neural Information Processing Systems},
  year={2017},
}

@article{hsu2018scalable,
  title={Scalable Factorized Hierarchical Variational Autoencoder Training},
  author={Hsu, Wei-Ning and Glass, James},
  journal={arXiv preprint arXiv:1804.03201},
  year={2018},
  arxiv={1804.03201},
}
```
Dependencies
This project uses Python 2.7.6. Before running the code, you have to install the following:
- Tensorflow 1.0
- Scipy
- Numpy
- CFFI
- Soundfile
- Matplotlib
- Scikit-Learn
- LibROSA
- SPHFile (optional, for preprocessing TIMIT raw data)
- Kaldi (optional)
- Kaldi-Python (optional)
The first nine dependencies can be installed using pip by running:

```
pip install -r requirements.txt
```
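The repository ships its own `requirements.txt`; purely for orientation, the dependency list above maps to PyPI packages roughly as follows (a hypothetical reconstruction, and the TensorFlow pin in particular is an assumption):

```
tensorflow==1.0.1
scipy
numpy
cffi
SoundFile
matplotlib
scikit-learn
librosa
sphfile
```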
The last one, Kaldi-Python, requires Kaldi at a specific commit (d1e1e3b) or earlier. If you don't have such a version of Kaldi, you can install both Kaldi and Kaldi-Python by running:

```
make all
```
Getting Started
Main source code can be found in `./fhvae/`; `./scripts/` contains runnable Python scripts, and example preprocessing scripts are in `./examples/`.
Two dataset formats are supported: Kaldi and NumPy. Datasets should be stored in `./datasets/<dataset_name>/<set_name>/`, where `<set_name>` refers to `{train,dev,test}`. Each set folder should contain a `feats.scp` and a `len.scp`. `*.scp` files follow Kaldi's script-file format, where each line is:

```
sequence-id value
```

The value for `len.scp` is an integer denoting the feature sequence length, and the value for `feats.scp` is a `*.npy` path (NumPy format) or `*.ark:<offset>` (Kaldi format). Such files can be prepared with `./scripts/preprocess/prepare_kaldi_data.py` (Kaldi format) or `./scripts/preprocess/prepare_numpy_data.py` (NumPy format), given a `wav.scp` file.
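To make the expected layout concrete, here is a minimal sketch of writing a NumPy-format set by hand. The feature arrays, sequence IDs, and dataset name are invented for illustration; in practice `prepare_numpy_data.py` generates these files for you.

```python
import os
import numpy as np

# hypothetical toy features: sequence-id -> (n_frames, n_dims) array
feats = {
    "seq_001": np.random.randn(500, 80),
    "seq_002": np.random.randn(320, 80),
}

set_dir = "./datasets/my_dataset/train"
if not os.path.exists(set_dir):
    os.makedirs(set_dir)

with open(os.path.join(set_dir, "feats.scp"), "w") as f_scp, \
        open(os.path.join(set_dir, "len.scp"), "w") as l_scp:
    for seq_id in sorted(feats):
        npy_path = os.path.join(set_dir, "%s.npy" % seq_id)
        np.save(npy_path, feats[seq_id])
        # feats.scp: sequence-id followed by the .npy path
        f_scp.write("%s %s\n" % (seq_id, npy_path))
        # len.scp: sequence-id followed by the number of frames
        l_scp.write("%s %d\n" % (seq_id, len(feats[seq_id])))
```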
Before running any code, source the environment script first to update `$PYTHONPATH`:

```
. ./env.sh
```
Preprocessing
We now provide numpy preprocessing recipes for TIMIT and LibriSpeech from a raw data directory
python ./examples/prepare_timit_numpy <TIMIT_DIR> # TIMIT
python ./examples/prepare_librispeech_numpy <LIBRISPEECH_DIR> # LibriSpeech
use -h
to see more options
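The recipes extract filterbank features (hence the dataset name `timit_np_fbank` used below). If you need to preprocess your own audio outside these scripts, a log-Mel filterbank along the following lines is a reasonable starting point; the 25 ms / 10 ms framing, the 80 Mel bands, and the file path are assumptions, not necessarily the recipes' settings:

```python
import numpy as np
import soundfile as sf
import librosa

# load a (mono) waveform; the path is illustrative
y, sr = sf.read("path/to/utterance.wav")

# log-Mel filterbank: 25 ms windows, 10 ms hop (assumed settings)
mel = librosa.feature.melspectrogram(
    y=y, sr=sr, n_fft=int(0.025 * sr),
    hop_length=int(0.010 * sr), n_mels=80)
log_mel = np.log(mel + 1e-8).T  # (n_frames, n_mels)
```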
Training with Hierarchical Sampling
```
python ./scripts/train/run_hs_train.py --dataset=timit_np_fbank --is_numpy --nmu2=2000
```

Experiments will be saved to `./exp/timit_np_fbank/<exp_name>`.
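Hierarchical sampling (from the second paper) is what makes FHVAE training scale to large corpora: instead of keeping sequence-level posterior statistics (the mu2 table) for every training sequence, each outer step samples a subset of sequences and trains on segment minibatches drawn from that subset, so cached statistics are only needed for the subset; `--nmu2` presumably sets that subset/cache size. A rough sketch of the two-level loop, with all names hypothetical:

```python
import random

def hierarchical_batches(seq_ids, load_segments, nmu2=2000,
                         n_inner_steps=100, batch_size=256):
    # Illustrative sketch of two-level (hierarchical) sampling.
    # `load_segments(seq_id)` stands in for whatever returns the
    # training segments of one sequence.
    while True:
        # level 1: sample a pool of sequences; only these need
        # entries in the sequence-level (mu2) cache
        pool = random.sample(seq_ids, min(nmu2, len(seq_ids)))
        segments = [s for sid in pool for s in load_segments(sid)]
        # level 2: draw segment minibatches from within the pool
        for _ in range(n_inner_steps):
            yield random.sample(segments, min(batch_size, len(segments)))
```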
Training without Hierarchical Sampling (original FHVAE training)
```
python ./scripts/train/run_train.py --dataset=timit_np_fbank --is_numpy
```

Experiments will be saved to `./exp/timit_np_fbank/<exp_name>`.
Evaluation
```
python scripts/eval/run_eval.py ./exp/timit_np_fbank/<exp_name> --seqlist=./misc/timit_eval.txt
```

Use `--seqlist` to specify which sequences to use for qualitative evaluation. Results will be saved to `./exp/timit_np_fbank/<exp_name>/img`.