
Training takes up too much disk space

Open JRMeyer opened this issue 6 years ago • 0 comments

Hi Oliver,

When I train on a 3.5-hour corpus, I run out of disk space (12 GB) very quickly:

OSSIAN$ du -h train/chv/speakers/news/naive_01_nn -d 1
26M     train/chv/speakers/news/naive_01_nn/time_lab
888K    train/chv/speakers/news/naive_01_nn/dnn_training_ACOUST
9.7G    train/chv/speakers/news/naive_01_nn/cmp
129M    train/chv/speakers/news/naive_01_nn/lab_dur
8.7M    train/chv/speakers/news/naive_01_nn/align_lab
8.5M    train/chv/speakers/news/naive_01_nn/dur
64M     train/chv/speakers/news/naive_01_nn/utt
253M    train/chv/speakers/news/naive_01_nn/processors
12M     train/chv/speakers/news/naive_01_nn/align_log
629M    train/chv/speakers/news/naive_01_nn/lab_dnn
11G     train/chv/speakers/news/naive_01_nn

I see most of the space is taken up under the cmp directory:

OSSIAN$ du -h train/chv/speakers/news/naive_01_nn/cmp -d 1
4.0K    train/chv/speakers/news/naive_01_nn/cmp/nn_mgc_lf0_vuv_bap_199
4.0K    train/chv/speakers/news/naive_01_nn/cmp/nn_norm_mgc_lf0_vuv_bap_199
4.4G    train/chv/speakers/news/naive_01_nn/cmp/binary_label_502
2.9G    train/chv/speakers/news/naive_01_nn/cmp/nn_no_silence_lab_502
4.0K    train/chv/speakers/news/naive_01_nn/cmp/nn_no_silence_lab_norm_502
9.7G    train/chv/speakers/news/naive_01_nn/cmp

So the binary_label_502 and nn_no_silence_lab_502 directories take up most of the space under cmp.
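As a rough sanity check on those numbers, the sketch below estimates how large a directory of frame-level feature files would be for a corpus of this length. It assumes (not confirmed here) that each frame is stored as 502 uncompressed 32-bit floats and that the frame shift is 5 ms; the function names are made up for illustration. Under those assumptions, 3.5 hours comes out at about 4.7 GiB, which is in the same range as the 4.4G reported for binary_label_502.

```python
def estimate_feature_bytes(hours, dim, frame_shift_s=0.005, bytes_per_value=4):
    """Rough size in bytes of frame-level features for a corpus.

    Assumptions (hypothetical defaults, not read from any config):
    one vector of `dim` values per frame, `bytes_per_value` bytes each
    (4 = float32), and one frame every `frame_shift_s` seconds.
    """
    n_frames = int(round(hours * 3600 / frame_shift_s))
    return n_frames * dim * bytes_per_value


def to_gib(n_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / float(2 ** 30)


# 3.5 hours of audio, 502-dimensional label vectors
size = estimate_feature_bytes(hours=3.5, dim=502)
print("estimated size: %.2f GiB" % to_gib(size))
```

If the estimate is roughly right, the size is driven by corpus duration times feature dimension, so it will grow linearly with more data.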

Any workarounds?

I'm running Ossian on AWS with 16 GB of disk space. Since the OS takes up about 4 GB, training crashes once I have trained the frontend and moved on to Merlin.

Specifically, it crashes after this command:

python ./tools/merlin/src/run_merlin.py /home/ubuntu/Ossian/train//chv/speakers/news/naive_01_nn/processors/acoustic_predictor/config.cfg
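One way to fail early rather than partway through that step is to check free space on the training volume before launching it. The following is a minimal sketch using only the standard library (`os.statvfs`, Unix only); the helper names and the 12 GiB threshold are illustrative assumptions, not part of Ossian or Merlin.

```python
import os


def free_bytes(path):
    """Bytes available to a non-root user on the filesystem holding `path`."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def has_room(path, required_bytes):
    """True if at least `required_bytes` are free under `path`."""
    return free_bytes(path) >= required_bytes


# Hypothetical threshold: roughly the size of the training directory above.
REQUIRED = 12 * 2 ** 30

# "." stands in for the training directory; replace with the real path.
if has_room(".", REQUIRED):
    print("enough free space to start training")
else:
    print("only %.1f GiB free, training would likely run out of disk"
          % (free_bytes(".") / float(2 ** 30)))
```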

Thanks!

-josh

JRMeyer, Dec 24 '17 17:12