memory error
Hi everyone, I used the copy-feats binary from Kaldi to convert my ASCII features into .ark and .scp files. Then I concatenated all the individual .scp files into a single one, which I called SmallSet0.scp:
SESS0003BLOCKA_06 /home/ana/DB/SmallSet0/feat/SESS0003BLOCKA_06.ark:18
SESS0003BLOCKA_07 /home/ana/DB/SmallSet0/feat/SESS0003BLOCKA_07.ark:18
SESS0003BLOCKA_08 /home/ana/DB/SmallSet0/feat/SESS0003BLOCKA_08.ark:18
SESS0003BLOCKA_09 /home/ana/DB/SmallSet0/feat/SESS0003BLOCKA_09.ark:18
SESS0003BLOCKA_10 /home/ana/DB/SmallSet0/feat/SESS0003BLOCKA_10.ark:18
SESS0003BLOCKA_11 /home/ana/DB/SmallSet0/feat/SESS0003BLOCKA_11.ark:18
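For reference, each line of the merged .scp maps an utterance id to an .ark path plus the byte offset at which that utterance's matrix starts. Here is a small sketch of how I read it back (my own helper, not part of pdnn):

# Minimal sketch (my own helper, not pdnn code) for splitting a Kaldi .scp
# line of the form "<utt_id> <ark_path>:<byte_offset>".
def parse_scp_line(line):
    utt_id, rxfilename = line.strip().split(' ', 1)
    ark_path, offset = rxfilename.rsplit(':', 1)
    return utt_id, ark_path, int(offset)

with open('/home/ana/DB/SmallSet0/feat/SmallSet0.scp') as scp:
    for line in scp:
        print(parse_scp_line(line))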
Then I tried to train 4 stacked RBMs using run_RBM.py and got the following memory error:
ana@ana-HP-EliteBook-Folio-9470m:~/PDNN/pdnn$ python /home/ana/PDNN/pdnn/cmds/run_RBM.py --train-data "/home/ana/DB/SmallSet0/feat/SmallSet0.scp,partition=600m,stream=true,random=true" --nnet-spec "215:1024:1024:43:1024" --wdir ./ --ptr-layer-number 4 --epoch-number 10 --batch-size 128 --learning-rate 0.08 --gbrbm-learning-rate 0.005 --momentum 0.5:0.9:5 --first_layer_type gb --param-output-file /home/ana/PDNN/Working_dir/rbm.mdl
[2015-07-28 23:06:57.528732] > ... initializing the model
Traceback (most recent call last):
File "/home/ana/PDNN/pdnn/cmds/run_RBM.py", line 62, in
What did I do wrong? Best regards, ana
Hi, I am also getting the same memory error. The minimal Python code to reproduce it is:
from io_func.kaldi_feat import KaldiReadIn

in_scp_file = '/data/raw_mfcc_test.1.scp'
kaldiread = KaldiReadIn(in_scp_file)

utt_number = 0
while True:
    uttid, in_matrix = kaldiread.read_next_utt()
    if uttid == '':
        break
    utt_number += 1
While debugging, I found that in kaldi_feat.py the following lines:
m, rows = struct.unpack('<bi', ark_read_buffer.read(5))
n, cols = struct.unpack('<bi', ark_read_buffer.read(5))
give extremely large values for rows and cols. The next line uses rows*cols to allocate a numpy array, which then raises the error.
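To see where that header parse goes wrong, here is a rough diagnostic sketch (my own code, not pdnn's, and assuming the standard uncompressed binary float-matrix layout of a Kaldi .ark: the .scp offset points at a two-byte '\0B' binary marker, then an 'FM ' token, then the row and column counts) that peeks at those bytes directly:

# Diagnostic sketch (not pdnn code): inspect the header bytes that
# kaldi_feat.py is about to parse for the first utterance in the .scp.
import struct

in_scp_file = '/data/raw_mfcc_test.1.scp'
with open(in_scp_file) as scp:
    utt_id, rxfilename = scp.readline().strip().split(' ', 1)
ark_path, offset = rxfilename.rsplit(':', 1)

with open(ark_path, 'rb') as ark:
    ark.seek(int(offset))
    print(repr(ark.read(2)))   # expect '\x00B' if the entry was written in binary mode
    print(repr(ark.read(3)))   # expect 'FM ' for an uncompressed float matrix;
                               # any other token means the unpack calls below read garbage
    m, rows = struct.unpack('<bi', ark.read(5))
    n, cols = struct.unpack('<bi', ark.read(5))
    print(rows, cols)          # only sane if the token really was 'FM '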
I am using Mac OS X Yosemite 10.10.3.
Sincerely, -Vipul
Hi, I also hit this problem while trying to train a simple digit speech recognizer with a DNN. After getting the MFCC features from Kaldi in .scp format, I ran the command below:
run_DNN.py --train-data "./mfcc/raw_mfcc_train.1.scp,partition=600m,random=true" \
--valid-data "./mfcc/raw_mfcc_test.1.scp,partition=600m,random=true" \
--nnet-spec "250:1024:1024:1024:1024:1024:10" --wdir ./ \
--output-format kaldi \
--lrate "D:0.08:0.5:0.05,0.05:15" \
--output-file dnn.nnet >& dnn.training.log
But I got the following error in the log file:
Traceback (most recent call last):
File "/home/cssp/pdnn-master/cmds/run_DNN.py", line 56, in <module>
cfg.init_data_reading(train_data_spec, valid_data_spec)
File "/home/cssp/pdnn-master/utils/network_config.py", line 94, in init_data_reading
self.train_sets, self.train_xy, self.train_x, self.train_y = read_dataset(train_dataset, train_dataset_args)
File "/home/cssp/pdnn-master/io_func/data_io.py", line 92, in read_dataset
data_reader.initialize_read(first_time_reading = True)
File "/home/cssp/pdnn-master/io_func/kaldi_io.py", line 102, in initialize_read
utt_id, utt_mat = self.read_next_utt()
File "/home/cssp/pdnn-master/io_func/kaldi_io.py", line 89, in read_next_utt
tmp_mat = numpy.frombuffer(ark_read_buffer.read(rows * cols * 4), dtype=numpy.float32)
MemoryError
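In my case rows*cols comes out absurdly large, so the allocation at that line blows up. A defensive check just before the frombuffer call (my own sketch, not part of pdnn's kaldi_io.py) at least turns the MemoryError into a clearer message:

# Hypothetical guard, not pdnn code: refuse to allocate when the parsed
# dimensions are implausible, which suggests the .ark entry is not the
# uncompressed binary float matrix the reader expects.
MAX_ELEMS = 10 ** 8
if rows <= 0 or cols <= 0 or rows * cols > MAX_ELEMS:
    raise ValueError("suspicious matrix size %d x %d; the .ark entry is "
                     "probably not an uncompressed binary float matrix"
                     % (rows, cols))
tmp_mat = numpy.frombuffer(ark_read_buffer.read(rows * cols * 4),
                           dtype=numpy.float32)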
Does anyone have a solution for this? Thanks, -a00a