pdnn

memory error

Open anarucu opened this issue 9 years ago • 2 comments

Hi everyone, I used the copy-feats binary from Kaldi to convert my ASCII features to .ark and .scp files. Then I concatenated all the individual .scp files into a single one, which I called SmallSet0.scp:

SESS0003BLOCKA_06 /home/ana/DB/SmallSet0/feat/SESS0003BLOCKA_06.ark:18
SESS0003BLOCKA_07 /home/ana/DB/SmallSet0/feat/SESS0003BLOCKA_07.ark:18
SESS0003BLOCKA_08 /home/ana/DB/SmallSet0/feat/SESS0003BLOCKA_08.ark:18
SESS0003BLOCKA_09 /home/ana/DB/SmallSet0/feat/SESS0003BLOCKA_09.ark:18
SESS0003BLOCKA_10 /home/ana/DB/SmallSet0/feat/SESS0003BLOCKA_10.ark:18
SESS0003BLOCKA_11 /home/ana/DB/SmallSet0/feat/SESS0003BLOCKA_11.ark:18
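Each entry in a Kaldi .scp file pairs an utterance ID with an archive path and, after the colon, the byte offset at which that utterance's data starts inside the archive. As a rough sketch (the `ScpEntry` type and `parse_scp_line` helper are hypothetical names for illustration, not part of PDNN or Kaldi), the entries above could be split like this to check paths and offsets before training:

```python
from typing import NamedTuple, Optional


class ScpEntry(NamedTuple):
    utt_id: str
    ark_path: str
    offset: Optional[int]  # None when the entry has no ":offset" suffix


def parse_scp_line(line: str) -> ScpEntry:
    """Split one 'utt_id path[:offset]' line into its parts."""
    utt_id, rxspecifier = line.strip().split(None, 1)
    # The offset, when present, follows the last colon of the path.
    path, sep, offset = rxspecifier.rpartition(":")
    if sep and offset.isdigit():
        return ScpEntry(utt_id, path, int(offset))
    return ScpEntry(utt_id, rxspecifier, None)


entry = parse_scp_line(
    "SESS0003BLOCKA_06 /home/ana/DB/SmallSet0/feat/SESS0003BLOCKA_06.ark:18"
)
print(entry.utt_id, entry.ark_path, entry.offset)
```

Here the offset of 18 is consistent with a 17-character utterance ID followed by one space, so the offsets themselves look plausible.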

Then I tried to train 4 stacked RBMs using run_RBM.py and got the following memory error:

ana@ana-HP-EliteBook-Folio-9470m:~/PDNN/pdnn$ python /home/ana/PDNN/pdnn/cmds/run_RBM.py \
    --train-data "/home/ana/DB/SmallSet0/feat/SmallSet0.scp,partition=600m,stream=true,random=true" \
    --nnet-spec "215:1024:1024:43:1024" --wdir ./ \
    --ptr-layer-number 4 --epoch-number 10 --batch-size 128 \
    --learning-rate 0.08 --gbrbm-learning-rate 0.005 \
    --momentum 0.5:0.9:5 --first_layer_type gb \
    --param-output-file /home/ana/PDNN/Working_dir/rbm.mdl
[2015-07-28 23:06:57.528732] > ... initializing the model
Traceback (most recent call last):
  File "/home/ana/PDNN/pdnn/cmds/run_RBM.py", line 62, in <module>
    cfg.init_data_reading(train_data_spec)
  File "/home/ana/PDNN/pdnn/utils/rbm_config.py", line 65, in init_data_reading
    self.train_sets, self.train_xy, self.train_x, self.train_y = read_dataset(train_dataset, train_dataset_args)
  File "/home/ana/PDNN/pdnn/io_func/data_io.py", line 92, in read_dataset
    data_reader.initialize_read(first_time_reading = True)
  File "/home/ana/PDNN/pdnn/io_func/kaldi_io.py", line 102, in initialize_read
    utt_id, utt_mat = self.read_next_utt()
  File "/home/ana/PDNN/pdnn/io_func/kaldi_io.py", line 89, in read_next_utt
    tmp_mat = numpy.frombuffer(ark_read_buffer.read(rows * cols * 4), dtype=numpy.float32)
MemoryError

What did I do wrong? Best regards, Ana

anarucu, Jul 28 '15 21:07

Hi, I am getting the same memory error. A minimal Python script that reproduces it:

from io_func.kaldi_feat import KaldiReadIn
in_scp_file = '/data/raw_mfcc_test.1.scp'
kaldiread = KaldiReadIn(in_scp_file)
utt_number = 0
while True:
    uttid, in_matrix = kaldiread.read_next_utt()
    if uttid == '':
        break

While debugging, I found that the following lines in kaldi_feat.py:

m, rows = struct.unpack('<bi', ark_read_buffer.read(5))
n, cols = struct.unpack('<bi', ark_read_buffer.read(5))

produce extremely large values for rows and cols. The next line uses rows * cols to size a NumPy array, which raises the error.
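Garbage dimensions like these usually mean the bytes at that position are not the header the reader expects. In Kaldi's binary format, each object starts with the two bytes `\0B` followed by a type token: `FM ` for a float matrix, `DM ` for a double matrix, and a token beginning with `CM` for a compressed matrix, whose dimensions are stored differently. One hypothesis worth checking (an assumption, not something confirmed in this thread) is that the archive holds compressed or otherwise unexpected data. The sketch below uses a hypothetical helper, `describe_kaldi_header`, which is not part of PDNN, to report what actually sits at a given .scp offset:

```python
import struct


def describe_kaldi_header(ark_path, offset):
    """Report what kind of object starts at `offset` in a Kaldi archive."""
    with open(ark_path, "rb") as f:
        f.seek(offset)
        # Binary-mode objects are introduced by a NUL byte and the letter B.
        if f.read(2) != b"\x00B":
            return "not a binary-mode object (text archive or wrong offset?)"
        token = f.read(3)
        if token in (b"FM ", b"DM "):
            # Uncompressed matrix: each dimension is a size byte plus an int32.
            _, rows = struct.unpack("<bi", f.read(5))
            _, cols = struct.unpack("<bi", f.read(5))
            kind = "float32" if token == b"FM " else "float64"
            return "%s matrix, %d x %d" % (kind, rows, cols)
        if token.startswith(b"CM"):
            return "compressed matrix: dimensions are not stored as '<bi' pairs"
        return "unrecognised token %r" % token
```

Calling it with the path and offset from one .scp line shows whether the `'<bi'` unpacking above is being applied to the kind of header it was written for.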

I am using Mac OS X Yosemite 10.10.3.

Sincerely, -Vipul

vipular, Oct 12 '15 13:10

Hi, I ran into the same problem while trying to train a simple spoken-digit recognizer with a DNN. After extracting the MFCC features with Kaldi in .scp format, I ran the command below:

run_DNN.py --train-data "./mfcc/raw_mfcc_train.1.scp,partition=600m,random=true" \
           --valid-data "./mfcc/raw_mfcc_test.1.scp,partition=600m,random=true" \
           --nnet-spec "250:1024:1024:1024:1024:1024:10" --wdir ./ \
           --output-format kaldi \
           --lrate "D:0.08:0.5:0.05,0.05:15" \
           --output-file dnn.nnet >& dnn.training.log

But the log file contains this error:

Traceback (most recent call last):
  File "/home/cssp/pdnn-master/cmds/run_DNN.py", line 56, in <module>
    cfg.init_data_reading(train_data_spec, valid_data_spec)
  File "/home/cssp/pdnn-master/utils/network_config.py", line 94, in init_data_reading
    self.train_sets, self.train_xy, self.train_x, self.train_y = read_dataset(train_dataset, train_dataset_args)
  File "/home/cssp/pdnn-master/io_func/data_io.py", line 92, in read_dataset
    data_reader.initialize_read(first_time_reading = True)
  File "/home/cssp/pdnn-master/io_func/kaldi_io.py", line 102, in initialize_read
    utt_id, utt_mat = self.read_next_utt()
  File "/home/cssp/pdnn-master/io_func/kaldi_io.py", line 89, in read_next_utt
    tmp_mat = numpy.frombuffer(ark_read_buffer.read(rows * cols * 4), dtype=numpy.float32)
MemoryError
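The failing line asks the file for rows * cols * 4 bytes, so a misread header turns directly into a request for an enormous buffer. A defensive check, sketched below under the assumption that the reader holds an ordinary open file object, would compare the claimed payload size with the bytes actually left in the file and fail with a clearer message. The helper `checked_payload_size` is hypothetical and not part of PDNN:

```python
import os


def checked_payload_size(f, rows, cols, itemsize=4):
    """Return rows * cols * itemsize, or raise if it cannot fit in the file."""
    if rows <= 0 or cols <= 0:
        raise ValueError("implausible matrix shape %d x %d" % (rows, cols))
    nbytes = rows * cols * itemsize
    # Bytes between the current read position and the end of the file.
    remaining = os.fstat(f.fileno()).st_size - f.tell()
    if nbytes > remaining:
        raise ValueError(
            "header claims %d bytes but only %d remain; "
            "the header was probably misparsed" % (nbytes, remaining)
        )
    return nbytes
```

This does not fix the underlying format mismatch, but it replaces a bare MemoryError with an error that points at the header.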

Does anyone have a solution for this? Thanks, -a00a

a00achild1, Oct 31 '16 07:10