char-rnn-tensorflow

MemoryError

Open • ghost opened this issue 8 years ago • 4 comments

When training on large files, I get a MemoryError despite having more than enough memory to hold the file:

reading text file
Traceback (most recent call last):
  File "train.py", line 111, in <module>
    main()
  File "train.py", line 48, in main
    train(args)
  File "train.py", line 51, in train
    data_loader = TextLoader(args.data_dir, args.batch_size, args.seq_length)
  File "/home/ren/Projects/char-rnn-tensorflow/utils.py", line 18, in __init__
    self.preprocess(input_file, vocab_file, tensor_file)
  File "/home/ren/Projects/char-rnn-tensorflow/utils.py", line 35, in preprocess
    self.tensor = np.array(list(map(self.vocab.get, data)))
MemoryError

ghost · Jun 05 '16 01:06
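The failing line builds a complete Python list of character ids before NumPy ever sees them, so at the peak the data exists both as boxed Python objects and as the final array. A minimal sketch of a lower-memory encoding step, assuming a dict vocabulary like the one preprocess() builds (encode_text is a hypothetical helper, not part of the repo), would let NumPy consume an iterator directly and use a small integer dtype:

import numpy as np

def encode_text(data, vocab):
    # Map each character to its vocab id without materializing an
    # intermediate Python list; np.fromiter fills the array directly.
    # uint16 assumes the vocabulary has fewer than 65536 characters,
    # and count lets NumPy allocate the result in one go.
    return np.fromiter((vocab[c] for c in data), dtype=np.uint16,
                       count=len(data))

This avoids the list's per-element pointer overhead and also cuts the array itself from 8 bytes per character (the int64 default) down to 2.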

Happens to me too. The current implementation needs about 20 times as much RAM as the size of the input file: 500 MB of input trains fine using something in the neighborhood of 10 GB of RAM.

izqui · Jun 08 '16 08:06
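A rough estimate (assuming 64-bit CPython and mostly ASCII input) is consistent with that factor: the intermediate list costs 8 bytes per character just for its pointers, the resulting array defaults to int64 at another 8 bytes per character, and the decoded text itself adds at least 1 more, before counting any other copies preprocess() makes:

n = 500 * 1024 ** 2      # 500 MB of input characters
list_ptrs = 8 * n        # one 8-byte pointer per element of list(map(...))
int64_array = 8 * n      # np.array(...) defaults to int64: 8 bytes/char
raw_text = 1 * n         # the decoded input string, ~1 byte/char for ASCII

peak = list_ptrs + int64_array + raw_text
print(peak / 1024 ** 3)  # ~8.3 GiB, before any other copies or overhead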

Thanks for the report, @Alicemargatroid @izqui. I need some help figuring out the right way to fix this problem.

How big is the data.npy file? Is it 20 times as large as well?

Should we optimize the data structure, or switch to a streaming loader?

sherjilozair · Jun 09 '16 12:06

@sherjilozair I think a streaming loader would be best.

ghost · Jun 09 '16 12:06
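For reference, a streaming loader here would mean never holding the whole corpus in memory: read the file in fixed-size chunks, encode each chunk, and slice batches out of a small rolling buffer. The sketch below uses hypothetical names (stream_batches is not part of the repo) and assumes the vocab dict has already been built in a separate pass:

import numpy as np

def stream_batches(path, vocab, batch_size, seq_length, chunk_chars=1_000_000):
    # Yield (x, y) next-character batches while keeping at most one
    # chunk plus a small rolling buffer in memory.
    batch_chars = batch_size * seq_length + 1
    buf = np.empty(0, dtype=np.uint16)
    with open(path, encoding='utf-8') as f:
        while True:
            chunk = f.read(chunk_chars)
            if not chunk:
                break
            ids = np.fromiter((vocab[c] for c in chunk),
                              dtype=np.uint16, count=len(chunk))
            buf = np.concatenate([buf, ids])
            while len(buf) >= batch_chars:
                window = buf[:batch_chars]
                x = window[:-1].reshape(batch_size, seq_length)
                y = window[1:].reshape(batch_size, seq_length)
                yield x, y
                # keep the last character so the next batch stays contiguous
                buf = buf[batch_chars - 1:]

Building the vocabulary would still need one pass over the file, but that pass can also work chunk by chunk (e.g. with collections.Counter) without ever loading everything at once.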

@sherjilozair Right now char-rnn is using 13.54 GB of RAM, and these are the sizes of the data files:

-rw-r--r-- 1 root root 6254212000 Jun  6 16:50 data.npy
-rw-r--r-- 1 root root  781776490 Jun  6 16:22 input.txt
-rw-r--r-- 1 root root       1357 Jun  6 16:47 vocab.pkl

izqui · Jun 09 '16 14:06
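Those numbers also explain the on-disk size: data.npy is almost exactly 8x input.txt, which is what you would expect from the int64 default, i.e. 8 bytes per character id. A quick sanity check (assuming roughly one character per byte of input):

import numpy as np

n_chars = 781_776_490                                 # size of input.txt
print(n_chars * np.dtype(np.int64).itemsize / 1e9)    # ~6.25 GB, matches data.npy
print(n_chars * np.dtype(np.uint16).itemsize / 1e9)   # ~1.56 GB if ids were uint16

Casting the tensor to uint8 or uint16 before np.save would therefore shrink both data.npy and the in-memory array by 4-8x, since character-level vocabularies rarely exceed a few hundred symbols.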