DeepSpell
MemoryError
I have 64 GB of memory + 64 GB of swap, but...
Is that OK?
47590536
answer: 'To listen to the audio turn off the.....'
question: 'To listen to the audio turn off the.....'
47590536
answer: 'leader, who sent out fund-raising.......'
question: 'leader, who sent out fund-raising.......'
Vectorization...
X = np_zeros
Traceback (most recent call last):
File "keras_spell.py", line 302, in <module>
main_news()
File "keras_spell.py", line 296, in main_news
X_train, X_val, y_train, y_val, y_maxlen, ctable = vectorize(questions, answers, chars)
File "keras_spell.py", line 97, in vectorize
X = np_zeros((len_of_questions, x_maxlen, len(chars)), dtype=np.bool)
MemoryError
Nope. Not OK. In a later revision of the code, which I have not yet released, we moved to batches (a new feature in Keras) that let us train without holding the entire dataset in memory. For now, limit the number of lines read in the read_news function. Start with 10K?
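A rough sketch of that batching idea (this assumes a CharacterTable-style ctable.encode(string, maxlen) helper like the one keras_spell.py borrows from the Keras addition example; the fit_generator arguments differ between Keras versions):

```python
import numpy as np

def batch_generator(questions, answers, chars, ctable, x_maxlen, y_maxlen, batch_size=128):
    """Yield one-hot batches so the whole dataset never has to sit in memory."""
    while True:
        for start in range(0, len(questions), batch_size):
            qs = questions[start:start + batch_size]
            ans = answers[start:start + batch_size]
            X = np.zeros((len(qs), x_maxlen, len(chars)), dtype=np.bool_)
            y = np.zeros((len(ans), y_maxlen, len(chars)), dtype=np.bool_)
            for i, q in enumerate(qs):
                X[i] = ctable.encode(q, maxlen=x_maxlen)
            for i, a in enumerate(ans):
                y[i] = ctable.encode(a, maxlen=y_maxlen)
            yield X, y

# Keras 1.x style call; Keras 2 uses steps_per_epoch instead of samples_per_epoch:
# model.fit_generator(batch_generator(questions, answers, chars, ctable, x_maxlen, y_maxlen),
#                     samples_per_epoch=len(questions), nb_epoch=5)
```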
The one-hot feature encoding takes a lot of memory; perhaps try using a sparse representation.
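One sketch of that idea, assuming the same chars vocabulary the script builds: store a single uint8 index per character instead of a len(chars)-wide boolean vector, and expand to one-hot only one batch at a time (or feed the indices straight into an Embedding layer with a sparse_categorical_crossentropy loss). The vocabulary below is a placeholder:

```python
import numpy as np

chars = sorted(set("abcdefghijklmnopqrstuvwxyz .,'\"0123456789"))  # placeholder vocabulary
char_to_idx = {c: i for i, c in enumerate(chars)}

def encode_indices(strings, maxlen):
    """(num_samples, maxlen) uint8 array: roughly len(chars)x smaller than one-hot."""
    X = np.zeros((len(strings), maxlen), dtype=np.uint8)
    for i, s in enumerate(strings):
        for t, ch in enumerate(s[:maxlen]):
            X[i, t] = char_to_idx.get(ch, 0)
    return X

def to_one_hot(batch_indices, num_classes):
    """Expand a single batch to one-hot only when it is fed to the network."""
    return np.eye(num_classes, dtype=np.bool_)[batch_indices]
```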
Thanks @MajorTal for releasing the code. I replicated your results, but used CHARS to avoid the memory explosion.
CHARS? Please elaborate.
About swap... it seems swap is not being used at all.
Hi, thank you so much for sharing this! I have a similar problem to @vinnitu's: Python exits with a "Killed" error, I guess because of memory too... So I started the training with 10K examples and it began correctly.
Now I'm at the first iteration (OK, it's only the first one), but are these results normal at this step, or is something wrong?
--------------------------------------------------
Iteration 1
Train on 33229 samples, validate on 3693 samples
Epoch 1/5
33229/33229 [==============================] - 4233s - loss: 2.9667 - acc: 0.2423 - val_loss: 2.6717 - val_acc: 0.2999
Epoch 2/5
33229/33229 [==============================] - 4299s - loss: 2.6127 - acc: 0.3124 - val_loss: 2.5249 - val_acc: 0.3222
Epoch 3/5
33229/33229 [==============================] - 4335s - loss: 2.5537 - acc: 0.3208 - val_loss: 2.4999 - val_acc: 0.3258
Epoch 4/5
33229/33229 [==============================] - 4225s - loss: 2.5241 - acc: 0.3248 - val_loss: 2.5127 - val_acc: 0.3242
Epoch 5/5
33229/33229 [==============================] - 4073s - loss: 2.5072 - acc: 0.3270 - val_loss: 2.4975 - val_acc: 0.3263
Q English icty of Coventry in 2008, was...
A English city of Coventry in 2008, was...
X toe ee.....
---
Q He hoped leftwingers would respect his..
A He hoped leftwingers would respect his..
X The e.....
---
Q "The unexpected rebound will help to....
A "The unexpected rebound will help to....
X toe e......
---
Q straight games..........................
A straight games..........................
X tee eee...........................
---
Q 1,234 U.S. residents age 21 and older,..
A 1,234 U.S. residents age 21 and older,..
X The e.....
---
Q W edon't even know what's in the next...
A We don't even know what's in the next...
X toe ee.....
---
Q to quit.................................
A to quit.................................
X teeee...................................
---
Q to give these answers as the ploice.....
A to give these answers as the police.....
X toe e.......
---
Q Uzoh and Quinton Ross, swingmyan........
A Uzoh and Quinton Ross, swingman.........
X toe e..........
---
Q her bank accont had been a repository...
A her bank account had been a repository..
X toe ee.....
---
I've added only these patches to the code, due to some UTF-8 encoding errors

but I guess I didn't break anything...
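The actual patches aren't shown above; as a purely hypothetical illustration (file and function names made up), UTF-8 decode errors while reading the corpus are often worked around by opening the file with an explicit encoding and skipping undecodable bytes:

```python
import io

def read_news_lines(path="news.corpus.txt", limit=10000):
    """Read up to `limit` lines, decoding as UTF-8 and ignoring bad bytes."""
    lines = []
    with io.open(path, "r", encoding="utf-8", errors="ignore") as f:
        for i, line in enumerate(f):
            if i >= limit:
                break
            lines.append(line.strip())
    return lines
```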
I'm trying this on a "poor" Mac i7; later on I'll move to an AWS p2.xlarge, but first of all I would like to see if everything is working properly. Thank you! Best,
Luca
It actually looks ok for a first iteration. May the network be ever converging in your favor.
OK, thank you @MajorTal! I would like to train on an Italian-language corpus. Do you think the Google 5-gram Italian corpus would be OK to train on? It's not so easy to find millions of Italian news sentences for training... Thank you!