
always killed by OS

Open parkourcx opened this issue 5 years ago • 2 comments

batch size 64, killed by OS after 123 steps

parkourcx · Sep 19 '19 09:09

I ran the ELMo example using my own data, which is formatted the same as the example data (one "word[tab]tag" pair per line).

- training file: about 130 MB
- training batch_size: tried values from 32 to 512
- training epochs: 1
- ELMo model: my own trained ELMo
- my ELMo options file:

```json
{
  "lstm": {"use_skip_connections": true, "projection_dim": 512, "cell_clip": 3,
           "proj_clip": 3, "dim": 4096, "n_layers": 2},
  "char_cnn": {"activation": "relu",
               "filters": [[1, 32], [2, 32], [3, 64], [4, 128], [5, 256], [6, 512], [7, 1024]],
               "n_highway": 2, "embedding": {"dim": 16},
               "n_characters": 262, "max_characters_per_token": 50}
}
```

- other training options: left at their defaults
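A minimal sketch of one way to narrow this down, assuming anago's `load_data_and_labels` helper (which reads this one-token-per-line, tab-separated format); the path `train.txt` and the 10% slice are placeholders, not values from the report:

```python
# Sketch: load the "word<TAB>tag" training file and try a small slice first,
# to see whether the OS kill scales with how much data is held in memory.
from anago.utils import load_data_and_labels  # helper shipped with anago

x_train, y_train = load_data_and_labels('train.txt')  # placeholder path
print(len(x_train), 'sentences,', sum(len(s) for s in x_train), 'tokens')

# Train on 10% of the sentences first; if that finishes cleanly, the kill is
# more likely host-memory pressure from the full dataset / ELMo batches than
# a bug in the training code.
cut = len(x_train) // 10
x_small, y_small = x_train[:cut], y_train[:cut]
```

If the 10% slice trains to completion, the full-size dataset plus the ELMo character batches is the more likely source of host-memory pressure.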

OS: Ubuntu 18.04; Keras: 2.2.4; tensorflow-gpu: 1.13.1; GPU: Nvidia 1080 Ti (12 GB memory); RAM: 128 GB

The situation: after training for some number of steps (how many depends on batch_size), the program gets killed by the system, but I don't see a system or GPU memory leak. The question is: how did that happen? What did I do wrong? Is my batch_size too large, or is my training data too big? Someone HELP!!!
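On Linux, a process that is "killed by the OS" with no Python traceback is usually the kernel OOM killer reclaiming host memory; `dmesg` normally records a "Killed process" line when that happens. A minimal sketch for watching host memory from inside the run, assuming psutil is installed and that the anago training entry point in use accepts standard Keras callbacks (an assumption to verify against the version in use):

```python
# Sketch: log this process's resident memory (RSS) after every batch so a
# steady climb toward the 128 GB limit shows up before the OS kills the run.
import os
import psutil
from keras.callbacks import Callback

class MemoryLogger(Callback):
    def on_batch_end(self, batch, logs=None):
        rss_gb = psutil.Process(os.getpid()).memory_info().rss / 1024 ** 3
        print(' batch %d: RSS %.1f GB' % (batch, rss_gb))

# Pass MemoryLogger() via the callbacks argument of the fit call, if the
# training entry point exposes one (an assumption, not confirmed for anago).
```

If RSS climbs steadily while nvidia-smi stays flat, the problem is host memory rather than GPU memory, and reducing batch_size alone may not fix it.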

parkourcx · Sep 20 '19 04:09

but I do see a lot of processes running
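Many processes is consistent with multiprocessing data loaders, where each worker can end up holding its own copy of the data. A rough sketch for listing the largest memory consumers, assuming psutil is available (the top-10 cutoff is arbitrary):

```python
# Sketch: list the ten processes holding the most resident memory, to see
# whether many training workers are each keeping a copy of the dataset.
import psutil

procs = []
for p in psutil.process_iter(['pid', 'name', 'memory_info']):
    mem = p.info['memory_info']
    if mem is None:  # access denied for this process
        continue
    procs.append((mem.rss, p.info['pid'], p.info['name']))

for rss, pid, name in sorted(procs, reverse=True)[:10]:
    print('%-25s pid=%-8d rss=%.1f GB' % (name, pid, rss / 1024 ** 3))
```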

parkourcx · Sep 20 '19 12:09