mimic2
training.py consumes too many resources
Hi, after a successful installation and processing, running training.py makes my system hang, and I cannot do anything except force a shutdown. Please help me get mimic2 to run successfully.
Below is the output from running training.py. The training was run inside Docker.
2018-08-27 15:40:09.343384: W tensorflow/core/framework/allocator.cc:101] Allocation of 18124800 exceeds 10% of system memory.
2018-08-27 15:40:09.826597: W tensorflow/core/framework/allocator.cc:101] Allocation of 77332480 exceeds 10% of system memory.
2018-08-27 15:40:10.362115: W tensorflow/core/framework/allocator.cc:101] Allocation of 16142400 exceeds 10% of system memory.
2018-08-27 15:40:10.542931: W tensorflow/core/framework/allocator.cc:101] Allocation of 16142400 exceeds 10% of system memory.
2018-08-27 15:40:10.666453: W tensorflow/core/framework/allocator.cc:101] Allocation of 17369600 exceeds 10% of system memory.
2018-08-27 15:40:10.673025: W tensorflow/core/framework/allocator.cc:101] Allocation of 17936000 exceeds 10% of system memory.
2018-08-27 15:40:10.676968: W tensorflow/core/framework/allocator.cc:101] Allocation of 18691200 exceeds 10% of system memory.
2018-08-27 15:40:10.704691: W tensorflow/core/framework/allocator.cc:101] Allocation of 17369600 exceeds 10% of system memory.
2018-08-27 15:40:10.718342: W tensorflow/core/framework/allocator.cc:101] Allocation of 17936000 exceeds 10% of system memory.
2018-08-27 15:40:10.733771: W tensorflow/core/framework/allocator.cc:101] Allocation of 19635200 exceeds 10% of system memory.
2018-08-27 15:40:10.738065: W tensorflow/core/framework/allocator.cc:101] Allocation of 18691200 exceeds 10% of system memory.
2018-08-27 15:40:10.829940: W tensorflow/core/framework/allocator.cc:101] Allocation of 19163200 exceeds 10% of system memory.
2018-08-27 15:40:14.460210: W tensorflow/core/framework/allocator.cc:101] Allocation of 17301504 exceeds 10% of system memory.
2018-08-27 15:40:14.596566: W tensorflow/core/framework/allocator.cc:101] Allocation of 34603008 exceeds 10% of system memory.
Step 1 [102.949 sec/step, loss=0.96979, avg_loss=0.96979]
2018-08-27 15:40:29.697392: W tensorflow/core/framework/allocator.cc:101] Allocation of 101680000 exceeds 10% of system memory.
2018-08-27 15:40:31.111335: W tensorflow/core/framework/allocator.cc:101] Allocation of 43778048 exceeds 10% of system memory.
2018-08-27 15:40:31.122356: W tensorflow/core/framework/allocator.cc:101] Allocation of 43778048 exceeds 10% of system memory.
Killed
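To get a feel for how little memory the process had to work with, the byte counts in these warnings can be pulled out and converted to MiB. The following is only a rough sketch: the helper names (allocation_sizes, summarize) are made up for illustration, and the sample string is two lines copied from the log above. If the "10% of system memory" in the message refers to the memory TensorFlow sees as available (an assumption based on the wording, not something confirmed here), then warnings for allocations as small as about 15 MiB would suggest the container had very little free memory, which would be consistent with the final "Killed".

```python
import re

# Matches the CPU allocator warning text seen in the log and captures the byte count.
WARNING = re.compile(r"Allocation of (\d+) exceeds 10% of system memory")

def allocation_sizes(log_text):
    """Return the byte counts from every allocator warning in log_text."""
    return [int(n) for n in WARNING.findall(log_text)]

def summarize(log_text):
    """Summarize warning sizes in MiB, or return None if there are no warnings."""
    sizes = allocation_sizes(log_text)
    if not sizes:
        return None
    mib = 1024 * 1024
    return {
        "count": len(sizes),
        "smallest_mib": min(sizes) / mib,
        "largest_mib": max(sizes) / mib,
    }

# Two warnings copied from the log above (smallest and largest allocation).
sample = (
    "2018-08-27 15:40:10.362115: W tensorflow/core/framework/allocator.cc:101] "
    "Allocation of 16142400 exceeds 10% of system memory. "
    "2018-08-27 15:40:29.697392: W tensorflow/core/framework/allocator.cc:101] "
    "Allocation of 101680000 exceeds 10% of system memory."
)
print(summarize(sample))
```

Running the same function over the full log would show whether the flagged allocations grow as training proceeds.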
@Vino-git you may need to lower the batch size. It's currently set to 32, and that may be taking up too much memory.
sed -i 's/batch_size=32/batch_size=16/g' hparams.py
The above fixed this for me.
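One caveat with the sed one-liner is that it changes nothing, and reports nothing, if the text in hparams.py is not exactly batch_size=32 (for example if there are spaces around the equals sign). A minimal Python sketch of the same substitution that tolerates whitespace and reports how many replacements were made might look like the following. The function name lower_batch_size is hypothetical, and the example strings are stand-ins, not the real contents of hparams.py.

```python
import re

def lower_batch_size(source, old=32, new=16):
    """Replace batch_size=<old> with batch_size=<new> in source text.

    Returns (updated_text, number_of_replacements) so the caller can tell
    whether anything was actually changed.
    """
    # Allow optional whitespace around "="; \b stops 32 from matching inside 320.
    pattern = re.compile(r"(batch_size\s*=\s*)%d\b" % old)
    return pattern.subn(lambda m: m.group(1) + str(new), source)

# Stand-in text, not the real file contents.
updated, count = lower_batch_size("batch_size=32,")
print(updated, count)
```

To apply it to the real file, read hparams.py into a string, pass it through the function, and write the result back only if the returned count is greater than zero.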