mimic2

training.py consumes too many resources

Open Vino-git opened this issue 6 years ago • 3 comments

Hi, after a successful installation and preprocessing, my system hangs when I try to execute training.py, and I cannot do anything except force a shutdown. Please help me get mimic2 running successfully.

Vino-git, Aug 25 '18 16:08

Below is the output from running training.py. I ran the training inside Docker.

2018-08-27 15:40:09.343384: W tensorflow/core/framework/allocator.cc:101] Allocation of 18124800 exceeds 10% of system memory.
2018-08-27 15:40:09.826597: W tensorflow/core/framework/allocator.cc:101] Allocation of 77332480 exceeds 10% of system memory.
2018-08-27 15:40:10.362115: W tensorflow/core/framework/allocator.cc:101] Allocation of 16142400 exceeds 10% of system memory.
2018-08-27 15:40:10.542931: W tensorflow/core/framework/allocator.cc:101] Allocation of 16142400 exceeds 10% of system memory.
2018-08-27 15:40:10.666453: W tensorflow/core/framework/allocator.cc:101] Allocation of 17369600 exceeds 10% of system memory.
2018-08-27 15:40:10.673025: W tensorflow/core/framework/allocator.cc:101] Allocation of 17936000 exceeds 10% of system memory.
2018-08-27 15:40:10.676968: W tensorflow/core/framework/allocator.cc:101] Allocation of 18691200 exceeds 10% of system memory.
2018-08-27 15:40:10.704691: W tensorflow/core/framework/allocator.cc:101] Allocation of 17369600 exceeds 10% of system memory.
2018-08-27 15:40:10.718342: W tensorflow/core/framework/allocator.cc:101] Allocation of 17936000 exceeds 10% of system memory.
2018-08-27 15:40:10.733771: W tensorflow/core/framework/allocator.cc:101] Allocation of 19635200 exceeds 10% of system memory.
2018-08-27 15:40:10.738065: W tensorflow/core/framework/allocator.cc:101] Allocation of 18691200 exceeds 10% of system memory.
2018-08-27 15:40:10.829940: W tensorflow/core/framework/allocator.cc:101] Allocation of 19163200 exceeds 10% of system memory.
2018-08-27 15:40:14.460210: W tensorflow/core/framework/allocator.cc:101] Allocation of 17301504 exceeds 10% of system memory.
2018-08-27 15:40:14.596566: W tensorflow/core/framework/allocator.cc:101] Allocation of 34603008 exceeds 10% of system memory.
Step 1 [102.949 sec/step, loss=0.96979, avg_loss=0.96979]
2018-08-27 15:40:29.697392: W tensorflow/core/framework/allocator.cc:101] Allocation of 101680000 exceeds 10% of system memory.
2018-08-27 15:40:31.111335: W tensorflow/core/framework/allocator.cc:101] Allocation of 43778048 exceeds 10% of system memory.
2018-08-27 15:40:31.122356: W tensorflow/core/framework/allocator.cc:101] Allocation of 43778048 exceeds 10% of system memory.
Killed
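To see how large the individual allocations in a log like this are, the warnings can be pulled out with a regular expression. The sketch below uses a hypothetical helper, `summarize_allocations` (not part of mimic2 or TensorFlow). It only reports what the warnings say: the per-allocation sizes are not the same as peak memory use, since tensors are freed as training proceeds. The trailing `Killed` usually means the process received SIGKILL, which on Linux commonly comes from the kernel's out-of-memory killer or a container memory limit, not from a Python error.

```python
import re

# Matches TensorFlow allocator warnings such as
# "Allocation of 18124800 exceeds 10% of system memory."
ALLOC_RE = re.compile(r"Allocation of (\d+) exceeds")

# Matches a trailing "Killed", which the shell prints when the process gets SIGKILL.
KILLED_RE = re.compile(r"\bKilled\s*$")


def summarize_allocations(log_text: str) -> dict:
    """Hypothetical helper: summarize allocator warnings in a training log.

    Sizes are per-allocation figures taken from the warnings; they are not
    the process's peak memory, because tensors are freed during training.
    """
    sizes = [int(m.group(1)) for m in ALLOC_RE.finditer(log_text)]
    return {
        "count": len(sizes),
        "largest_bytes": max(sizes, default=0),
        "killed": bool(KILLED_RE.search(log_text)),
    }


if __name__ == "__main__":
    # Excerpt from the log in this issue, flattened onto one string.
    excerpt = (
        "W tensorflow/core/framework/allocator.cc:101] Allocation of 18124800 "
        "exceeds 10% of system memory. Step 1 [102.949 sec/step, loss=0.96979, "
        "avg_loss=0.96979] W tensorflow/core/framework/allocator.cc:101] "
        "Allocation of 101680000 exceeds 10% of system memory. Killed"
    )
    stats = summarize_allocations(excerpt)
    print(
        f"{stats['count']} warnings, largest {stats['largest_bytes'] / 2**20:.1f} MiB, "
        f"killed={stats['killed']}"
    )
```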

Vino-git, Aug 27 '18 15:08

@Vino-git you may need to lower the batch size. It's currently set to 32, which may be taking up too much memory.
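The reasoning behind this advice can be sketched with a rough linear model: memory ≈ fixed overhead + batch_size × per-example cost. The hypothetical helper below, `largest_fitting_batch`, halves the batch size starting from 32 until that estimate fits a memory budget. The figures in the usage lines are made up for illustration and were not measured from mimic2; Tacotron-style models also scale with padded sequence length, so treat the result only as a starting point.

```python
def largest_fitting_batch(per_example_bytes: int, fixed_bytes: int,
                          budget_bytes: int, start: int = 32) -> int:
    """Hypothetical helper: halve `start` until the linear estimate fits.

    Model (an assumption, not measured from mimic2):
        memory ~= fixed_bytes + batch_size * per_example_bytes
    Returns 0 when even a batch of one does not fit the budget.
    """
    batch = start
    while batch >= 1:
        if fixed_bytes + batch * per_example_bytes <= budget_bytes:
            return batch
        batch //= 2  # try half the batch size
    return 0


if __name__ == "__main__":
    MiB = 2**20
    # Made-up figures for illustration only.
    print(largest_fitting_batch(per_example_bytes=60 * MiB,
                                fixed_bytes=500 * MiB,
                                budget_bytes=2048 * MiB))  # → 16
```

With these made-up numbers a batch of 32 is estimated at 2420 MiB, over the 2048 MiB budget, while 16 comes to 1460 MiB and fits.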

LearnedVector, Aug 29 '18 22:08

sed -i 's/batch_size=32/batch_size=16/g' hparams.py
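The `sed` one-liner edits hparams.py in place. A rough Python equivalent is sketched below, which can help on macOS, where the BSD `sed -i` expects a backup-suffix argument (`sed -i '' ...`). The helper names `replace_batch_size` and `set_batch_size` are hypothetical. One deliberate difference: a word boundary after the old value stops `batch_size=320` from being rewritten to `batch_size=160`, which the original sed pattern would do. Like the sed command, it only matches the exact text `batch_size=32`, with no spaces around `=`.

```python
import re
from pathlib import Path
from typing import Tuple


def replace_batch_size(text: str, old: int = 32, new: int = 16) -> Tuple[str, int]:
    """Rewrite `batch_size=<old>` to `batch_size=<new>`; return (text, count).

    The word boundary at the end leaves e.g. batch_size=320 untouched,
    unlike the plain sed pattern.
    """
    return re.subn(rf"batch_size={old}\b", f"batch_size={new}", text)


def set_batch_size(path: str, old: int = 32, new: int = 16) -> int:
    """Apply replace_batch_size to a file in place; return the replacement count."""
    p = Path(path)
    new_text, count = replace_batch_size(p.read_text(), old, new)
    if count:
        p.write_text(new_text)  # only rewrite the file when something changed
    return count


if __name__ == "__main__":
    target = Path("hparams.py")  # run from the directory containing hparams.py
    if target.exists():
        print(f"replaced {set_batch_size(str(target))} occurrence(s)")
    else:
        print("hparams.py not found in the current directory")
```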

The above fixed this for me.

JasonGhent, May 13 '20 23:05