SIN
"OOM when allocating tensor" During Test
There is an error when I run 'train.sh' or 'test.sh', as below:
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[65536,8192]
During training, I solved this problem by changing 'config.gpu_options.per_process_gpu_memory_fraction = 0.7' (from 0.7 to 0.9). However, during testing the error appears again, and I don't know how to solve it. Can you give some suggestions?
My machine configuration: Ubuntu 16.04, TensorFlow 1.3.0, GPU: GTX 1080 Ti (11172 MB of memory)
Maybe you need to change 'config.gpu_options.per_process_gpu_memory_fraction = 0.45' at line 94 of tools/test_net.py.
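For reference, a minimal sketch of how that setting takes effect when the session is created (the surrounding code in tools/test_net.py may differ); raising the fraction lets the test graph claim more of the GPU before it runs out of memory:

```python
import tensorflow as tf

# Minimal sketch, not the exact code in tools/test_net.py:
# per_process_gpu_memory_fraction caps how much of the GPU this process may allocate.
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.per_process_gpu_memory_fraction = 0.7  # raised from 0.45
sess = tf.Session(config=config)
```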
@choasup
Sorry to bother you, but I have a problem during training.
I set config.gpu_options.per_process_gpu_memory_fraction = 0.95,
and two GTX 1080 cards (8105 MiB each) are used while training (I checked with nvidia-smi),
but I still run into the OOM problem.
Can you give me some suggestions for solving it?
I am sorry, but 8105 MB of memory is quite small... maybe you can set a smaller RPN batch size.
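If it helps, here is a rough sketch of what lowering the RPN batch size could look like, assuming the repo follows the usual py-faster-rcnn-style config; the import path and field names below are guesses, so check the repo's own config file for the exact names:

```python
# Hypothetical sketch, assuming a py-faster-rcnn-style cfg object;
# the actual import path and field names in this repo may differ.
from fast_rcnn.config import cfg

cfg.TRAIN.RPN_BATCHSIZE = 128  # e.g. halve the number of sampled RPN anchors
cfg.TRAIN.BATCH_SIZE = 64      # ROI minibatch per image, if this field exists
```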
When you test, you need to change config.gpu_options.per_process_gpu_memory_fraction from 0.45 to 0.7. I solved this problem that way.
You said two GTX 1080 cards are used while training, so you can run this SIN code on multiple GPUs, right? Could you tell me what changes need to be made? I also want to run it on multiple GPUs.
change "config.gpu_options.per_process_gpu_memory_fraction = 0.7" to config.gpu_options.allow_growth = True
I read the SIN paper and found that in the model figure the ellipses indicate multiple GRU connections. But in the code (network.py) I found n_steps = 2, and I run into the OOM problem when I change it to 3 or more. Does this mean that more than 2 GRU steps need more memory? I sincerely hope for your reply.
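For context, the GRU is unrolled for n_steps, and backpropagation keeps one set of activations per unrolled step, so memory grows roughly linearly with n_steps. A minimal sketch of such an unrolling (hypothetical sizes, not SIN's actual graph):

```python
import tensorflow as tf

# Minimal sketch of an unrolled GRU: each extra step adds another
# [batch, dim] activation tensor that is kept around for backprop.
n_steps = 3
batch, dim = 64, 4096
cell = tf.contrib.rnn.GRUCell(dim)
x = tf.placeholder(tf.float32, [batch, dim])
state = cell.zero_state(batch, tf.float32)
outputs = []
for step in range(n_steps):
    with tf.variable_scope('gru', reuse=(step > 0)):
        out, state = cell(x, state)
    outputs.append(out)
```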
Excuse me, I have been running this model recently and encountered this error during training:
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[16384,8192]
[[Node: concat_5 = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](Reshape_9, Reshape_10, concat_5/axis)]]
[[Node: Mean_5/_215 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_8841_Mean_5", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]]]
How did you change it? I need your answer urgently, thank you very much @choasup @hydk0420