Hi,
I am using train.py to train on our own data, but it fails with an out-of-memory (OOM) error. Which parameter should I change? Our images are 6000×4000 pixels.
2017-08-28 21:55:18.073583: I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 10.55GiB
2017-08-28 21:55:18.073595: I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 11332606362
InUse: 11332592128
MaxInUse: 11332592128
NumAllocs: 275
MaxAllocSize: 8108309248
2017-08-28 21:55:18.073630: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ***********************************************************************************xxxxxxxxxxxxxxxxx
2017-08-28 21:55:18.073655: W tensorflow/core/framework/op_kernel.cc:1152] Resource exhausted: OOM when allocating tensor with shape[24000000,2]
Traceback (most recent call last):
File "train.py", line 131, in
tf.app.run()
File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 127, in main
train.do_training(hypes)
File "incl/tensorvision/train.py", line 396, in do_training
run_training(hypes, modules, tv_graph, tv_sess)
File "incl/tensorvision/train.py", line 245, in run_training
feed_dict=feed_dict)
File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 778, in run
run_metadata_ptr)
File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,3,4000,6000]
[[Node: conv1_1/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Processing/concat, conv1_1/filter/read)]]
[[Node: Loss/loss/add_1/_27 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_1956_Loss/loss/add_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]]
Caused by op u'conv1_1/Conv2D', defined at:
File "train.py", line 131, in
tf.app.run()
File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 127, in main
train.do_training(hypes)
File "incl/tensorvision/train.py", line 377, in do_training
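For a rough sense of why a 6000×4000 image exhausts the GPU here, the sketch below estimates float32 tensor sizes from the shapes in the log. It is only a back-of-the-envelope calculation, not part of train.py. The helper names (`tensor_bytes`, `to_gib`) are made up for illustration, and the 64 output channels for conv1_1 are an assumption based on a standard VGG16-style encoder, not something the log states.

```python
from __future__ import division, print_function

BYTES_PER_FLOAT32 = 4
GIB = 1024 ** 3


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the size in bytes of a dense tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / GIB


# Shapes copied from the OOM messages in the log.
input_bytes = tensor_bytes([1, 3, 4000, 6000])
logits_bytes = tensor_bytes([24000000, 2])

# ASSUMPTION: conv1_1 produces 64 feature maps at full resolution,
# as in a standard VGG16 encoder. The log does not confirm this.
conv1_1_bytes = tensor_bytes([1, 4000, 6000, 64])

# "Limit" value from the allocator stats in the log.
limit_bytes = 11332606362

print("input tensor:    %.2f GiB" % to_gib(input_bytes))
print("[24000000, 2]:   %.2f GiB" % to_gib(logits_bytes))
print("conv1_1 output:  %.2f GiB (assumed 64 channels)" % to_gib(conv1_1_bytes))
print("allocator limit: %.2f GiB" % to_gib(limit_bytes))
```

Under that assumption, a single full-resolution conv1_1 output is about 5.7 GiB, so two activations of that size already exceed the roughly 10.55 GiB limit before any gradients are stored. That suggests the image size rather than the batch size is the problem, and that resizing the images or training on smaller crops is probably needed.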
qoo, Aug 28 '17 22:08
Hi @qoo. Could you please explain how you trained the KittiSeg model on your own data? Specifically, where should the training and validation data be kept?