TensorFlow-ENet
How to train it on another dataset? How can I handle the checkpoint?
Hi, kwotsin! Thanks for your work. I want to train it on another dataset (30 classes instead of 12). I thought I had changed all the related code, but I got this error:
2018-01-11 17:23:22.187077: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Input to reshape is a tensor with 172800 values, but the requested shape has 4320000
Could it be caused by the checkpoint? How can I deal with this problem?
The complete output is as follows:
========= Median Frequency Balancing Class Weights =========
[6.397542327061094e-05, 6.7097626201794152e-05, 0.024400273767542283, 0.041269401614453756, 5.5506352412896832e-05, 0.076635711324892844, 0.069381256179271614, 3.472654196521944e-05, 0.00042760164428717635, 0.00012440287198120186, 0.090233329139976615, 0.12489918060211183, 0.0013708685331902757, 6.0827765291491662e-05, 0.073240128809290553, 0.35775514055273316, 0.64257341685305103, 0.90968868010977944, 0.37688909228806228, 0.44248634385452756, 0.00042529101230680852, 0.30566376891079095, 0.28941152643298945, 3.9464190165066867e-05, 0.26421036878629223, 0.42250536299160169, 0.5089356784417215, 0.00024742224929701886, 0.47265314480960613, 0.0]
2018-01-11 17:22:23.528595: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-01-11 17:22:23.528689: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-01-11 17:22:23.528720: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-01-11 17:22:29.254935: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:02:00.0
Total memory: 11.17GiB
Free memory: 11.10GiB
2018-01-11 17:22:29.503633: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x1e106f80 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2018-01-11 17:22:29.504523: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-01-11 17:22:29.505315: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 1 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:84:00.0
Total memory: 11.17GiB
Free memory: 11.10GiB
2018-01-11 17:22:29.505448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 0 and 1
2018-01-11 17:22:29.505491: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 1 and 0
2018-01-11 17:22:29.505540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 1
2018-01-11 17:22:29.505685: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y N
2018-01-11 17:22:29.505705: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 1: N Y
2018-01-11 17:22:29.505740: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:02:00.0)
2018-01-11 17:22:29.505779: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K40c, pci bus id: 0000:84:00.0)
2018-01-11 17:22:34.391659: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 1368 get requests, put_count=1100 evicted_count=1000 eviction_rate=0.909091 and unsatisfied allocation rate=1
2018-01-11 17:22:34.391731: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
INFO:tensorflow:Starting standard services.
INFO:tensorflow:Starting queue runners.
INFO:tensorflow:Saving checkpoint to path ./log/original/model.ckpt
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Epoch 1/300
INFO:tensorflow:Current Learning Rate: [0.00050000002]
INFO:tensorflow:global step 1: loss: 0.3121 (4.79 sec/step) Current Streaming Accuracy: 0.0000 Current Mean IOU: 0.0000
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0000 Validation Mean IOU: 0.0000 (2.24 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0209 Validation Mean IOU: 0.0030 (1.10 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0207 Validation Mean IOU: 0.0028 (1.26 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0227 Validation Mean IOU: 0.0033 (1.23 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0220 Validation Mean IOU: 0.0035 (1.24 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0208 Validation Mean IOU: 0.0033 (1.28 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0201 Validation Mean IOU: 0.0033 (1.22 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0198 Validation Mean IOU: 0.0032 (1.25 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0197 Validation Mean IOU: 0.0032 (1.24 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0196 Validation Mean IOU: 0.0031 (1.21 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0197 Validation Mean IOU: 0.0031 (1.18 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0196 Validation Mean IOU: 0.0031 (1.21 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0196 Validation Mean IOU: 0.0031 (1.39 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0196 Validation Mean IOU: 0.0032 (1.23 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0196 Validation Mean IOU: 0.0032 (1.18 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0193 Validation Mean IOU: 0.0032 (1.16 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0191 Validation Mean IOU: 0.0031 (1.41 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0193 Validation Mean IOU: 0.0031 (1.26 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0195 Validation Mean IOU: 0.0032 (1.43 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0197 Validation Mean IOU: 0.0032 (1.32 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0202 Validation Mean IOU: 0.0033 (1.34 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0204 Validation Mean IOU: 0.0034 (1.33 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0203 Validation Mean IOU: 0.0034 (1.21 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0206 Validation Mean IOU: 0.0034 (1.36 sec/step)
2018-01-11 17:23:21.808311: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: assertion failed: [all dims of 'image.shape' must be > 0.]
[[Node: assert_positive_11/assert_less/Assert/Assert = Assert[T=[DT_STRING], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](assert_positive_11/assert_less/All/_5795, assert_positive_11/assert_less/Assert/Assert/data_0)]]
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, assertion failed: [all dims of 'image.shape' must be > 0.]
[[Node: assert_positive_11/assert_less/Assert/Assert = Assert[T=[DT_STRING], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](assert_positive_11/assert_less/All/_5795, assert_positive_11/assert_less/Assert/Assert/data_0)]]
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0207 Validation Mean IOU: 0.0035 (1.19 sec/step)
2018-01-11 17:23:22.187077: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Input to reshape is a tensor with 172800 values, but the requested shape has 4320000
[[Node: Reshape_5 = Reshape[T=DT_UINT8, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](batch_1/_5971, Reshape_5/shape)]]
2018-01-11 17:23:22.197319: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Input to reshape is a tensor with 172800 values, but the requested shape has 4320000
[[Node: Reshape_5 = Reshape[T=DT_UINT8, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](batch_1/5971, Reshape_5/shape)]]
Traceback (most recent call last):
File "train_enet.py", line 340, in
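A quick arithmetic check on the two numbers in the reshape error may help narrow this down. The sketch below assumes 360x480 single-channel label images; that size is not stated anywhere in this thread, so substitute your own height and width.

```python
# Numbers copied from the reshape error in the log above.
values_received = 172800
values_requested = 4320000

# Assumption: 360x480 single-channel label images (not stated in this thread).
height, width = 360, 480
values_per_image = height * width

images_received = values_received // values_per_image
images_expected = values_requested // values_per_image

# Under the assumed size both numbers divide evenly.
assert images_received * values_per_image == values_received
assert images_expected * values_per_image == values_requested

print(images_received, images_expected)
```

Under that assumption the reshape asked for 25 label images but received only one. One possible reading is that the input queue delivered a short final batch after the `image.shape` assertion failed a moment earlier in the log, which would point at an unreadable or empty image in the new dataset rather than at the checkpoint. That is a guess from the log alone, not a confirmed diagnosis.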
Have you figured out how it works? I trained on my own dataset as well, but the accuracy is very low.
Hello @changlinzhang @kwotsin, could you tell me how to use the files in the checkpoint folder as a pretrained model to train on my own dataset?
Hello everyone, how do we prepare our own dataset for training? Thank you.
I made my own dataset, but I got the errors below:
InvalidArgumentError (see above for traceback): assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/ToInt64_1:0) = ] [2]
[[Node: mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch/_5481, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_0, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_1, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_2, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_1/_5483, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_4, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_2/_5485)]]
Traceback (most recent call last):
File "train_enet.py", line 337, in labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/ToInt64_1:0) = ] [2]
[[Node: mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch/_5481, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_0, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_1, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_2, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_1/_5483, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_4, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_2/_5485)]]
Caused by op u'mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert', defined at:
File "train_enet.py", line 337, in labels out of bound')],
File "/home/bayes/anaconda2/envs/py2/lib/python2.7/site-packages/tensorflow/python/ops/check_ops.py", line 559, in assert_less
return control_flow_ops.Assert(condition, data, summarize=summarize)
File "/home/bayes/anaconda2/envs/py2/lib/python2.7/site-packages/tensorflow/python/util/tf_should_use.py", line 118, in wrapped
return _add_should_use_warning(fn(*args, **kwargs))
File "/home/bayes/anaconda2/envs/py2/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 157, in Assert
guarded_assert = cond(condition, no_op, true_assert, name="AssertGuard")
File "/home/bayes/anaconda2/envs/py2/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/home/bayes/anaconda2/envs/py2/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2057, in cond
orig_res_f, res_f = context_f.BuildCondBranch(false_fn)
File "/home/bayes/anaconda2/envs/py2/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1895, in BuildCondBranch
original_result = fn()
File "/home/bayes/anaconda2/envs/py2/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 155, in true_assert
condition, data, summarize, name="Assert")
File "/home/bayes/anaconda2/envs/py2/lib/python2.7/site-packages/tensorflow/python/ops/gen_logging_ops.py", line 51, in _assert
name=name)
File "/home/bayes/anaconda2/envs/py2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/bayes/anaconda2/envs/py2/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/home/bayes/anaconda2/envs/py2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
op_def=op_def)
File "/home/bayes/anaconda2/envs/py2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1717, in init
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/ToInt64_1:0) = ] [2]
[[Node: mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch/_5481, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_0, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_1, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_2, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_1/_5483, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_4, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_2/_5485)]]
Could you help me, please?
I faced the same problem. In my case, I remade the annotation images so that they no longer include the value 255, and it works. https://github.com/DrSleep/tensorflow-deeplab-resnet/issues/107#issuecomment-325857231
@RobinHan24 I met the same problem. I have 10 classes, so I set the pixel values of my label images to 0 through 9, and the problem was fixed. I don't know whether this is helpful for you.
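The two fixes above both come down to making every label pixel fall in the range 0 to num_classes - 1. Here is a minimal NumPy sketch of that remapping. The function name `remap_labels`, the `id_map` dictionary, and the choice to send 255 to an extra last class are all illustrative assumptions, not code from this repo.

```python
import numpy as np


def remap_labels(label, id_map):
    """Return a copy of an 8-bit label image with raw values replaced per id_map."""
    label = np.asarray(label)
    # Refuse to guess: every raw value in the image must have a target class.
    unknown = sorted(set(np.unique(label).tolist()) - set(id_map))
    if unknown:
        raise ValueError("unmapped label values: %s" % unknown)
    # Lookup table indexed by raw pixel value (0..255).
    lut = np.zeros(256, dtype=np.uint8)
    for raw_value, class_index in id_map.items():
        lut[raw_value] = class_index
    return lut[label]


# Hypothetical example: 10 real classes (0..9) plus void pixels stored as 255.
raw = np.array([[0, 1, 255], [9, 255, 3]], dtype=np.uint8)
id_map = {v: v for v in range(10)}
id_map[255] = 10  # assumption: treat void as an 11th class
num_classes = 11

remapped = remap_labels(raw, id_map)
assert remapped.max() < num_classes
```

If void pixels get their own class like this, the class count passed to training has to grow by one as well. Whether that or some other handling of void pixels is appropriate depends on your dataset.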
Thanks for this useful repo. Hi everyone, could anyone help me solve this issue?
- The current code works for the CamVid dataset.
- I am having difficulty training this ENet model on the Cityscapes dataset. I prepared the data using https://github.com/mcordts/cityscapesScripts and would now like to import it into this code, but it reports a dimension mismatch. Could you please help me fix the import of these grayscale images? After preparing the data I have several types of label file (color.png, instance.png, labeld.png, json.png, trainid.png). How do I choose one from this folder and import it into this model? I tried a single type of image and got this error:
InvalidArgumentError (see above for traceback): assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/ToInt64_1:0) = ] [12]
[[node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert (defined at /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/metrics/python/ops/metric_ops.py:3561) = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_0, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_1, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_2, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_1, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_4, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_2)]]
I am a beginner in this field, so I am hoping for suggestions to resolve this error.
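The `[y ...] = [12]` in that message suggests the metric was built for 12 classes, so any label pixel with a value of 12 or more would trip the assertion. A small check like the one below can show which label type is usable before training. It is only a sketch: `label_problems` is a hypothetical helper, and loading the PNG into an array is left to whatever image library you already use.

```python
import numpy as np


def label_problems(label, num_classes):
    """List the reasons an array is unsuitable as a per-pixel class-index map."""
    label = np.asarray(label)
    problems = []
    if label.ndim != 2:
        # Color-coded label images have a channel axis and are not class indices.
        problems.append("expected a 2-D (height, width) array, got shape %s"
                        % (label.shape,))
    if not np.issubdtype(label.dtype, np.integer):
        problems.append("expected an integer dtype, got %s" % label.dtype)
    else:
        bad = [v for v in np.unique(label).tolist() if v < 0 or v >= num_classes]
        if bad:
            problems.append("values outside [0, %d): %s" % (num_classes, bad))
    return problems


# Toy arrays standing in for real files (assumed shapes and values).
color_like = np.zeros((2, 2, 3), dtype=np.uint8)           # RGB color image
train_id_like = np.array([[0, 18], [255, 5]], dtype=np.uint8)

print(label_problems(color_like, 12))
print(label_problems(train_id_like, 12))
```

As far as I know, the Cityscapes train IDs cover 19 classes (0 to 18) with 255 marking ignored pixels, so the single-channel train ID images are the likely candidate, but the class count in the training script would need to match and the 255 pixels would need separate handling.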