keras-yolo3
Conv implementation does not support grouped convolutions for now. GPU / CPU
Hi, I am trying to train locally on my Mac without a GPU. Thank you Experiencor, I can train successfully on the raccoon dataset.
This is my config.json file:
`{
"model" : {
"min_input_size": 1000,
"max_input_size": 1920,
"anchors": [21,68, 27,83, 63,64, 77,77, 96,87, 111,101, 161,98, 178,131, 209,110],
"labels": ["1", "2", "3", "4"]
},
"train": {
"train_image_folder": "/Users/moritz.b/Desktop/keras-yolo3-master/yolo/",
"train_annot_folder": "/Users/moritz.b/Desktop/keras-yolo3-master/yolo/outputs/",
"cache_name": "localizer.pkl",
"train_times": 6,
"batch_size": 2,
"learning_rate": 1e-4,
"nb_epochs": 10,
"warmup_epochs": 2,
"ignore_thresh": 0.5,
"gpus": "0",
"grid_scales": [1,1,1],
"obj_scale": 5,
"noobj_scale": 1,
"xywh_scale": 1,
"class_scale": 1,
"tensorboard_dir": "logs",
"saved_weights_name": "localisations.h5",
"debug": true
},
"valid": {
"valid_image_folder": "",
"valid_annot_folder": "",
"cache_name": "",
"valid_times": 1
}
}`
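A quick sanity check on the config can help when an error points at an output layer. The sketch below assumes the usual YOLOv3 layout: the anchor pairs are split evenly over three output scales, and each output layer has `anchors_per_scale * (4 + 1 + number_of_labels)` channels. The function name `detection_channels` is made up for illustration and is not part of keras-yolo3.

```python
import json

def detection_channels(config):
    """Return (anchors_per_scale, channels_per_output_layer) for a YOLOv3 config.

    Hypothetical helper, assuming 3 output scales as in standard YOLOv3.
    """
    anchors = config["model"]["anchors"]
    labels = config["model"]["labels"]
    if len(anchors) % 2 != 0:
        raise ValueError("anchors must be a flat list of width,height pairs")
    num_pairs = len(anchors) // 2
    if num_pairs % 3 != 0:
        raise ValueError("anchor pairs should split evenly over 3 scales")
    per_scale = num_pairs // 3
    # 4 box coordinates + 1 objectness score + one score per label
    return per_scale, per_scale * (4 + 1 + len(labels))

# The "model" section from the config.json above
config = json.loads("""
{"model": {"anchors": [21,68, 27,83, 63,64, 77,77, 96,87,
                       111,101, 161,98, 178,131, 209,110],
           "labels": ["1", "2", "3", "4"]}}
""")
per_scale, channels = detection_channels(config)
print(per_scale, channels)
```

If the number of labels changes, the channel count of the output layers changes with it, so weights or cached data prepared for a different label list will no longer match.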
When I start training, the error message says:
Traceback (most recent call last):
  File "train.py", line 290, in <module>
    _main_(args)
  File "train.py", line 267, in _main_
    max_queue_size = 8
  File "/Users/moritz.b/opt/anaconda3/envs/yolo3/lib/python3.5/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/Users/moritz.b/opt/anaconda3/envs/yolo3/lib/python3.5/site-packages/keras/engine/training.py", line 1732, in fit_generator
    initial_epoch=initial_epoch)
  File "/Users/moritz.b/opt/anaconda3/envs/yolo3/lib/python3.5/site-packages/keras/engine/training_generator.py", line 220, in fit_generator
    reset_metrics=False)
  File "/Users/moritz.b/opt/anaconda3/envs/yolo3/lib/python3.5/site-packages/keras/engine/training.py", line 1514, in train_on_batch
    outputs = self.train_function(ins)
  File "/Users/moritz.b/opt/anaconda3/envs/yolo3/lib/python3.5/site-packages/tensorflow_core/python/keras/backend.py", line 3476, in __call__
    run_metadata=self.run_metadata)
  File "/Users/moritz.b/opt/anaconda3/envs/yolo3/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 1472, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.UnimplementedError: Fused conv implementation does not support grouped convolutions for now.
  [[{{node conv_81/BiasAdd}}]]
Does anybody have an idea?
I'm getting the same error when training with zoo/config_rbc.json.
On Google Colab it works fine with the same environment, so it seems to be a hardware-related issue.
I have reproduced your error on Ubuntu 18.04 with a GTX 1660. According to this thread, grouped convolutions may not be supported on CPU. I don't know whether that is still the case for TensorFlow 1.15, but I get the same error when I switch to CPU. You might try the suggestion in this issue: https://github.com/tensorflow/tensorflow/issues/29005#issuecomment-569554164
However, when I use the GPU I still get an error at that same layer:
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
(0) Not found: No algorithm worked!
[[{{node conv_81/convolution}}]]
[[loss/Identity_1/_3877]]
(1) Not found: No algorithm worked!
[[{{node conv_81/convolution}}]]
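For anyone who wants to compare CPU and GPU behaviour as described above, one common way to switch is the `CUDA_VISIBLE_DEVICES` environment variable, which has to be set before TensorFlow is imported. This is only a sketch: it assumes the training script selects devices through that variable (for example from the `"gpus"` entry in config.json), and `select_devices` is a hypothetical helper name.

```python
import os

def select_devices(gpus):
    """Restrict the CUDA devices visible to this process.

    Hypothetical helper. Must be called before TensorFlow is imported,
    because the variable is read when CUDA is initialised.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = gpus
    return os.environ["CUDA_VISIBLE_DEVICES"]

# "-1" is not a valid device index, so no GPU is visible and TensorFlow
# falls back to the CPU; "0" would expose only the first GPU.
select_devices("-1")
```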
Would you mind sharing the google colab that worked?
I'm having the same issue on Colab and on CPU. It fails when there are multiple classes. Output on Colab -

I'm having the same issue. I tried to train this model with train.py, but it fails when there are multiple classes. How did you solve this problem?
I'm having the same issue, too. Did you solve this problem?
@YoonSungLee: Yes, I was able to solve the problem. I had to use a different `cache_name` for the data in `config.json`. This error occurs when you have updated the output layer to accommodate new classes, but the cached pickle file still uses the old class list.
Hope that helps.
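An alternative to renaming the cache is to delete the old pickle so that it gets rebuilt on the next run. Here is a minimal sketch, assuming the `cache_name` entries in config.json are file paths relative to the directory you run train.py from; `remove_stale_caches` is a hypothetical helper, not something shipped with keras-yolo3.

```python
import json
import os

def remove_stale_caches(config_path):
    """Delete the annotation caches named in config.json, if they exist.

    Hypothetical helper. Returns the list of files that were removed.
    """
    with open(config_path) as f:
        config = json.load(f)
    removed = []
    for section in ("train", "valid"):
        cache_name = config.get(section, {}).get("cache_name", "")
        # An empty cache_name (as in the "valid" section above) is skipped
        if cache_name and os.path.isfile(cache_name):
            os.remove(cache_name)
            removed.append(cache_name)
    return removed
```

Calling `remove_stale_caches("config.json")` after changing the `labels` list should have the same effect as picking a new `cache_name`, provided the assumption about how the cache is located holds.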
Wow, thank you very much! By doing so, I'm able to get over that error. But I have another error, now... Do you happen to know how to solve this problem?
The error is as follows:
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From /content/gdrive/My Drive/Project/YOLO v3/keras-yolo3/yolo.py:26: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /content/gdrive/My Drive/Project/YOLO v3/keras-yolo3/yolo.py:151: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.
Loading pretrained weights.
Traceback (most recent call last):
File "train.py", line 295, in
@YoonSungLee Do you have `backend.h5` in the same location as `config.json`? It contains the pretrained weights for the model.
Wow, by changing the h5 file, I was able to get past that error. But now I have yet another error, and I'm exhausted. Could you help me? The error is as follows:
Loading pretrained weights.
/usr/local/lib/python3.6/dist-packages/keras/callbacks/callbacks.py:998: UserWarning: epsilon argument is deprecated and will be removed, use min_delta instead.
warnings.warn('epsilon argument is deprecated and '
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:422: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:431: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:438: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/callbacks/tensorboard_v1.py:200: The name tf.summary.merge_all is deprecated. Please use tf.compat.v1.summary.merge_all instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/callbacks/tensorboard_v1.py:203: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.
Epoch 1/100
src/tcmalloc.cc:283] Attempt to free invalid pointer 0x696e69617274160a