I am stuck here. Instructions for updating: Use tf.cast instead.

Open prolulu opened this issue 5 years ago • 24 comments

Using TensorFlow backend.
num of training samples: 1
WARNING:tensorflow:From /home/lulu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/lulu/anaconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:2893: calling l2_normalize (from tensorflow.python.ops.nn_impl) with dim is deprecated and will be removed in a future version.
Instructions for updating:
dim is deprecated, use axis instead
load weights from data/models/resnet50_weights_tf_dim_ordering_tf_kernels.h5
Starting training with lr 0.0002 and alpha 0.999
Epoch 1/150
WARNING:tensorflow:From /home/lulu/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.

prolulu avatar Jun 12 '19 10:06 prolulu

@hdjsjyl @liuwei16 @jhb86253817 Excuse me, I have the same problem on both my computer and the server. Can you tell me why? Thank you very much!

prolulu avatar Jun 12 '19 10:06 prolulu

@prolulu Please tell us which versions of TensorFlow and Keras you are using.

jhb86253817 avatar Jun 12 '19 18:06 jhb86253817

keras==2.0.6, tensorflow==1.13.1, opencv==3.4.6. Thank you very much for answering my question!

prolulu avatar Jun 13 '19 01:06 prolulu

@prolulu By "stuck", you mean the process just stops there, neither reporting an error nor terminating, right? So you terminated the process manually?

jhb86253817 avatar Jun 13 '19 06:06 jhb86253817

Yes. By the way, I don't have a GPU or CUDA.

prolulu avatar Jun 13 '19 08:06 prolulu

I just tried running on CPU by setting C.gpu_ids = '', and it works. So I guess your machine is simply slow. Try waiting longer; it may eventually print something.

jhb86253817 avatar Jun 13 '19 09:06 jhb86253817

@jhb86253817 I appreciate you trying it for me. Now I have a new question, LOL:
InvalidArgumentError (see above for traceback): Number of ways to split should evenly divide the split dimension, but got split_dim 0 (size = 1) and num_split 4 [[Node: split = Split[T=DT_FLOAT, num_split=4, _device="/job:localhost/replica:0/task:0/device:CPU:0"](center_cls_2/concat/axis, _arg_input_1_0_5)]]

prolulu avatar Jun 14 '19 07:06 prolulu

This looks like a problem from TensorFlow. I also encounter it when I train on GPU, but there it only prints the message without terminating my training. If your process is terminated, then I have no idea; maybe try switching TensorFlow to version 1.4.1.

jhb86253817 avatar Jun 14 '19 09:06 jhb86253817

@jhb86253817 Which TensorFlow did you install, 1.4.1 or 1.4.1-gpu? And what version is your CUDA? In my previous trials I already used tensorflow 1.4.1 without a GPU and without CUDA installed. So I think the problem comes from CUDA or the GPU?

prolulu avatar Jun 14 '19 09:06 prolulu

I installed the 1.4.1-gpu version with cuda=9.0. I also just ran it on another machine with only the CPU version of TensorFlow 1.13.1, and it works. By the way, I use Anaconda3 to set up the environment.

jhb86253817 avatar Jun 14 '19 09:06 jhb86253817

@jhb86253817 So you used Python 3? And should train_data = cPickle.load(fid) be changed to the following?
fid = pickle._Unpickler(fid)
fid.encoding = 'latin1'
train_data = fid.load()
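
For readers attempting the Python 3 route, here is a minimal sketch of loading a pickle that was written by Python 2. It uses the public `encoding` argument of `pickle.load` rather than the private `_Unpickler` class. The function name `load_py2_pickle` and the file path are hypothetical, and this is not necessarily how the repository authors would do it.

```python
import pickle


def load_py2_pickle(path):
    """Load a pickle written by Python 2 when running under Python 3.

    encoding='latin1' lets Python 2 byte strings (and NumPy arrays
    pickled under Python 2) be decoded without a UnicodeDecodeError.
    """
    with open(path, 'rb') as fid:
        return pickle.load(fid, encoding='latin1')
```

Usage would look like `train_data = load_py2_pickle('path/to/train_cache')`, where the path is whatever cache file the training script opens.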

prolulu avatar Jun 14 '19 13:06 prolulu

No, I use Python 2.7: conda create -n xxx python=2.7

jhb86253817 avatar Jun 14 '19 14:06 jhb86253817

Using TensorFlow backend.
num of training samples: 2975
WARNING:tensorflow:From /work/dependence/anaconda3/envs/py27/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py:2893: calling l2_normalize (from tensorflow.python.ops.nn_impl) with dim is deprecated and will be removed in a future version.
Instructions for updating:
dim is deprecated, use axis instead
2019-06-15 12:48:40.496090: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-06-15 12:48:45.491499: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties: name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582 pciBusID: 0000:3d:00.0 totalMemory: 10.91GiB freeMemory: 10.50GiB
2019-06-15 12:48:45.934914: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 1 with properties: name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582 pciBusID: 0000:3f:00.0 totalMemory: 10.91GiB freeMemory: 10.49GiB
2019-06-15 12:48:46.384828: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 2 with properties: name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582 pciBusID: 0000:40:00.0 totalMemory: 10.91GiB freeMemory: 10.50GiB
2019-06-15 12:48:46.836994: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 3 with properties: name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582 pciBusID: 0000:43:00.0 totalMemory: 10.91GiB freeMemory: 10.50GiB
2019-06-15 12:48:46.837858: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0, 1, 2, 3
2019-06-15 12:48:48.202353: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-15 12:48:48.202421: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0 1 2 3
2019-06-15 12:48:48.202435: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N Y Y Y
2019-06-15 12:48:48.202446: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 1: Y N Y Y
2019-06-15 12:48:48.202456: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 2: Y Y N Y
2019-06-15 12:48:48.202543: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 3: Y Y Y N
2019-06-15 12:48:48.203440: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10157 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:3d:00.0, compute capability: 6.1)
2019-06-15 12:48:48.308365: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10146 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:3f:00.0, compute capability: 6.1)
2019-06-15 12:48:48.412299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10153 MB memory) -> physical GPU (device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:40:00.0, compute capability: 6.1)
2019-06-15 12:48:48.516352: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 10153 MB memory) -> physical GPU (device: 3, name: GeForce GTX 1080 Ti, pci bus id: 0000:43:00.0, compute capability: 6.1)
load weights from data/models/resnet50_weights_tf_dim_ordering_tf_kernels.h5
WARNING:tensorflow:From /work/dependence/anaconda3/envs/py27/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py:1299: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
Starting training with lr 0.0002 and alpha 0.999
Epoch 1/150

Now I am running it on the server in a py27 environment created with conda. It said tensorflow 1.4.1-gpu is not suitable for CUDA 9.0, so I installed tensorflow 1.10.0. I have done everything I can, but it still doesn't output anything. Could you show your train_city.py and any other code that you modified? THANK YOU VERY MUCH

prolulu avatar Jun 15 '19 05:06 prolulu

@jhb86253817 In addition, did you modify img_channel_mean in config.py?

prolulu avatar Jun 15 '19 05:06 prolulu

I did not modify the mean in config.py. Your training log looks quite similar to mine. After about 18 minutes, my process prints the following:
20/250 [=>............................] - ETA: 11184s - cls: 0.0361 - regr_h: 0.0999 - offset: 0.0601

I only added "os.environ['KERAS_BACKEND'] = 'tensorflow'" in train_city.py. I also commented out the try-except in get_data() in data_generator.py, so that the process terminates and shows the error info if something is wrong with the image paths.
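
As a rough illustration of those two changes (not the repository's actual code), the sketch below sets the backend variable before Keras would be imported and shows a loader that lets errors propagate instead of swallowing them. The names `load_samples`, `read_fn`, and `strict` are hypothetical.

```python
import os

# Keras reads KERAS_BACKEND when it is imported, so set it first.
os.environ['KERAS_BACKEND'] = 'tensorflow'


def load_samples(paths, read_fn, strict=True):
    """Yield read_fn(path) for every path.

    With strict=True a failing read raises immediately, so a wrong image
    path stops training with a traceback. With strict=False the error is
    printed and the sample skipped, which can leave a generator looping
    forever without producing a batch if every path is wrong.
    """
    for path in paths:
        try:
            yield read_fn(path)
        except Exception as exc:
            if strict:
                raise
            print('skipping %s: %s' % (path, exc))
```

Running once with `strict=True` is a cheap way to tell a slow first epoch apart from a generator that never yields.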

jhb86253817 avatar Jun 15 '19 09:06 jhb86253817

Use py27, tf-gpu 1.4.1, cuda 8.0, cudnn 6.0

bomtorazek avatar Aug 14 '19 11:08 bomtorazek

Have you solved this issue?

hudson-dev avatar Nov 01 '20 23:11 hudson-dev

I have the same issue. I use python 3.8.6, tf 2.3.1, cuda 10.1.

madfalc0n avatar Nov 04 '20 16:11 madfalc0n

Hello, I had the same problem and have solved it. My problem was that I had entered the wrong train image path. This produced a wrong train.record file, which depends on the image path. So I suggest you re-check all the path values you provided, and also the record file.

NAHIN-JZS avatar Dec 19 '21 16:12 NAHIN-JZS

Can you please explain more?

d7d7110 avatar Dec 06 '22 12:12 d7d7110

@d7d7110 Sure. But I'm afraid it's been a long time since I solved the problem. Anyway, let me try. When I created the record file for the training images, the image path was wrong. So during training it could not find the images and kept waiting for them. Please make sure that you created the record file with the correct image path.
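
One way to check this before building the record file is sketched below. It assumes the annotations are available as a list of dicts holding an image path under a `'filepath'` key; that key and the function name `find_missing_images` are assumptions for illustration, so adapt them to your own annotation format.

```python
import os


def find_missing_images(records, key='filepath'):
    """Return the image paths from `records` that do not exist on disk."""
    return [rec[key] for rec in records if not os.path.isfile(rec[key])]
```

If the returned list is non-empty, fix the paths first and regenerate the record file afterwards.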

NAHIN-JZS avatar Dec 06 '22 16:12 NAHIN-JZS

Hi, I am getting this problem while training in TensorFlow. Can anyone help?

Instructions for updating:
Use tf.cast instead.
W0319 19:16:57.031892 14868 deprecation.py:350] From C:\Users\grego\OneDrive\Documents\Last_H\TFODCourse\tfod\lib\site-packages\tensorflow\python\util\dispatch.py:1176: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2023-03-19 19:17:00.646393: W tensorflow/core/framework/dataset.cc:769] Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.

Terepashi avatar Mar 19 '23 13:03 Terepashi

@hdjsjyl @liuwei16 @jhb86253817 Excuse me, I have the same problem on my computer and I don't know why. Could you tell me why this problem appears? Thank you very much.

Terepashi avatar Mar 19 '23 13:03 Terepashi

@Terepashi Hey. Check the .record files here: ..\TFODCourse\Tensorflow\workspace\annotations. If they are empty (zero size), a wrong path was used in step 3.

Solution: Just move the train and test folders with images from ..\TFODCourse\Tensorflow\workspace\images\collectedimages to ..\TFODCourse\Tensorflow\workspace\images.

Works for me.
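
The zero-size check can also be scripted. This is a minimal sketch; the function name `empty_record_files` is hypothetical, and the directory you pass in should be your own annotations folder.

```python
import glob
import os


def empty_record_files(annotation_dir):
    """Return the sorted .record files in annotation_dir that are 0 bytes."""
    pattern = os.path.join(annotation_dir, '*.record')
    return sorted(p for p in glob.glob(pattern) if os.path.getsize(p) == 0)
```

Any file it reports was most likely generated from a wrong image path and needs to be regenerated.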

k5nik avatar May 21 '23 14:05 k5nik