flownet2-tf process killed during training

Open swq-1993 opened this issue 7 years ago • 0 comments

Thank you very much for your excellent work! I ran into a problem when using the dataset to train the model.
The output was:
 /flownet2-tf$ python -m src.flownet_s.train

2017-10-31 11:24:27.425738: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-31 11:24:27.425771: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-31 11:24:27.425793: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-10-31 11:24:27.425811: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-31 11:24:27.425829: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX512F instructions, but these are available on your machine and could speed up CPU computations.
2017-10-31 11:24:27.425847: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-10-31 11:24:27.646726: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: Quadro P5000
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:65:00.0
Total memory: 15.89GiB
Free memory: 15.47GiB
2017-10-31 11:24:27.646762: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-10-31 11:24:27.646780: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y
2017-10-31 11:24:27.646793: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Quadro P5000, pci bus id: 0000:65:00.0)
Killed

Although it generated a model, I think the training process was killed before it finished training on the whole dataset.
I tried reducing BATCH_SIZE to 2, but the result was the same: the process was still killed.
Have you encountered this situation, and how can I fix it?
Looking forward to your reply, and thank you very much for your contribution!

Oct 31 '17 04:10 swq-1993
