Dst tensor is not initialized

Open · burness opened this issue 8 years ago · 37 comments

Hi @aymericdamien, I ran the logistic_regression.py script but hit a problem: "Dst tensor is not initialized". The detailed log is below:

Epoch: 0001 cost= 29.917553501
Epoch: 0002 cost= 21.929896693
Epoch: 0003 cost= 21.063875407
Epoch: 0004 cost= 20.457020144
Epoch: 0005 cost= 20.084428289
Epoch: 0006 cost= 19.814794980
Epoch: 0007 cost= 19.674670629
Epoch: 0008 cost= 19.510438999
Epoch: 0009 cost= 19.309689613
Epoch: 0010 cost= 19.223995275
Epoch: 0011 cost= 19.161345129
Epoch: 0012 cost= 18.985856709
Epoch: 0013 cost= 18.917688493
Epoch: 0014 cost= 18.832972273
Epoch: 0015 cost= 18.742634454
Epoch: 0016 cost= 18.695894625
Epoch: 0017 cost= 18.643278683
Epoch: 0018 cost= 18.609112186
Epoch: 0019 cost= 18.444614899
Epoch: 0020 cost= 18.532375607
Epoch: 0021 cost= 18.437554449
Epoch: 0022 cost= 18.310914770
Epoch: 0023 cost= 18.289282742
Epoch: 0024 cost= 18.214274961
Epoch: 0025 cost= 18.293197173
Optimization Finished!
Accuracy:
---------------------------------------------------------------------------
InternalError                             Traceback (most recent call last)
<ipython-input-17-f661f1e1e9de> in <module>()
     24     # Calculate accuracy
     25     accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
---> 26     print "Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels})

/home/burness/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.pyc in eval(self, feed_dict, session)
    500 
    501     """
--> 502     return _eval_using_default_session(self, feed_dict, self.graph, session)
    503 
    504 

/home/burness/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.pyc in _eval_using_default_session(tensors, feed_dict, graph, session)
   3332                        "the tensor's graph is different from the session's "
   3333                        "graph.")
-> 3334   return session.run(tensors, feed_dict)
   3335 
   3336 

/home/burness/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata)
    338     try:
    339       result = self._run(None, fetches, feed_dict, options_ptr,
--> 340                          run_metadata_ptr)
    341       if run_metadata:
    342         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/home/burness/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata)
    562     try:
    563       results = self._do_run(handle, target_list, unique_fetches,
--> 564                              feed_dict_string, options, run_metadata)
    565     finally:
    566       # The movers are no longer used. Delete them.

/home/burness/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
    635     if handle is None:
    636       return self._do_call(_run_fn, self._session, feed_dict, fetch_list,
--> 637                            target_list, options, run_metadata)
    638     else:
    639       return self._do_call(_prun_fn, self._session, handle, feed_dict,

/home/burness/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _do_call(self, fn, *args)
    657       # pylint: disable=protected-access
    658       raise errors._make_specific_exception(node_def, op, error_message,
--> 659                                             e.code)
    660       # pylint: enable=protected-access
    661 

InternalError: Dst tensor is not initialized.
     [[Node: _recv_Placeholder_1_0/_27513 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_267__recv_Placeholder_1_0", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
     [[Node: Mean_6/_27517 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_277_Mean_6", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

burness avatar Jun 05 '16 03:06 burness

Are you using a GPU? Usually this error is raised when GPU memory is full.

aymericdamien avatar Jun 05 '16 04:06 aymericdamien
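
A minimal sketch (not part of the repo) of one way to avoid this in the original example: evaluate the test set in smaller batches instead of feeding all 10,000 MNIST test images at once. The names x, y and accuracy follow logistic_regression.py; the helper function and batch size are assumptions for illustration.

import numpy as np

# Hypothetical helper: evaluate accuracy in fixed-size batches so the whole
# test set never has to be copied to the GPU in one go (TF 1.x style).
def batched_accuracy(sess, accuracy, x, y, images, labels, batch_size=500):
    scores = []
    for start in range(0, len(images), batch_size):
        end = start + batch_size
        scores.append(sess.run(accuracy, feed_dict={x: images[start:end],
                                                    y: labels[start:end]}))
    # Averaging per-batch accuracies is exact here because the 10,000 test
    # images split evenly into batches of 500.
    return float(np.mean(scores))

# Usage inside the example's `with tf.Session() as sess:` block:
# print("Accuracy:", batched_accuracy(sess, accuracy, x, y,
#                                     mnist.test.images, mnist.test.labels))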

@aymericdamien Thanks! I found the reason: I use IPython Notebook to run the code, but I forgot to close another notebook, and that script was wasting too much GPU memory.

burness avatar Jun 06 '16 12:06 burness

Yup, GPU memory being full is the reason. IPython kernels stuck as background processes cause that.

Thanks, Subodh thesubodh.com

subodhp avatar Sep 27 '16 07:09 subodhp

@burness @subodhp I'm getting the same error ("Ran out of memory").
[MacBook Pro 2013 with 16 GB RAM, GPU (2 GB RAM), TensorFlow 0.11, CUDA 8.0, cuDNN 5.x]

I tried shutting down the Jupyter notebook and restarting it, but it crashed with the same error. Is this solved? How does one resolve GPU out-of-memory errors?

Thanks!

I tensorflow/core/common_runtime/bfc_allocator.cc:689] Summary of in-use Chunks by size:
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 5 Chunks of size 256 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 1280 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 31488 totalling 30.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 46609152 totalling 44.45MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] Sum Total of in-use chunks: 44.48MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:698] Stats:
Limit:                    57622528
InUse:                    46643200
MaxInUse:                 46643200
NumAllocs:                      11
MaxAllocSize:             46609152

W tensorflow/core/common_runtime/bfc_allocator.cc:270] ********************************************************************xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 390.6KiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:958] Internal: Dst tensor is not initialized.
E tensorflow/core/common_runtime/executor.cc:334] Executor failed to create kernel. Internal: Dst tensor is not initialized.
     [[Node: Reshape_1/_2__cf__2 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [10000,10] values: 0 0 0...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

laventura avatar Nov 03 '16 18:11 laventura

I rebooted my MacBook and started afresh. System: [MacBook Pro 2013, with 16 GB RAM, GPU with 2 GB RAM; TensorFlow 0.11, CUDA 8.0, cuDNN 5.x]. Here's the error I get (see the attached error-tf.txt at the bottom for full detail).

  1. How is the free memory only 20.49 MiB (on a recently rebooted system) if there's 2.0 GiB available to the GPU?
  2. Is there a way to track GPU memory usage?
  3. Is there a way to disable GPU usage for an IPython notebook?

Thanks!

Some relevant parts I see:

I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
name: GeForce GT 750M
major: 3 minor: 0 memoryClockRate (GHz) 0.9255
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 20.49MiB
...

I tensorflow/core/common_runtime/bfc_allocator.cc:698] Stats:
Limit:                    21487616
InUse:                       33792
MaxInUse:                    65280
NumAllocs:                       9
MaxAllocSize:                31488

W tensorflow/core/common_runtime/bfc_allocator.cc:270] *___________________________________________________________________________________________________
W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 29.91MiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:958] Internal: Dst tensor is not initialized.
E tensorflow/core/common_runtime/executor.cc:334] Executor failed to create kernel. Internal: Dst tensor is not initialized.
     [[Node: Const = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [10000,784] values: -0.5 -0.49607843 -0.5...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

error-tf.txt

laventura avatar Nov 03 '16 18:11 laventura

@laventura, did you ever find a solution to the GPU out-of-memory error? I have the same problem with the same setup, though I got an error trying to allocate 10.8 MiB.

pumplerod avatar Dec 07 '16 22:12 pumplerod

@pumplerod - I found a solution / kludge that somehow seems to work, although I can't explain why / how.

Before starting your Jupyter notebook / tensorflow program, set this:

export CUDA_VISIBLE_DEVICES=1

This seems to work in that the scripts work OK. Not sure if this is a requirement. Give it a try and see.

laventura avatar Dec 07 '16 22:12 laventura
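
The same device selection can be made from inside a notebook or script, as long as it happens before TensorFlow initializes the GPU. A minimal sketch (an assumption, not part of the original scripts); setting the variable to an empty string hides the GPU entirely, which also answers the earlier question about disabling GPU usage for a notebook:

import os

os.environ["CUDA_VISIBLE_DEVICES"] = "1"   # or "" to hide the GPU and run on CPU only
import tensorflow as tf                    # import/initialize TF only after setting the variable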

Wow. Thanks. That seems to have worked. Not sure how it's related, but before trying your solution I got rid of the error by specifying:

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True, gpu_options=gpu_options))

However when I tried to run training it crashed the jupyter notebook.

pumplerod avatar Dec 07 '16 22:12 pumplerod

@pumplerod

Oh, yours is very helpful for me. I got an error message about running out of memory over just 29 MiB. I added your code with fraction 0.8, since about 80% of the memory was free (of 2 GiB, 1.6 GiB was free). My code started working. After that, I deleted ALL GPU options and it still works. Very curious..

sapeyes avatar Dec 16 '16 03:12 sapeyes

Update on this:

Earlier, the GPU was being recognized by an older TensorFlow. Then I upgraded TF to 0.11rc2 and later to 0.12, and now my TF does not recognize any GPU at all.

Also, the deviceQuery does not report any GPU either. I'm going totally bonkers in this CUDA hell.

See details here: https://github.com/tensorflow/tensorflow/issues/2882

Also on NVIDIA Devtalk, if any one has any bright insights - would be very helpful to me! https://devtalk.nvidia.com/default/topic/990015/cuda-setup-and-installation/help-cuda-7-5-or-8-devicequery-failing-not-working-on-macbookpro-2013-os-x-10-11-gt750m/

laventura avatar Jan 25 '17 21:01 laventura

Just stumbled upon this thread. I think you have hidden your GPU from the CUDA drivers with this line:

export CUDA_VISIBLE_DEVICES=1

What this is telling CUDA is that it should only use "Device 1" in your system. So, unless you have 2 GPU devices, you have hidden the primary "Device 0". I am sure if you set this as follows TF will see your GPU again, but your other problems may return:

export CUDA_VISIBLE_DEVICES=0

Mazecreator avatar Jan 25 '17 21:01 Mazecreator
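
A quick way to confirm what TensorFlow actually sees after changing CUDA_VISIBLE_DEVICES is to list the local devices. A minimal TF 1.x sketch (not from the original scripts):

from tensorflow.python.client import device_lib

# Prints each device TensorFlow can use, its type (CPU/GPU), and the amount
# of memory TensorFlow was able to claim on it.
for dev in device_lib.list_local_devices():
    print(dev.name, dev.device_type, dev.memory_limit)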

@Mazecreator & Others,

Indeed; when I set CUDA_VISIBLE_DEVICES=0, deviceQuery returns successfully. However, TensorFlow now complains again with "Dst tensor is not initialized"!!

This is so frustrating!!

It appears that CUDA is leaking memory... I see that the free memory listed (when a Python script starts) keeps getting lower and lower... though I don't know for sure if that's the problem. The suggestions above (setting TF's GPUOptions) are all workarounds - they require manual code changes to existing scripts that were supposed to work as-is.

See here: deviceQuery

 py35 ▶ ~ ▶ Developer ❯ … ❯ x86_64 ❯ darwin ❯ release ▶ $ ▶ echo $CUDA_HOME 
/usr/local/cuda
 py35 ▶ ~ ▶ Developer ❯ … ❯ x86_64 ❯ darwin ❯ release ▶ $ ▶ echo $CUDA_VISIBLE_DEVICES

 py35 ▶ ~ ▶ Developer ❯ … ❯ x86_64 ❯ darwin ❯ release ▶ $ ▶ ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GT 750M"
  CUDA Driver Version / Runtime Version          8.0 / 8.0
  CUDA Capability Major/Minor version number:    3.0
  Total amount of global memory:                 2048 MBytes (2147024896 bytes)
  ( 2) Multiprocessors, (192) CUDA Cores/MP:     384 CUDA Cores
  GPU Max Clock rate:                            926 MHz (0.93 GHz)
  Memory Clock rate:                             2508 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 262144 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GT 750M
Result = PASS
 py35 ▶ ~ ▶ Developer ❯ … ❯ x86_64 ❯ darwin ❯ release ▶ $ ▶ 

 py35 ▶ ~ ▶ Developer ❯ CUDA ❯ cuda-smi ▶ master ▶ ❓ ▶ $ ▶ ./cuda-smi 
Device 0 [PCIe 0:1:0.0]: GeForce GT 750M (CC 3.0): 369.92 of 2047.6 MB (i.e. 18.1%) Free

Running a Python script with TensorFlow:

 py35 ▶ ~ ▶ Developer ❯ … ❯ self_driving_car ❯ traffic-signs ❯ CarND-Alexnet-Fe ▶ master ▶ 4✎ ▶ $ ▶ python imagenet_inference.py 
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.1.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.dylib locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GT 750M
major: 3 minor: 0 memoryClockRate (GHz) 0.9255
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 305.92MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (256): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (512): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1024): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2048): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4096): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8192): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16384): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (32768): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (65536): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (131072): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (262144): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (524288): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1048576): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2097152): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4194304): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8388608): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16777216): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (33554432): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (67108864): 	Total Chunks: 1, Chunks in use: 0 97.01MiB allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (134217728): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (268435456): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 144.00MiB was 128.00MiB, Chunk State: 
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700a60000 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700a60500 of size 139520
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700a82600 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700a82800 of size 1228800
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700bae800 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700baec00 of size 3538944
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700f0ec00 of size 1536
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700f0f200 of size 2654208
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x701197200 of size 1536
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x701197800 of size 1769472
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x701347800 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x701347c00 of size 101725184
I tensorflow/core/common_runtime/bfc_allocator.cc:693]      Summary of in-use Chunks by size: 
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 512 totalling 512B
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 1024 totalling 2.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1280 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 1536 totalling 3.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 139520 totalling 136.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1228800 totalling 1.17MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1769472 totalling 1.69MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 2654208 totalling 2.53MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 3538944 totalling 3.38MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 8.91MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats: 
Limit:                   111063040
InUse:                     9337856
MaxInUse:                  9337856
NumAllocs:                      11
MaxAllocSize:              3538944

W tensorflow/core/common_runtime/bfc_allocator.cc:274] *********___________________________________________________________________________________________
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 144.00MiB.  See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:965] Internal: Dst tensor is not initialized.
E tensorflow/core/common_runtime/executor.cc:390] Executor failed to create kernel. Internal: Dst tensor is not initialized.
	 [[Node: Variable_10/initial_value = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [9216,4096] values: [-0.0043384791 -0.0071635786 -0.0067223078]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Traceback (most recent call last):
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1021, in _do_call
    return fn(*args)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1003, in _run_fn
    status, run_metadata)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
	 [[Node: Variable_10/initial_value = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [9216,4096] values: [-0.0043384791 -0.0071635786 -0.0067223078]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "imagenet_inference.py", line 19, in <module>
    sess.run(init)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 964, in _run
    feed_dict_string, options, run_metadata)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1014, in _do_run
    target_list, options, run_metadata)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1034, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
	 [[Node: Variable_10/initial_value = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [9216,4096] values: [-0.0043384791 -0.0071635786 -0.0067223078]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Caused by op 'Variable_10/initial_value', defined at:
  File "imagenet_inference.py", line 16, in <module>
    probs = AlexNet(x, feature_extract=False)
  File "/Users/aa/Developer/courses/self_driving_carnd/traffic-signs/CarND-Alexnet-Feature-Extraction/alexnet.py", line 139, in AlexNet
    fc6W = tf.Variable(net_data["fc6"][0])
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 224, in __init__
    expected_shape=expected_shape)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 333, in _init_from_args
    initial_value, name="initial_value", dtype=dtype)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 669, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 176, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 169, in constant
    attrs={"value": tensor_value, "dtype": dtype_value}, name=name).outputs[0]
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
    self._traceback = _extract_stack()

InternalError (see above for traceback): Dst tensor is not initialized.
	 [[Node: Variable_10/initial_value = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [9216,4096] values: [-0.0043384791 -0.0071635786 -0.0067223078]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

 py35 ▶ ~ ▶ Developer ❯ … ❯ self_driving_car ❯ traffic-signs ❯ CarND-Alexnet-Fe ▶ master ▶ 4✎ ▶ $ ▶ 

laventura avatar Jan 26 '17 18:01 laventura

I get the same error and I have 12GB of GPU memory:

mona@pascal:~/computer_vision/VPilot$ python train.py
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so.5.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so.8.0 locally
/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py:1938: UserWarning: Expected no kwargs, you passed 1
kwargs passed to function are ignored with Tensorflow backend
  warnings.warn('\n'.join(msg))
Epoch 1/1000
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.8755
pciBusID 0000:03:00.0
Total memory: 11.92GiB
Free memory: 412.50MiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x4547d60
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 1 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.8755
pciBusID 0000:83:00.0
Total memory: 11.92GiB
Free memory: 534.50MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 0 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 1 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0:   Y N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 1:   N Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K40c, pci bus id: 0000:83:00.0)
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (256):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (512):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1024):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2048):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4096):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8192):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16384):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (32768):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (65536):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (131072):    Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (262144):    Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (524288):    Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1048576):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2097152):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4194304):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8388608):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16777216):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (33554432):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (67108864):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (134217728):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (268435456):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 512.0KiB was 512.0KiB, Chunk State:
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740000 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740600 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740700 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740800 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740900 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740a00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740b00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740c00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740d00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740e00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740f00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b741000 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b741100 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b741200 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b741300 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b741400 of size 4096
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b742400 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b742500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b742600 of size 2048
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b742e00 of size 2048
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b743600 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b743a00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b743b00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b743c00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b743d00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b743e00 of size 222806528
I tensorflow/core/common_runtime/bfc_allocator.cc:693]      Summary of in-use Chunks by size:
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 21 Chunks of size 256 totalling 5.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1024 totalling 1.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1280 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 2048 totalling 4.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 4096 totalling 4.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 222806528 totalling 212.48MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 212.50MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit:                   222822400
InUse:                   222822400
MaxInUse:                222822400
NumAllocs:                      27
MaxAllocSize:            222806528
 
W tensorflow/core/common_runtime/bfc_allocator.cc:274] ***********************************************************************************************xxxxx
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 512.0KiB.  See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:958] Internal: Dst tensor is not initialized.
E tensorflow/core/common_runtime/executor.cc:334] Executor failed to create kernel. Internal: Dst tensor is not initialized.
     [[Node: Const_37 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [512,256] values: 0 0 0...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Traceback (most recent call last):
  File "train.py", line 55, in <module>
    callbacks=[ckp_callback]
  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 935, in fit_generator
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1553, in fit_generator
    class_weight=class_weight)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1316, in train_on_batch
    outputs = self.train_function(ins)
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 1919, in __call__
    session = get_session()
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 121, in get_session
    _initialize_variables()
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 275, in _initialize_variables
    sess.run(tf.initialize_variables(uninitialized_variables))
  File "/home/mona/tensorflow/_python_build/tensorflow/python/client/session.py", line 717, in run
    run_metadata_ptr)
  File "/home/mona/tensorflow/_python_build/tensorflow/python/client/session.py", line 915, in _run
    feed_dict_string, options, run_metadata)
  File "/home/mona/tensorflow/_python_build/tensorflow/python/client/session.py", line 965, in _do_run
    target_list, options, run_metadata)
  File "/home/mona/tensorflow/_python_build/tensorflow/python/client/session.py", line 985, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InternalError: Dst tensor is not initialized.
     [[Node: Const_37 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [512,256] values: 0 0 0...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
 
Caused by op u'Const_37', defined at:
  File "train.py", line 55, in <module>
    callbacks=[ckp_callback]
  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 935, in fit_generator
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1450, in fit_generator
    self._make_train_function()
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 761, in _make_train_function
    self.total_loss)
  File "/usr/local/lib/python2.7/dist-packages/keras/optimizers.py", line 234, in get_updates
    accumulators = [K.zeros(shape) for shape in shapes]
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 482, in zeros
    return variable(tf.constant_initializer(0., dtype=tf_dtype)(shape),
  File "/home/mona/tensorflow/_python_build/tensorflow/python/ops/init_ops.py", line 145, in _initializer
    return constant_op.constant(value, dtype=dtype, shape=shape)
  File "/home/mona/tensorflow/_python_build/tensorflow/python/framework/constant_op.py", line 167, in constant
    attrs={"value": tensor_value, "dtype": dtype_value}, name=name).outputs[0]
  File "/home/mona/tensorflow/_python_build/tensorflow/python/framework/ops.py", line 2388, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/mona/tensorflow/_python_build/tensorflow/python/framework/ops.py", line 1300, in __init__
    self._traceback = _extract_stack()
 
InternalError (see above for traceback): Dst tensor is not initialized.
     [[Node: Const_37 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [512,256] values: 0 0 0...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

monajalal avatar Jan 26 '17 23:01 monajalal

@monajalal --

It appears that the GPU is running out of memory for some reason. WHY that is happening, I can't say; it is the most confounding thing, since the programs that were running have ended.

Probably a memory leak?? If so, it could be at the GPU driver level??

See here too: https://github.com/tensorflow/tensorflow/issues/7025#issuecomment-275475281

I've tried searching for how to release/clear GPU memory, but haven't found anything good / credible / useful.

Do let me know if you or anyone comes across a solution.

Until then, this TensorFlow + GPU combo is a total fail for me (on my Macbook). 😡

laventura avatar Jan 26 '17 23:01 laventura

The MacBook's NVIDIA GPU isn't dedicated to compute; it's shared between TensorFlow and the screen.

I regularly have out-of-memory issues. Using a mid-2012 rMBP with a GeForce 650.

Before running TensorFlow, I close all processes using the GPU (look at the video card column in the resource monitor) to force OSX to use the integrated video card. Doing this releases some memory and I can execute TensorFlow scripts, though not all memory is cleared when I check with cuda-smi. You can quickly see which graphics card is being used with the gfx.io app. I found it good to disable WebGL in Safari (although it's needed for TensorBoard). Restarting Safari and PyCharm before running TensorFlow scripts helps clear GPU memory, and stopping non-essential apps in the background also helps.

https://github.com/phvu/cuda-smi

https://gfx.io

Is an OSX issue a possibility?

A MacBook isn't the best "all in one" dev platform for TensorFlow, but it can be made to work... albeit frustratingly.

It would be good to force OSX to use the integrated video chip for the screen and the NVIDIA GPU exclusively for TensorFlow. I'm totally unsure whether that's possible; some early discussions about the hardware indicated that Apple has locked down certain parts of GPU access, so if the GPU can't be used exclusively now, it's likely to remain difficult or impossible in the future.

normanheckscher avatar Jan 27 '17 01:01 normanheckscher

@normanheckscher - Thanks for the tips. Good to know about the Macbook GPU.

I downloaded gfx.io - it's helpful in understanding when the GPU is being used. I've used cuda-smi; it's useful for showing free GPU memory, but it doesn't really show the processes using it. I was hoping an nvidia-smi kind of thing existed for Macs.

When you said "I close all processes using the GPU (look at the video card column in the resource monitor) to force OSX to use the integrated video card", which 'resource monitor video card' column do you refer to? In Activity Monitor? If so, I didn't find it. :-(

Yeah, I try closing most of the programs that use the GPU (mostly Chrome, etc.) before running TF scripts. Sometimes the TF scripts run out of memory almost immediately after a fresh reboot, which is kind of confounding.

I'm slowly coming to the realization that the TensorFlow + GPU combo isn't very effective/efficient on MacBooks. 😕

I'm rather sadly investigating a Theano backend (instead of TF) under Keras, which is my main high-level framework of choice. Sadly because I don't know enough Theano and don't have enough bandwidth to learn it effectively. :-/

laventura avatar Jan 27 '17 18:01 laventura

Sorry @laventura, I meant the Activity Monitor in OSX. If you go to the CPU or Memory tab, where you can look at the running processes, you can select "View > Columns > Graphics Card" and a new column, "Requires High Perf GPU", will appear. Sort by this column and you can see which processes are using the NVIDIA card.

A MacBook Pro can be used for learning and development. I want to use TensorFlow, and I find OSX a very good environment to work in, so I deal with these little irritations while I get myself up to speed with TensorFlow. When my models need more memory I'll make the call between building a headless Linux box and going with a service such as AWS. If I were starting from scratch I'd consider a dedicated GPU notebook that could run Linux; however, I'm not flush with cash and I don't see the need to purchase a new hardware environment when the one I have works.

Best of luck to you.

normanheckscher avatar Jan 28 '17 05:01 normanheckscher

This is not just a MacBook issue. I am seeing this on my laptop with a GTX 1060 (6 GB) running Ubuntu.

This seems to help: https://github.com/fchollet/keras/issues/3675

Use:

max_q_size=1,
pickle_safe=False

in fit_generator()

After adding these two options I am up and running again.

bpanahij avatar Jan 28 '17 23:01 bpanahij
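
A minimal sketch of what that call can look like. Hedged: these are the Keras 1.x argument names quoted above (Keras 2 later renamed them to max_queue_size and use_multiprocessing), and the toy model and generator here are placeholders rather than the poster's code:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

def toy_generator(batch_size=16):
    # Endless generator of random (inputs, targets) batches, for illustration only.
    while True:
        yield np.random.rand(batch_size, 8), np.random.rand(batch_size, 1)

model = Sequential([Dense(1, input_dim=8)])
model.compile(optimizer="sgd", loss="mse")

model.fit_generator(
    toy_generator(),
    samples_per_epoch=256,  # Keras 1.x name for the per-epoch sample count
    nb_epoch=1,             # Keras 1.x name for epochs
    max_q_size=1,           # keep only one pre-fetched batch queued on the host
    pickle_safe=False,      # single-process generator, no multiprocessing
)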

I can run on the MacBook Pro's NVIDIA GPU, but only for minimal applications:

import tensorflow as tf
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.8) #0.333
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True, gpu_options=gpu_options))

When I increase the number of Conv2D filters, e.g. from 32 to 64, I start to get a DEAD KERNEL, so I lower the number of images I process per batch, e.g. from 256 to 24.

You have to keep trying until you get the right balance between the depth of your neural network, batch size and amount of GPU memory.

In the end it is much faster than the CPU, but too fragile; after much frustration, I am going back to the CPU and a more powerful Linux GPU instance.

UkiDLucas avatar Feb 16 '17 21:02 UkiDLucas

Stumbled onto this thread; perhaps my two cents can help. Launching Python with a preceding flag of THEANO_FLAGS='device=gpu0' or THEANO_FLAGS='device=gpu1' etc. (the latter if you have more than one GPU) helps. For example, this terminal command will run the Python code on gpu4 (you can use gpustat to track usage of the different GPUs on your machine in real time):

THEANO_FLAGS='device=gpu4' python /run/this/script.py

If your convolutional filters are large, smaller training batches can be one way to overcome the memory issue - that is, if the network initialization fits in memory in the first place.

jasgrewal avatar Jun 13 '17 00:06 jasgrewal

You don't have ENOUGH GPU MEMORY.

philipperemy avatar Jul 14 '17 05:07 philipperemy

Reduce the size of the batches sent in the run or eval; that should do the trick.

SlobodanNinkov avatar Jul 15 '17 15:07 SlobodanNinkov

Running into the same issue with the smallest possible model, Cart-Pole, on a GTX 1080 (8 GB). Is it a TensorFlow bug that can be fixed somehow, or are we simply trying to fit models that are too big (an overenthusiastic batch size probably being the main reason for that)?

sebtac

sebtac avatar Mar 15 '18 15:03 sebtac

When the system is idle and not processing, shouldn't Python somehow not hold on to the whole GPU memory? That would be a useful feature.

estathop avatar Oct 10 '18 10:10 estathop

In my case I have a laptop, and the command export CUDA_VISIBLE_DEVICES=1 made the training really slow, so I assume it used the integrated graphics card. So I had to use the value 0.

soufianesabiri avatar Mar 14 '19 14:03 soufianesabiri

I'm having the same issue. It's a Windows machine. For now I've reduced my RNN size and embedding size... let's see.

sunn-e avatar May 29 '19 18:05 sunn-e

Not working.

sunn-e avatar May 30 '19 04:05 sunn-e

In my case, it was an issue with the dataset. Removing the problematic images (very large images from Google, or weird images that you get from web scraping) solved my problem.

dhruvchamania avatar Jul 12 '19 05:07 dhruvchamania

Wow. Thanks. That seems to have worked. Not sure how it's related, but before trying your solution I got rid of the error by specifying:

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True, gpu_options=gpu_options))

However when I tried to run training it crashed the jupyter notebook.

There it is! Thank you for the answer. It worked in my case! There is also another similar solution:

config = tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))
# allow_growth=True is the important part here

iedmrc avatar Aug 13 '19 14:08 iedmrc
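
For completeness, a minimal sketch (TF 1.x, not from the repo) of that option in a full session; with allow_growth the process claims GPU memory incrementally instead of reserving nearly all of it up front:

import tensorflow as tf

config = tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))
with tf.Session(config=config) as sess:
    a = tf.constant([1.0, 2.0])
    print(sess.run(a * 2.0))   # any graph construction / run calls go inside this session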

It occurs because the GPU memory is full. The best way out is to reduce the batch size.

For example, if batch_size = 32,

make it 16/8/4/2 - anything until the error is resolved.

It works every single time for me.

Boltuzamaki avatar Jan 13 '20 23:01 Boltuzamaki
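
A minimal sketch of that "halve the batch size until it fits" advice, using a toy tf.keras model and random data. The starting batch size, model, and data are placeholders; the exceptions caught are the InternalError reported in this thread plus the more common ResourceExhaustedError:

import numpy as np
import tensorflow as tf

x = np.random.rand(4096, 32).astype("float32")
y = np.random.rand(4096, 1).astype("float32")

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(32,))])
model.compile(optimizer="sgd", loss="mse")

batch_size = 32
while batch_size >= 2:
    try:
        model.fit(x, y, batch_size=batch_size, epochs=1, verbose=0)
        print("trained with batch_size =", batch_size)
        break
    except (tf.errors.ResourceExhaustedError, tf.errors.InternalError):
        batch_size //= 2   # out of GPU memory: retry with a smaller batch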

WARNING:tensorflow:sample_weight modes were coerced from ... to ['...']
WARNING:tensorflow:sample_weight modes were coerced from ... to ['...']
WARNING:tensorflow:sample_weight modes were coerced from ... to ['...']
WARNING:tensorflow:sample_weight modes were coerced from ... to ['...']
Train for 11523 steps, validate for 4153 steps
Epoch 1/5
1/11523 [..............................] - ETA: 33:47:16

InternalError Traceback (most recent call last) in 6 epochs=EPOCHS, 7 validation_data=validation_generator, ----> 8 validation_steps=validation_generator.samples//validation_generator.batch_size)

~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\util\deprecation.py in new_func(*args, **kwargs) 322 'in a future version' if date is None else ('after %s' % date), 323 instructions) --> 324 return func(*args, **kwargs) 325 return tf_decorator.make_decorator( 326 func, new_func, 'deprecated',

~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\keras\engine\training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch) 1304 use_multiprocessing=use_multiprocessing, 1305 shuffle=shuffle, -> 1306 initial_epoch=initial_epoch) 1307 1308 @deprecation.deprecated(

~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs) 817 max_queue_size=max_queue_size, 818 workers=workers, --> 819 use_multiprocessing=use_multiprocessing) 820 821 def evaluate(self,

~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in fit(self, model, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs) 340 mode=ModeKeys.TRAIN, 341 training_context=training_context, --> 342 total_epochs=epochs) 343 cbks.make_logs(model, epoch_logs, training_result, ModeKeys.TRAIN) 344

~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in run_one_epoch(model, iterator, execution_function, dataset_size, batch_size, strategy, steps_per_epoch, num_samples, mode, training_context, total_epochs) 126 step=step, mode=mode, size=current_batch_size) as batch_logs: 127 try: --> 128 batch_outs = execution_function(iterator) 129 except (StopIteration, errors.OutOfRangeError): 130 # TODO(kaftan): File bug about tf function and errors.OutOfRangeError?

~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py in execution_function(input_fn) 96 # numpy translates Tensors to values in Eager mode. 97 return nest.map_structure(_non_none_constant_value, ---> 98 distributed_function(input_fn)) 99 100 return execution_function

~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\eager\def_function.py in call(self, *args, **kwds) 566 xla_context.Exit() 567 else: --> 568 result = self._call(*args, **kwds) 569 570 if tracing_count == self._get_tracing_count():

~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\eager\def_function.py in _call(self, *args, **kwds) 597 # In this case we have created variables on the first call, so we run the 598 # defunned version which is guaranteed to never create variables. --> 599 return self._stateless_fn(*args, **kwds) # pylint: disable=not-callable 600 elif self._stateful_fn is not None: 601 # Release the lock early so that multiple threads can perform the call

~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\eager\function.py in call(self, *args, **kwargs) 2361 with self._lock: 2362 graph_function, args, kwargs = self._maybe_define_function(args, kwargs) -> 2363 return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access 2364 2365 @property

~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\eager\function.py in _filtered_call(self, args, kwargs) 1609 if isinstance(t, (ops.Tensor, 1610 resource_variable_ops.BaseResourceVariable))), -> 1611 self.captured_inputs) 1612 1613 def _call_flat(self, args, captured_inputs, cancellation_manager=None):

~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\eager\function.py in _call_flat(self, args, captured_inputs, cancellation_manager) 1690 # No tape is watching; skip to running the function. 1691 return self._build_call_outputs(self._inference_function.call( -> 1692 ctx, args, cancellation_manager=cancellation_manager)) 1693 forward_backward = self._select_forward_and_backward_functions( 1694 args,

~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\eager\function.py in call(self, ctx, args, cancellation_manager) 543 inputs=args, 544 attrs=("executor_type", executor_type, "config_proto", config), --> 545 ctx=ctx) 546 else: 547 outputs = execute.execute_with_cancellation(

~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name) 65 else: 66 message = e.message ---> 67 six.raise_from(core._status_to_exception(e.code, message), None) 68 except TypeError as e: 69 keras_symbolic_tensors = [

~\anaconda3\envs\ev_2\lib\site-packages\six.py in raise_from(value, from_value)

InternalError: Dst tensor is not initialized. [[{{node IteratorGetNext/_2}}]] [Op:__inference_distributed_function_24557]

Function call stack: distributed_function

Please, how do I resolve this on Windows OS?

Adesoji1 avatar Jan 24 '21 08:01 Adesoji1
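
A minimal sketch (an assumption for the TF 2.x case above, not from this repo) of the usual first thing to try: enabling memory growth so TensorFlow grows its GPU allocation on demand instead of reserving almost all of it at startup. This must run before any GPU op initializes the device:

import tensorflow as tf

for gpu in tf.config.experimental.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)

If the error persists, reducing the batch size (as suggested in the next comment) is the next step.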

It occurs because the GPU memory is full. The best way out is to reduce the batch size.

For example, if batch_size = 32,

make it 16/8/4/2 - anything until the error is resolved.

It works every single time for me.

For me, removing val_split helped as well. :shrug:

MrYakobo avatar Feb 05 '21 08:02 MrYakobo

I had a similar problem when loading a previously trained model from disk (so changing the batch_size wasn't an option). This is what fixed it:

with tf.device('/CPU:0'):
    loaded = tf.saved_model.load(model_path)

bmy-ashampoo avatar Jan 29 '22 12:01 bmy-ashampoo