TensorFlow-Examples
Dst tensor is not initialized
Hi @aymericdamien, I ran the logistic_regression.py script, but I hit the error "Dst tensor is not initialized". The full log is here:
Epoch: 0001 cost= 29.917553501
Epoch: 0002 cost= 21.929896693
Epoch: 0003 cost= 21.063875407
Epoch: 0004 cost= 20.457020144
Epoch: 0005 cost= 20.084428289
Epoch: 0006 cost= 19.814794980
Epoch: 0007 cost= 19.674670629
Epoch: 0008 cost= 19.510438999
Epoch: 0009 cost= 19.309689613
Epoch: 0010 cost= 19.223995275
Epoch: 0011 cost= 19.161345129
Epoch: 0012 cost= 18.985856709
Epoch: 0013 cost= 18.917688493
Epoch: 0014 cost= 18.832972273
Epoch: 0015 cost= 18.742634454
Epoch: 0016 cost= 18.695894625
Epoch: 0017 cost= 18.643278683
Epoch: 0018 cost= 18.609112186
Epoch: 0019 cost= 18.444614899
Epoch: 0020 cost= 18.532375607
Epoch: 0021 cost= 18.437554449
Epoch: 0022 cost= 18.310914770
Epoch: 0023 cost= 18.289282742
Epoch: 0024 cost= 18.214274961
Epoch: 0025 cost= 18.293197173
Optimization Finished!
Accuracy:
---------------------------------------------------------------------------
InternalError Traceback (most recent call last)
<ipython-input-17-f661f1e1e9de> in <module>()
24 # Calculate accuracy
25 accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
---> 26 print "Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels})
/home/burness/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.pyc in eval(self, feed_dict, session)
500
501 """
--> 502 return _eval_using_default_session(self, feed_dict, self.graph, session)
503
504
/home/burness/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.pyc in _eval_using_default_session(tensors, feed_dict, graph, session)
3332 "the tensor's graph is different from the session's "
3333 "graph.")
-> 3334 return session.run(tensors, feed_dict)
3335
3336
/home/burness/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata)
338 try:
339 result = self._run(None, fetches, feed_dict, options_ptr,
--> 340 run_metadata_ptr)
341 if run_metadata:
342 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
/home/burness/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata)
562 try:
563 results = self._do_run(handle, target_list, unique_fetches,
--> 564 feed_dict_string, options, run_metadata)
565 finally:
566 # The movers are no longer used. Delete them.
/home/burness/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
635 if handle is None:
636 return self._do_call(_run_fn, self._session, feed_dict, fetch_list,
--> 637 target_list, options, run_metadata)
638 else:
639 return self._do_call(_prun_fn, self._session, handle, feed_dict,
/home/burness/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _do_call(self, fn, *args)
657 # pylint: disable=protected-access
658 raise errors._make_specific_exception(node_def, op, error_message,
--> 659 e.code)
660 # pylint: enable=protected-access
661
InternalError: Dst tensor is not initialized.
[[Node: _recv_Placeholder_1_0/_27513 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_267__recv_Placeholder_1_0", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
[[Node: Mean_6/_27517 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_277_Mean_6", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Are you using a GPU? Usually, this error is raised when the GPU memory is full.
@aymericdamien Thanks! I found the reason: I run the code in an IPython notebook, but I forgot to close another one, and that script was wasting too much memory.
Yup, a full GPU memory is the reason. IPython kernels stuck as background processes will do that.
Thanks, Subodh thesubodh.com
@burness @subodhp I'm getting the same error ("Ran out of memory")
[MacbookPro 2013 with 16 GB RAM, GPU (2GB RAM), TensorFlow 0.11, CUDA 8.0, CUDNN 5.x]
I tried shutting down the Jupyter Notebook and restarting it... but it crashed with the same error. Is this solved? How does one resolve GPU memory full errors?
Thanks!
I tensorflow/core/common_runtime/bfc_allocator.cc:689] Summary of in-use Chunks by size:
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 5 Chunks of size 256 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 1280 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 31488 totalling 30.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 46609152 totalling 44.45MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] Sum Total of in-use chunks: 44.48MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:698] Stats:
Limit: 57622528
InUse: 46643200
MaxInUse: 46643200
NumAllocs: 11
MaxAllocSize: 46609152
W tensorflow/core/common_runtime/bfc_allocator.cc:270] ********************************************************************xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 390.6KiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:958] Internal: Dst tensor is not initialized.
E tensorflow/core/common_runtime/executor.cc:334] Executor failed to create kernel. Internal: Dst tensor is not initialized.
[[Node: Reshape_1/_2__cf__2 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [10000,10] values: 0 0 0...>, _device="/job:localhost/replica:0/task:0/gpu:0"]]
I rebooted my MacBook and started afresh. System: [MacbookPro 2013, with 16 GB RAM, GPU with 2GB RAM; TensorFlow 0.11, CUDA 8.0, CUDNN 5.x]. Here's the error I get (see attached error-tf.txt at the bottom for all details).
- How is the free memory only 20.49 MiB (on a recently rebooted system) if there's 2.0 GiB available to the GPU?
- Is there a way to track GPU memory usage?
- Is there a way to disable GPU usage for an IPython notebook?
Thanks!
Some relevant parts I see:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
name: GeForce GT 750M
major: 3 minor: 0 memoryClockRate (GHz) 0.9255
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 20.49MiB
...
I tensorflow/core/common_runtime/bfc_allocator.cc:698] Stats:
Limit: 21487616
InUse: 33792
MaxInUse: 65280
NumAllocs: 9
MaxAllocSize: 31488
W tensorflow/core/common_runtime/bfc_allocator.cc:270] *___________________________________________________________________________________________________
W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 29.91MiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:958] Internal: Dst tensor is not initialized.
E tensorflow/core/common_runtime/executor.cc:334] Executor failed to create kernel. Internal: Dst tensor is not initialized.
[[Node: Const = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [10000,784] values: -0.5 -0.49607843 -0.5...>, _device="/job:localhost/replica:0/task:0/gpu:0"]]
laventura, did you ever find a solution to the GPU out-of-memory error? I have the same problem with the same setup, though I got an error trying to allocate 10.8MiB.
@pumplerod - I found a solution / kludge that somehow seems to work, although I can't explain why / how.
Before starting your Jupyter notebook / tensorflow program, set this:
export CUDA_VISIBLE_DEVICES=1
This seems to work in that the scripts work OK. Not sure if this is a requirement. Give it a try and see.
Wow. Thanks. That seems to have worked. Not sure how it's related, but before trying your solution I got rid of the error by specifying
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True, gpu_options=gpu_options))
However, when I tried to run training, it crashed the Jupyter notebook.
@pumplerod
Oh, yours is very helpful for me. I got an error message about running out of memory with only 29MiB. I added your code with a fraction of 0.8, since 80% of the memory was free (1.6GiB of 2GiB). My code started working. After that, I deleted ALL GPU options and it still works. Very curious...
Update on this:
Earlier, the GPU was being recognized by an older TensorFlow. Since upgrading TF to 0.11rc2, and later to 0.12, my TF does not recognize any GPU at all.
Also, deviceQuery does not report any GPU either. I'm going totally bonkers in this CUDA hell.
See details here: https://github.com/tensorflow/tensorflow/issues/2882
Also posted on NVIDIA DevTalk; if anyone has any bright insights, that would be very helpful to me! https://devtalk.nvidia.com/default/topic/990015/cuda-setup-and-installation/help-cuda-7-5-or-8-devicequery-failing-not-working-on-macbookpro-2013-os-x-10-11-gt750m/
Just stumbled upon this thread. I think you have hidden your GPU from the CUDA drivers with this line:
export CUDA_VISIBLE_DEVICES=1
What this is telling CUDA is that it should only use "Device 1" in your system. So, unless you have 2 GPU devices, you have hidden the primary "Device 0". I am sure if you set this as follows TF will see your GPU again, but your other problems may return:
export CUDA_VISIBLE_DEVICES=0
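For completeness, here is a minimal sketch of doing the same from inside a Python script instead of the shell (assuming a TF 1.x-style session; the environment variable must be set before TensorFlow is imported):
import os

# Must be set before TensorFlow is imported. "0" selects the primary GPU;
# an empty string ("") would hide all GPUs and force a CPU-only run.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import tensorflow as tf

# log_device_placement prints where each op lands, which helps confirm
# whether the GPU is actually visible to TensorFlow.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))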
@Mazecreator & Others,
Indeed; when I set CUDA_VISIBLE_DEVICES=0, deviceQuery returns successfully. However, now TensorFlow complains again with "Dst Tensor Not initialized"!!
This is so frustrating!!
It appears that CUDA is leaking memory... I see that the free memory listed (when a Python script starts) keeps getting less and less, though I don't know for sure if that's the problem. The suggestions above (setting TF's GPUOptions) are all workarounds: they require manual code changes / intervention in existing scripts that were supposed to work OK.
See here: deviceQuery
py35 ▶ ~ ▶ Developer ❯ … ❯ x86_64 ❯ darwin ❯ release ▶ $ ▶ echo $CUDA_HOME
/usr/local/cuda
py35 ▶ ~ ▶ Developer ❯ … ❯ x86_64 ❯ darwin ❯ release ▶ $ ▶ echo $CUDA_VISIBLE_DEVICES
py35 ▶ ~ ▶ Developer ❯ … ❯ x86_64 ❯ darwin ❯ release ▶ $ ▶ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GT 750M"
CUDA Driver Version / Runtime Version 8.0 / 8.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2048 MBytes (2147024896 bytes)
( 2) Multiprocessors, (192) CUDA Cores/MP: 384 CUDA Cores
GPU Max Clock rate: 926 MHz (0.93 GHz)
Memory Clock rate: 2508 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 262144 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GT 750M
Result = PASS
py35 ▶ ~ ▶ Developer ❯ … ❯ x86_64 ❯ darwin ❯ release ▶ $ ▶
py35 ▶ ~ ▶ Developer ❯ CUDA ❯ cuda-smi ▶ master ▶ ❓ ▶ $ ▶ ./cuda-smi
Device 0 [PCIe 0:1:0.0]: GeForce GT 750M (CC 3.0): 369.92 of 2047.6 MB (i.e. 18.1%) Free
Running a Python script with TensorFlow:
py35 ▶ ~ ▶ Developer ❯ … ❯ self_driving_car ❯ traffic-signs ❯ CarND-Alexnet-Fe ▶ master ▶ 4✎ ▶ $ ▶ python imagenet_inference.py
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.1.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.dylib locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GT 750M
major: 3 minor: 0 memoryClockRate (GHz) 0.9255
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 305.92MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (256): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (512): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1024): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2048): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4096): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8192): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16384): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (32768): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (65536): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (131072): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (262144): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (524288): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1048576): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2097152): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4194304): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8388608): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16777216): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (33554432): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (67108864): Total Chunks: 1, Chunks in use: 0 97.01MiB allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (134217728): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (268435456): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 144.00MiB was 128.00MiB, Chunk State:
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700a60000 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700a60500 of size 139520
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700a82600 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700a82800 of size 1228800
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700bae800 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700baec00 of size 3538944
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700f0ec00 of size 1536
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700f0f200 of size 2654208
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x701197200 of size 1536
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x701197800 of size 1769472
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x701347800 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x701347c00 of size 101725184
I tensorflow/core/common_runtime/bfc_allocator.cc:693] Summary of in-use Chunks by size:
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 512 totalling 512B
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 1024 totalling 2.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1280 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 1536 totalling 3.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 139520 totalling 136.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1228800 totalling 1.17MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1769472 totalling 1.69MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 2654208 totalling 2.53MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 3538944 totalling 3.38MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 8.91MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 111063040
InUse: 9337856
MaxInUse: 9337856
NumAllocs: 11
MaxAllocSize: 3538944
W tensorflow/core/common_runtime/bfc_allocator.cc:274] *********___________________________________________________________________________________________
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 144.00MiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:965] Internal: Dst tensor is not initialized.
E tensorflow/core/common_runtime/executor.cc:390] Executor failed to create kernel. Internal: Dst tensor is not initialized.
[[Node: Variable_10/initial_value = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [9216,4096] values: [-0.0043384791 -0.0071635786 -0.0067223078]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Traceback (most recent call last):
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1021, in _do_call
return fn(*args)
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1003, in _run_fn
status, run_metadata)
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/contextlib.py", line 66, in __exit__
next(self.gen)
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
[[Node: Variable_10/initial_value = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [9216,4096] values: [-0.0043384791 -0.0071635786 -0.0067223078]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "imagenet_inference.py", line 19, in <module>
sess.run(init)
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 766, in run
run_metadata_ptr)
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 964, in _run
feed_dict_string, options, run_metadata)
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1014, in _do_run
target_list, options, run_metadata)
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1034, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
[[Node: Variable_10/initial_value = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [9216,4096] values: [-0.0043384791 -0.0071635786 -0.0067223078]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Caused by op 'Variable_10/initial_value', defined at:
File "imagenet_inference.py", line 16, in <module>
probs = AlexNet(x, feature_extract=False)
File "/Users/aa/Developer/courses/self_driving_carnd/traffic-signs/CarND-Alexnet-Feature-Extraction/alexnet.py", line 139, in AlexNet
fc6W = tf.Variable(net_data["fc6"][0])
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 224, in __init__
expected_shape=expected_shape)
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 333, in _init_from_args
initial_value, name="initial_value", dtype=dtype)
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 669, in convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 176, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 169, in constant
attrs={"value": tensor_value, "dtype": dtype_value}, name=name).outputs[0]
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
self._traceback = _extract_stack()
InternalError (see above for traceback): Dst tensor is not initialized.
[[Node: Variable_10/initial_value = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [9216,4096] values: [-0.0043384791 -0.0071635786 -0.0067223078]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
py35 ▶ ~ ▶ Developer ❯ … ❯ self_driving_car ❯ traffic-signs ❯ CarND-Alexnet-Fe ▶ master ▶ 4✎ ▶ $ ▶
I get the same error and I have 12GB of GPU memory:
mona@pascal:~/computer_vision/VPilot$ python train.py
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so.5.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so.8.0 locally
/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py:1938: UserWarning: Expected no kwargs, you passed 1
kwargs passed to function are ignored with Tensorflow backend
warnings.warn('\n'.join(msg))
Epoch 1/1000
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.8755
pciBusID 0000:03:00.0
Total memory: 11.92GiB
Free memory: 412.50MiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x4547d60
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 1 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.8755
pciBusID 0000:83:00.0
Total memory: 11.92GiB
Free memory: 534.50MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 0 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 1 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0: Y N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 1: N Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K40c, pci bus id: 0000:83:00.0)
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (256): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (512): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1024): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2048): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4096): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8192): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16384): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (32768): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (65536): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (131072): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (262144): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (524288): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1048576): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2097152): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4194304): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8388608): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16777216): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (33554432): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (67108864): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (134217728): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (268435456): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 512.0KiB was 512.0KiB, Chunk State:
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740000 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740600 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740700 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740800 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740900 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740a00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740b00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740c00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740d00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740e00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740f00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b741000 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b741100 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b741200 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b741300 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b741400 of size 4096
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b742400 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b742500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b742600 of size 2048
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b742e00 of size 2048
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b743600 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b743a00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b743b00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b743c00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b743d00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b743e00 of size 222806528
I tensorflow/core/common_runtime/bfc_allocator.cc:693] Summary of in-use Chunks by size:
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 21 Chunks of size 256 totalling 5.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1024 totalling 1.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1280 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 2048 totalling 4.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 4096 totalling 4.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 222806528 totalling 212.48MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 212.50MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 222822400
InUse: 222822400
MaxInUse: 222822400
NumAllocs: 27
MaxAllocSize: 222806528
W tensorflow/core/common_runtime/bfc_allocator.cc:274] ***********************************************************************************************xxxxx
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 512.0KiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:958] Internal: Dst tensor is not initialized.
E tensorflow/core/common_runtime/executor.cc:334] Executor failed to create kernel. Internal: Dst tensor is not initialized.
[[Node: Const_37 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [512,256] values: 0 0 0...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Traceback (most recent call last):
File "train.py", line 55, in <module>
callbacks=[ckp_callback]
File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 935, in fit_generator
initial_epoch=initial_epoch)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1553, in fit_generator
class_weight=class_weight)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1316, in train_on_batch
outputs = self.train_function(ins)
File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 1919, in __call__
session = get_session()
File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 121, in get_session
_initialize_variables()
File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 275, in _initialize_variables
sess.run(tf.initialize_variables(uninitialized_variables))
File "/home/mona/tensorflow/_python_build/tensorflow/python/client/session.py", line 717, in run
run_metadata_ptr)
File "/home/mona/tensorflow/_python_build/tensorflow/python/client/session.py", line 915, in _run
feed_dict_string, options, run_metadata)
File "/home/mona/tensorflow/_python_build/tensorflow/python/client/session.py", line 965, in _do_run
target_list, options, run_metadata)
File "/home/mona/tensorflow/_python_build/tensorflow/python/client/session.py", line 985, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InternalError: Dst tensor is not initialized.
[[Node: Const_37 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [512,256] values: 0 0 0...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Caused by op u'Const_37', defined at:
File "train.py", line 55, in <module>
callbacks=[ckp_callback]
File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 935, in fit_generator
initial_epoch=initial_epoch)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1450, in fit_generator
self._make_train_function()
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 761, in _make_train_function
self.total_loss)
File "/usr/local/lib/python2.7/dist-packages/keras/optimizers.py", line 234, in get_updates
accumulators = [K.zeros(shape) for shape in shapes]
File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 482, in zeros
return variable(tf.constant_initializer(0., dtype=tf_dtype)(shape),
File "/home/mona/tensorflow/_python_build/tensorflow/python/ops/init_ops.py", line 145, in _initializer
return constant_op.constant(value, dtype=dtype, shape=shape)
File "/home/mona/tensorflow/_python_build/tensorflow/python/framework/constant_op.py", line 167, in constant
attrs={"value": tensor_value, "dtype": dtype_value}, name=name).outputs[0]
File "/home/mona/tensorflow/_python_build/tensorflow/python/framework/ops.py", line 2388, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/mona/tensorflow/_python_build/tensorflow/python/framework/ops.py", line 1300, in __init__
self._traceback = _extract_stack()
InternalError (see above for traceback): Dst tensor is not initialized.
[[Node: Const_37 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [512,256] values: 0 0 0...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
@monajalal --
It appears that the GPU is running out of memory for some reason. WHY that is happening, I can't say; it is the most confounding thing, since the programs that were executed have already ended.
Probably a memory leak?? If so, it could be at the GPU driver level??
See here too: https://github.com/tensorflow/tensorflow/issues/7025#issuecomment-275475281
I've tried searching for how to release/clear GPU memory, but haven't found anything good / credible / useful.
Do let me know if you or anyone comes across a solution.
Until then, this TensorFlow + GPU combo is a total fail for me (on my Macbook). 😡
The MacBook's Nvidia GPU isn't dedicated to compute; it is shared between TensorFlow and the screen.
I regularly have out-of-memory issues using a mid-2012 rMBP with a GeForce 650.
Before running TensorFlow, I close all processes using the GPU (look at the video card column in the resource monitor) to force OS X to use the integrated video card. Doing this releases some memory, and I can then execute TensorFlow scripts. Not all memory is cleared when I check it with cuda-smi. You can quickly see which graphics card is being used with the gfx.io app. I found it good to disable WebGL in Safari (although it's needed for TensorBoard). Restarting Safari and PyCharm before running TensorFlow scripts is helpful to clear GPU memory. Stopping non-essential apps in the background also helps.
https://github.com/phvu/cuda-smi
https://gfx.io
Is an OS X issue a possibility?
A MacBook isn't the best all-in-one dev platform for TensorFlow; it can be made to work, albeit frustratingly.
It would be good to force OS X to use the integrated video chip for the screen and the Nvidia GPU exclusively for TensorFlow. I'm totally unsure, however; some early discussions about the hardware indicated that Apple has locked down certain parts of GPU access, so if it can't be used exclusively now, it's likely to be difficult/impossible to do in the future.
@normanheckscher - Thanks for the tips. Good to know about the Macbook GPU.
I downloaded gfx.io - it's helpful in understanding when the GPU is being used.
I've used cuda-smi; it's useful for showing the free GPU memory, but it doesn't really show the processes using it. I was hoping something like nvidia-smi would exist for Macs.
When you said "I close all processes using the GPU (look at resource monitor video card column) to force OSX to use the integrated video card", which 'resource monitor video card' column do you refer to? In Activity Monitor? If so, I didn't find it. :-(
Yeah, I try closing most of the programs that use the GPU (mostly Chrome, etc.) before running TF scripts. Sometimes the TF scripts run out of memory almost immediately after a fresh reboot, which is kind of confounding.
I'm slowly coming to the realization that the TensorFlow + GPU combo isn't very effective/efficient on MacBooks. 😕
I'm rather sadly investigating a Theano combo (instead of TF) on top of Keras, which is my main high-level framework of choice. Sadly, because I don't know enough Theano and don't have enough bandwidth to learn it effectively. :-/
Sorry @laventura, I meant the Activity Monitor for OS X. If you go to the CPU or Memory tab, where you can look at the running processes, you can select "View > Columns > Graphics Card" and a new column, "Requires High Perf GPU", will appear. Sort by this column and you can see which processes are using the Nvidia card.
A MacBook Pro can be used for learning and development. I want to use TensorFlow, and I find OS X is a very good environment to work in, so I deal with these little irritations while I get myself up to speed with TensorFlow. When my models need more memory, I'll make the call between building a headless Linux box or going with a service such as AWS. If I were starting from scratch, I'd consider a dedicated GPU notebook that could run Linux; however, I'm not flush with cash and I don't see the need to purchase a new hardware environment when the one I have works.
Best of luck to you.
This is not just a MacBook issue. I am seeing this on my laptop with a GTX 1060 (6GB) running Ubuntu.
This seems to help: https://github.com/fchollet/keras/issues/3675
Use:
max_q_size=1,
pickle_safe=False
in fit_generator()
After adding these two options, I am up and running again.
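For reference, a minimal, self-contained sketch of what that call looks like, assuming the older Keras 1.x fit_generator signature used in this thread (max_q_size / pickle_safe were later renamed max_queue_size / use_multiprocessing); the tiny model and generator here are hypothetical stand-ins:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Hypothetical stand-in model and data generator, just to make the call runnable.
model = Sequential([Dense(1, input_dim=8)])
model.compile(optimizer='adam', loss='mse')

def batch_generator(batch_size=4):
    while True:
        yield np.random.rand(batch_size, 8), np.random.rand(batch_size, 1)

# max_q_size=1 keeps only one pre-fetched batch queued;
# pickle_safe=False disables process-based pre-fetching.
model.fit_generator(batch_generator(),
                    samples_per_epoch=32,
                    nb_epoch=1,
                    max_q_size=1,
                    pickle_safe=False)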
I can run on the MacBook Pro NVIDIA GPU, but only for minimal applications:
import tensorflow as tf
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.8) #0.333
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True, gpu_options=gpu_options))
When I increase the number of Conv2D filters, e.g. from 32 to 64, I start to get a DEAD KERNEL, so I lower the number of images I process per batch, e.g. from 256 to 24.
You have to keep trying until you get the right balance between the depth of your neural network, the batch size, and the amount of GPU memory.
In the end, it is much faster than the CPU, but too fragile; after much frustration, I am going back to the CPU and a more powerful Linux GPU instance.
Stumbled onto this thread; perhaps my two cents can help. Launching Python with a preceding flag of THEANO_FLAGS='device=gpu0', THEANO_FLAGS='device=gpu1', etc. (the latter if you have more than one GPU) helps. For example, this terminal command will run the Python code on a specific GPU (you can use gpustat to track usage of the different GPUs on your machine in real time):
THEANO_FLAGS='device=gpu4' python /run/this/script.py
If your convolutional filters are large, having smaller training batches can be one way to overcome the memory issue. That is, if the network initialization fits in memory first.
You don't have ENOUGH GPU MEMORY.
Reduce the size of the batches sent to run or eval; that should do the trick.
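As a concrete sketch for the logistic regression example that started this thread (assuming its accuracy, x, y, and mnist objects are already defined), evaluating the test set in chunks instead of feeding all 10,000 images at once:
batch_size = 256
n_test = mnist.test.num_examples
weighted_acc = 0.0
for start in range(0, n_test, batch_size):
    end = min(start + batch_size, n_test)
    # Evaluate accuracy on a small slice of the test set at a time, so the
    # whole test set never has to sit in GPU memory at once.
    acc = accuracy.eval({x: mnist.test.images[start:end],
                         y: mnist.test.labels[start:end]})
    weighted_acc += acc * (end - start)
print("Accuracy: %f" % (weighted_acc / n_test))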
Running into the same issue with the smallest possible model, Cart-Pole, on a GTX 1080 (8GB). Is it a TensorFlow bug that can be fixed somehow, or are we simply trying to fit models that are too big (an overenthusiastic batch size probably being the main reason for that)?
sebtac
When the system is idle and not processing, shouldn't Python somehow not use the whole GPU memory? That would be a useful feature.
In my case I have a laptop, and the command export CUDA_VISIBLE_DEVICES=1 made training really slow, so I assume it used the integrated graphics card. I had to use the value 0 instead.
I'm having the same issue. It's a Windows machine. Now I've reduced my RNN size and embedding size... let's see.
Not working.
In my case, it was an issue with the dataset. Removing the problematic images (large images from Google or odd images that you get from web scraping) solved my problem.
Wow. Thanks. That seems to have worked. Not sure how it's related, but before trying your solution I got rid of the error by specifying
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True, gpu_options=gpu_options))
However, when I tried to run training, it crashed the Jupyter notebook.
There it is! Thank you for the answer. It worked in my case! There is also another similar solution:
config = tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))
# allow_growth=True is the important part here
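Put together, a TF 1.x sketch of using it; allow_growth makes TensorFlow claim GPU memory on demand instead of reserving nearly all of it up front:
import tensorflow as tf

# Grow the GPU memory pool as needed rather than grabbing it all at start-up.
config = tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))
sess = tf.Session(config=config)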
It occurs when the GPU memory is full. The best fix is to reduce the batch size.
For example, if batch_size = 32, make it 16/8/4/2, whatever it takes until the error is resolved.
It works every single time for me.
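A rough sketch of that trial-and-error, assuming a compiled Keras model and x_train / y_train arrays already exist (in practice you may also need to restart the kernel between attempts, since a failed allocation can leave the GPU in a bad state):
import tensorflow as tf

for batch_size in (32, 16, 8, 4, 2):
    try:
        model.fit(x_train, y_train, batch_size=batch_size, epochs=1)
        print("batch_size=%d fits in GPU memory" % batch_size)
        break
    except tf.errors.InternalError:
        # "Dst tensor is not initialized" surfaces as an InternalError;
        # halve the batch size and try again.
        print("batch_size=%d ran out of GPU memory" % batch_size)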
WARNING:tensorflow:sample_weight modes were coerced from
...
to
['...']
WARNING:tensorflow:sample_weight modes were coerced from
...
to
['...']
WARNING:tensorflow:sample_weight modes were coerced from
...
to
['...']
WARNING:tensorflow:sample_weight modes were coerced from
...
to
['...']
Train for 11523 steps, validate for 4153 steps
Epoch 1/5
1/11523 [..............................] - ETA: 33:47:16
InternalError Traceback (most recent call last)
~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\util\deprecation.py in new_func(*args, **kwargs) 322 'in a future version' if date is None else ('after %s' % date), 323 instructions) --> 324 return func(*args, **kwargs) 325 return tf_decorator.make_decorator( 326 func, new_func, 'deprecated',
~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\keras\engine\training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch) 1304 use_multiprocessing=use_multiprocessing, 1305 shuffle=shuffle, -> 1306 initial_epoch=initial_epoch) 1307 1308 @deprecation.deprecated(
~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs) 817 max_queue_size=max_queue_size, 818 workers=workers, --> 819 use_multiprocessing=use_multiprocessing) 820 821 def evaluate(self,
~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in fit(self, model, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs) 340 mode=ModeKeys.TRAIN, 341 training_context=training_context, --> 342 total_epochs=epochs) 343 cbks.make_logs(model, epoch_logs, training_result, ModeKeys.TRAIN) 344
~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in run_one_epoch(model, iterator, execution_function, dataset_size, batch_size, strategy, steps_per_epoch, num_samples, mode, training_context, total_epochs) 126 step=step, mode=mode, size=current_batch_size) as batch_logs: 127 try: --> 128 batch_outs = execution_function(iterator) 129 except (StopIteration, errors.OutOfRangeError): 130 # TODO(kaftan): File bug about tf function and errors.OutOfRangeError?
~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py in execution_function(input_fn)
96 # numpy
translates Tensors to values in Eager mode.
97 return nest.map_structure(_non_none_constant_value,
---> 98 distributed_function(input_fn))
99
100 return execution_function
~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\eager\def_function.py in call(self, *args, **kwds) 566 xla_context.Exit() 567 else: --> 568 result = self._call(*args, **kwds) 569 570 if tracing_count == self._get_tracing_count():
~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\eager\def_function.py in _call(self, *args, **kwds) 597 # In this case we have created variables on the first call, so we run the 598 # defunned version which is guaranteed to never create variables. --> 599 return self._stateless_fn(*args, **kwds) # pylint: disable=not-callable 600 elif self._stateful_fn is not None: 601 # Release the lock early so that multiple threads can perform the call
~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\eager\function.py in call(self, *args, **kwargs) 2361 with self._lock: 2362 graph_function, args, kwargs = self._maybe_define_function(args, kwargs) -> 2363 return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access 2364 2365 @property
~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\eager\function.py in _filtered_call(self, args, kwargs) 1609 if isinstance(t, (ops.Tensor, 1610 resource_variable_ops.BaseResourceVariable))), -> 1611 self.captured_inputs) 1612 1613 def _call_flat(self, args, captured_inputs, cancellation_manager=None):
~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\eager\function.py in _call_flat(self, args, captured_inputs, cancellation_manager) 1690 # No tape is watching; skip to running the function. 1691 return self._build_call_outputs(self._inference_function.call( -> 1692 ctx, args, cancellation_manager=cancellation_manager)) 1693 forward_backward = self._select_forward_and_backward_functions( 1694 args,
~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\eager\function.py in call(self, ctx, args, cancellation_manager) 543 inputs=args, 544 attrs=("executor_type", executor_type, "config_proto", config), --> 545 ctx=ctx) 546 else: 547 outputs = execute.execute_with_cancellation(
~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name) 65 else: 66 message = e.message ---> 67 six.raise_from(core._status_to_exception(e.code, message), None) 68 except TypeError as e: 69 keras_symbolic_tensors = [
~\anaconda3\envs\ev_2\lib\site-packages\six.py in raise_from(value, from_value)
InternalError: Dst tensor is not initialized. [[{{node IteratorGetNext/_2}}]] [Op:__inference_distributed_function_24557]
Function call stack: distributed_function
Please, how do I resolve this on Windows OS?
It occurs when the GPU memory is full. The best fix is to reduce the batch size.
For example, if batch_size = 32, make it 16/8/4/2, whatever it takes until the error is resolved.
It works every single time for me.
For me, removing val_split helped as well. :shrug:
I had a similar problem when loading a previously trained model from disk (so changing the batch_size wasn't an option). This is what fixed it:
with tf.device('/CPU:0'):
loaded = tf.saved_model.load(model_path)
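For those hitting this on TF 2.x (as in the traceback above), a hedged sketch of the equivalent of the earlier allow_growth workaround, run before any model is built or loaded:
import tensorflow as tf

# Ask TensorFlow to allocate GPU memory incrementally instead of reserving
# (nearly) all of it at start-up.
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)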