mtcnn
Cudnn PoolForward launch failed exception
Hello, thanks for your work on this package. We periodically get exceptions like the one below from cuDNN. Any hints as to what could cause this? Env: tensorflow-gpu==1.14, CUDA 10.1, cuDNN 7.6.5.32, mtcnn==0.0.9
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: cudnn PoolForward launch failed
[[node rnet/pool1 (defined at usr/local/lib/python3.6/dist-packages/mtcnn/layer_factory.py:175) ]]
(1) Internal: cudnn PoolForward launch failed
[[node rnet/pool1 (defined at usr/local/lib/python3.6/dist-packages/mtcnn/layer_factory.py:175) ]]
[[rnet/prob1/_111]]
0 successful operations.
0 derived errors ignored.
"caught error while running engine ops
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: cudnn PoolForward launch failed
[[{{node rnet/pool1}}]]
(1) Internal: cudnn PoolForward launch failed
[[{{node rnet/pool1}}]]
[[rnet/prob1/_111]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/leapi/worker/pipeline_item_celery.py", line 118, in run_engine_proc
out_payload = engine.run_ops(task.pipeline.operations, payload)
File "/app/leapi/pipeline/engine.py", line 80, in run_ops
payload = self.run_op(op, payload, warmup)
File "/app/leapi/pipeline/engine.py", line 87, in run_op
payload_out = f(payload, **op._kwargs)
File "/app/leapi/util/timeit.py", line 10, in timed
result = f(*args, **kwargs)
File "/app/leapi/pipeline/neural.py", line 949, in resize_upscale_with_faces
p = self.detect_and_extract_faces(p, face_method=face_method)
File "/app/leapi/util/timeit.py", line 10, in timed
result = f(*args, **kwargs)
File "/app/leapi/pipeline/neural.py", line 559, in detect_and_extract_faces
faces_json = self.mtcnn_detector.detect_faces(win)
File "/usr/local/lib/python3.6/dist-packages/mtcnn/mtcnn.py", line 418, in detect_faces
result = stage(img, result[0], result[1])
File "/usr/local/lib/python3.6/dist-packages/mtcnn/mtcnn.py", line 528, in __stage2
out = self.__rnet.feed(tempimg1)
File "/usr/local/lib/python3.6/dist-packages/mtcnn/network.py", line 108, in feed
return self._feed(image)
File "/usr/local/lib/python3.6/dist-packages/mtcnn/mtcnn.py", line 103, in _feed
return self._session.run(['rnet/fc2-2/fc2-2:0', 'rnet/prob1:0'], feed_dict={'rnet/input:0': image})
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: cudnn PoolForward launch failed
[[node rnet/pool1 (defined at usr/local/lib/python3.6/dist-packages/mtcnn/layer_factory.py:175) ]]
(1) Internal: cudnn PoolForward launch failed
[[node rnet/pool1 (defined at usr/local/lib/python3.6/dist-packages/mtcnn/layer_factory.py:175) ]]
[[rnet/prob1/_111]]
0 successful operations.
0 derived errors ignored.
Errors may have originated from an input operation.
I also got "tensorflow.python.framework.errors_impl.InternalError: cudnn PoolForward launch failed". Did you end up fixing this issue? I am using tensorflow-gpu 1.12, CUDA 9.0, cuDNN 7.6.5.
To be honest, I don't remember :) I think I ended up experimenting with package versions and environment config.
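For anyone else trying the "play with versions" route, here is a minimal sketch of a sanity check, not a confirmed fix. It compares a tensorflow-gpu release against the CUDA version listed for it in TensorFlow's tested build configurations table (Linux GPU builds). The names `TESTED_CUDA`, `major_minor`, and `cuda_matches_tested_build` are hypothetical helpers made up for illustration. A mismatch is only one possible cause of a cuDNN launch failure, so treat the result as a hint.

```python
# CUDA versions that prebuilt tensorflow-gpu 1.x wheels were tested against,
# per TensorFlow's "tested build configurations" table (Linux GPU builds).
# Hypothetical helper table for illustration; extend it as needed.
TESTED_CUDA = {
    "1.12": "9.0",
    "1.13": "10.0",
    "1.14": "10.0",
    "1.15": "10.0",
}


def major_minor(version):
    """Reduce a version string such as '1.14.0' to 'major.minor' ('1.14')."""
    return ".".join(version.split(".")[:2])


def cuda_matches_tested_build(tf_version, cuda_version):
    """Return True/False if the pairing matches the table, None if unknown."""
    expected = TESTED_CUDA.get(major_minor(tf_version))
    if expected is None:
        return None  # release not in the table, so nothing can be concluded
    return major_minor(cuda_version) == expected


if __name__ == "__main__":
    # Environment from the original report: tensorflow-gpu 1.14 with CUDA 10.1
    print(cuda_matches_tested_build("1.14", "10.1"))
    # Environment from the follow-up comment: tensorflow-gpu 1.12 with CUDA 9.0
    print(cuda_matches_tested_build("1.12", "9.0"))
```

Note that the second environment in this thread matches the table and still hit the error, so a version mismatch cannot be the whole story. Other parts of the environment config are worth checking too.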