
Cudnn PoolForward launch failed exception

Open • hzlmn opened this issue 4 years ago • 2 comments

Hello, thanks for your work on the package. We periodically get the following exception from cuDNN. Any hints as to what could cause this problem? Env: tensorflow-gpu==1.14, CUDA 10.1, cuDNN 7.6.5.32, mtcnn==0.0.9.
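A reasonable first check with errors like this is whether the installed CUDA/cuDNN match what the tensorflow-gpu wheel was built against. The sketch below is a hypothetical helper: `check_env` and the `TESTED_BUILDS` table are not part of mtcnn or TensorFlow. The two table rows are copied from TensorFlow's published tested build configurations. A reported difference is a lead to investigate, not proof of the cause. Newer cuDNN minor releases within the same major version are often usable, so treat the cuDNN line as informational.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup table, hand-copied from TensorFlow's "tested build
# configurations" list (only the two versions mentioned in this thread).
# Maps tensorflow-gpu major.minor -> (CUDA version, cuDNN version).
TESTED_BUILDS: Dict[str, Tuple[str, str]] = {
    "1.12": ("9", "7"),
    "1.14": ("10.0", "7.4"),
}


def _matches(installed: str, expected: str) -> bool:
    """True if `installed` agrees with every dotted component of `expected`."""
    exp_parts = expected.split(".")
    return installed.split(".")[: len(exp_parts)] == exp_parts


def check_env(tf_version: str, cuda: str, cudnn: str) -> List[str]:
    """Return human-readable differences from the tested build (empty if none)."""
    key = ".".join(tf_version.split(".")[:2])  # "1.14.0" -> "1.14"
    if key not in TESTED_BUILDS:
        return [f"no entry for tensorflow-gpu {tf_version} in TESTED_BUILDS"]
    tested_cuda, tested_cudnn = TESTED_BUILDS[key]
    diffs = []
    if not _matches(cuda, tested_cuda):
        diffs.append(f"CUDA {cuda} differs from tested CUDA {tested_cuda}")
    if not _matches(cudnn, tested_cudnn):
        diffs.append(f"cuDNN {cudnn} differs from tested cuDNN {tested_cudnn}")
    return diffs


if __name__ == "__main__":
    # Environment reported in the issue above.
    for line in check_env("1.14", "10.1", "7.6.5.32"):
        print(line)
```

For the environment reported above, this sketch reports two differences: CUDA 10.1 against the tested 10.0, and cuDNN 7.6.5.32 against the tested 7.4.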

caught error while running engine ops
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: cudnn PoolForward launch failed
	 [[{{node rnet/pool1}}]]
  (1) Internal: cudnn PoolForward launch failed
	 [[{{node rnet/pool1}}]]
	 [[rnet/prob1/_111]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/app/leapi/worker/pipeline_item_celery.py", line 118, in run_engine_proc
    out_payload = engine.run_ops(task.pipeline.operations, payload)
  File "/app/leapi/pipeline/engine.py", line 80, in run_ops
    payload = self.run_op(op, payload, warmup)
  File "/app/leapi/pipeline/engine.py", line 87, in run_op
    payload_out = f(payload, **op._kwargs)
  File "/app/leapi/util/timeit.py", line 10, in timed
    result = f(*args, **kwargs)
  File "/app/leapi/pipeline/neural.py", line 949, in resize_upscale_with_faces
    p = self.detect_and_extract_faces(p, face_method=face_method)
  File "/app/leapi/util/timeit.py", line 10, in timed
    result = f(*args, **kwargs)
  File "/app/leapi/pipeline/neural.py", line 559, in detect_and_extract_faces
    faces_json = self.mtcnn_detector.detect_faces(win)
  File "/usr/local/lib/python3.6/dist-packages/mtcnn/mtcnn.py", line 418, in detect_faces
    result = stage(img, result[0], result[1])
  File "/usr/local/lib/python3.6/dist-packages/mtcnn/mtcnn.py", line 528, in __stage2
    out = self.__rnet.feed(tempimg1)
  File "/usr/local/lib/python3.6/dist-packages/mtcnn/network.py", line 108, in feed
    return self._feed(image)
  File "/usr/local/lib/python3.6/dist-packages/mtcnn/mtcnn.py", line 103, in _feed
    return self._session.run(['rnet/fc2-2/fc2-2:0', 'rnet/prob1:0'], feed_dict={'rnet/input:0': image})
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: cudnn PoolForward launch failed
	 [[node rnet/pool1 (defined at usr/local/lib/python3.6/dist-packages/mtcnn/layer_factory.py:175) ]]
  (1) Internal: cudnn PoolForward launch failed
	 [[node rnet/pool1 (defined at usr/local/lib/python3.6/dist-packages/mtcnn/layer_factory.py:175) ]]
	 [[rnet/prob1/_111]]
0 successful operations.
0 derived errors ignored.
Errors may have originated from an input operation.
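Since the failure is described as periodic, one stopgap (not a fix) is to retry the failing call a few times before giving up. This sketch assumes the error is transient. `call_with_retry` is a hypothetical helper, and `tf.errors.InternalError` is the TensorFlow 1.x alias for the `tensorflow.python.framework.errors_impl.InternalError` seen in the traceback. If the underlying cause is persistent, for example a CUDA/cuDNN mismatch, every retry will fail too and the original exception is re-raised.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay_s: float = 0.5,
) -> T:
    """Call fn(); if it raises one of `retry_on`, sleep and try again.

    After `attempts` failed calls the last exception is re-raised unchanged,
    so callers still see the original error (e.g. the cuDNN InternalError).
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: propagate the original exception
            time.sleep(delay_s * attempt)  # linear backoff before next try
    raise AssertionError("unreachable")


# Hypothetical usage inside the pipeline shown in the traceback (not run here):
#   import tensorflow as tf
#   faces_json = call_with_retry(
#       lambda: self.mtcnn_detector.detect_faces(win),
#       retry_on=(tf.errors.InternalError,),
#   )
```

Re-raising after the last attempt leaves the worker's existing "caught error while running engine ops" handling unchanged.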

hzlmn • Aug 10 '20 15:08

I also got "tensorflow.python.framework.errors_impl.InternalError: cudnn PoolForward launch failed". Did you end up fixing this issue? I am using tensorflow-gpu 1.12, CUDA 9.0, cuDNN 7.6.5.

owlhtchen • Mar 03 '22 17:03

To be honest, I don't remember :) I guess I ended up playing with versions and env config.

hzlmn • Mar 05 '22 19:03