
CUDA error: an illegal memory access was encountered

Open captureguy opened this issue 4 years ago • 13 comments

When I try to generate training data for onet, I receive a memory error partway through processing the image files. I am using PyTorch 1.5. Any help would be appreciated.

[Screenshot attached: "Capture"]

captureguy avatar May 28 '20 14:05 captureguy

What's your GPU memory size?

ansonku avatar May 29 '20 02:05 ansonku

Hey, while generating data for pnet, did you encounter a "module not found" error? When running python scripts/gen_pnet_train.py, I got:

Traceback (most recent call last):
  File "scripts/gen_pnet_train.py", line 4, in <module>
    import mtcnn.train.gen_pnet_train as gptd
ImportError: No module named 'mtcnn'

Can you help me with this?

SURABHI-GUPTA avatar May 29 '20 06:05 SURABHI-GUPTA

@SURABHI-GUPTA

Try this: in the scripts/gen_pnet_train.py file, add the following code before the mtcnn import:

import sys

sys.path.insert(0,'/path_of_folder/FaceDetector/')
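A variant that avoids hard-coding the folder is sketched below. It assumes the script lives in scripts/ directly under the repository root, next to the mtcnn package, as the paths in this thread suggest. The helper name add_repo_root is hypothetical and not part of FaceDetector.

```python
import sys
from pathlib import Path


def add_repo_root(script_path: str, levels_up: int = 1) -> str:
    """Put the repository root on sys.path so `import mtcnn` can resolve.

    levels_up=1 assumes the layout <repo>/scripts/<script>.py.
    """
    # parents[0] is the script's folder; parents[1] is the folder above it.
    root = str(Path(script_path).resolve().parents[levels_up])
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# In scripts/gen_pnet_train.py, before `import mtcnn...`:
#     add_repo_root(__file__)
```

Computing the path from the script location keeps the fix working when the repository is moved or cloned elsewhere.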

ansonku avatar May 29 '20 06:05 ansonku

But after that I am getting this error:

ModuleNotFoundError: No module named 'mtcnn.utils.nms.cpu_nms'

SURABHI-GUPTA avatar May 29 '20 08:05 SURABHI-GUPTA

@SURABHI-GUPTA

Have you compiled the Cython code?

python setup.py build_ext --inplace
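To confirm that the build actually produced an importable module, a check along these lines may help. It is only a sketch: extension_is_built is a hypothetical helper, and it has to run with the repository root on sys.path, otherwise it reports False even when the build succeeded.

```python
import importlib.util


def extension_is_built(module_name: str) -> bool:
    """Return True if module_name can be located by the import system."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec imports the parent package first; a missing parent
        # raises ModuleNotFoundError, which is a subclass of ImportError.
        return False


# Example, run from the repository root:
#     extension_is_built("mtcnn.utils.nms.cpu_nms")
```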

ansonku avatar May 29 '20 11:05 ansonku

Yes, I compiled it, but there was an error. It has been resolved. Thanks @ansonku

SURABHI-GUPTA avatar May 29 '20 17:05 SURABHI-GUPTA

After training pnet, I wanted to generate samples for rnet and got this error:

File "mtcnn/utils/nms/gpu_nms.pyx", line 17, in mtcnn.utils.nms.gpu_nms.gpu_nms
    cdef int boxes_num = dets.shape[0]
TypeError: an integer is required

although dets.shape[0] does return an integer.

SURABHI-GUPTA avatar May 29 '20 17:05 SURABHI-GUPTA

> What's your GPU memory size?

I have 16GB of memory on the GPU so I don't think I should see this error. That being said, I do see part of the code that is meant to deal with memory errors:

https://github.com/faciallab/FaceDetector/blob/8ece6aaeb65161017999e8bbc0833ff311c8cbf9/mtcnn/train/gen_onet_train.py#L83-L88

I assume this part is running on the CPU because it is very slow on my machine.

captureguy avatar May 29 '20 18:05 captureguy

> After training pnet, I wanted to generate samples for rnet and got this error:
> File "mtcnn/utils/nms/gpu_nms.pyx", line 17, in mtcnn.utils.nms.gpu_nms.gpu_nms
>     cdef int boxes_num = dets.shape[0]
> TypeError: an integer is required
>
> although dets.shape[0] does return an integer.

Try changing cuda to cuda:0 here:

https://github.com/faciallab/FaceDetector/blob/8ece6aaeb65161017999e8bbc0833ff311c8cbf9/scripts/gen_rnet_train.py#L33

I had the same issue and this change fixed it for me.
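A possible explanation, which is my guess rather than something confirmed in the repository: in PyTorch, torch.device("cuda").index is None, while torch.device("cuda:0").index is 0, so any code that passes the device index on as an integer would fail with the bare "cuda" string. The sketch below normalizes the device string before it is handed to torch.device. The helper name with_explicit_index is hypothetical.

```python
def with_explicit_index(device: str, default_index: int = 0) -> str:
    """Return a device string that always carries a GPU index.

    "cuda"   -> "cuda:0" (or another default_index)
    "cuda:1" -> "cuda:1" (already explicit, left alone)
    "cpu"    -> "cpu"    (not a GPU device, left alone)
    """
    if device == "cuda":
        return f"cuda:{default_index}"
    return device


# Hypothetical usage in a script that builds the device from a string:
#     device = torch.device(with_explicit_index("cuda"))
```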

captureguy avatar May 29 '20 18:05 captureguy

Cool, thanks @captureguy

By the way, for how many epochs did you train pnet and rnet?

SURABHI-GUPTA avatar May 30 '20 02:05 SURABHI-GUPTA

@captureguy My issue is solved. Have you loaded the CUDA and cuDNN modules properly?

SURABHI-GUPTA avatar May 31 '20 03:05 SURABHI-GUPTA

@captureguy Did you end up fixing your "RuntimeError: CUDA error: an illegal memory access was encountered" issue? I'm getting the same error, and I also have a 16GB GPU.

gerald-ftk avatar Aug 03 '20 22:08 gerald-ftk

I think I have solved the "RuntimeError: CUDA error: an illegal memory access was encountered" problem. I don't know why the CUDA memory cache isn't being released, but adding this code to mtcnn/train/gen_onet_train.py solved it for me:

for index, item in enumerate(meta_data):
    bar.update(index)
    torch.cuda.empty_cache()

I also changed the detection calls as follows, so that I can see which stage fails:

    try:
        processed_img = detector._preprocess(img)
        candidate_boxes = detector.stage_one(processed_img, 0.5, 0.707, 12, 0.7)
    except RuntimeError:
        print("Out of memory on process img '%s.'" % file_name)
        continue
    try:
        candidate_boxes = detector.stage_two(processed_img, candidate_boxes, 0.5, 0.7)
    except RuntimeError:
        print("Out of memory on process img '%s.'" % file_name)
        continue
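One caveat with the snippet above is that every RuntimeError gets reported as "Out of memory", including the illegal memory access from the issue title. A minimal sketch of a wrapper that skips only genuine out-of-memory failures and re-raises everything else is shown below. The helper name run_or_skip is hypothetical, and it assumes PyTorch's out-of-memory messages contain the phrase "out of memory".

```python
def run_or_skip(stage, *args, on_oom=None):
    """Call stage(*args) and return (ok, result).

    On an out-of-memory RuntimeError, run the optional cleanup callback
    and return (False, None) so the caller can skip the current image.
    Any other RuntimeError is re-raised so it is not mislabelled.
    """
    try:
        return True, stage(*args)
    except RuntimeError as err:
        if "out of memory" not in str(err).lower():
            raise
        if on_oom is not None:
            on_oom()  # e.g. torch.cuda.empty_cache
        return False, None


# Hypothetical usage inside the loop over meta_data:
#     ok, boxes = run_or_skip(detector.stage_one, processed_img,
#                             0.5, 0.707, 12, 0.7,
#                             on_oom=torch.cuda.empty_cache)
#     if not ok:
#         continue
```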

congduan-HNU avatar Sep 06 '21 08:09 congduan-HNU