FaceDetector
CUDA error: an illegal memory access was encountered
When I try to generate training data for ONet, I get a memory error partway through processing the image files. I am using PyTorch 1.5. Any help would be appreciated.
What's your GPU memory size?
Hey, while generating data for PNet, did you run into a module-not-found error? For example, when running python scripts/gen_pnet_train.py, I got:
Traceback (most recent call last):
File "scripts/gen_pnet_train.py", line 4, in
Can you help me with this?
@SURABHI-GUPTA
Give this a try: add the following code at the top of scripts/gen_pnet_train.py, before the project imports:
import sys
# Put the repository root on the import path so the mtcnn package can be found.
sys.path.insert(0, '/path_of_folder/FaceDetector/')
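Rather than hard-coding /path_of_folder/, the repository root can be derived from the script's own location. This is a minimal sketch that assumes the script sits one directory below the root, as scripts/gen_pnet_train.py does. add_repo_root is a hypothetical helper, not part of FaceDetector:

```python
import os
import sys


def add_repo_root(script_path: str) -> str:
    """Prepend the repository root to sys.path and return it.

    Hypothetical helper: assumes script_path looks like <repo>/scripts/<name>.py,
    so the repository root is two dirname() calls above the file.
    """
    repo_root = os.path.dirname(os.path.dirname(os.path.abspath(script_path)))
    if repo_root not in sys.path:
        # Insert at the front so the local mtcnn package wins over any installed copy.
        sys.path.insert(0, repo_root)
    return repo_root
```

In the script itself this would be called as add_repo_root(__file__) before import mtcnn. Launching from the repository root with PYTHONPATH=. python scripts/gen_pnet_train.py has a similar effect without editing the file.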
But after that, I get this error:
ModuleNotFoundError: No module named 'mtcnn.utils.nms.cpu_nms'
@SURABHI-GUPTA
Have you compiled the Cython code?
python setup.py build_ext --inplace
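After python setup.py build_ext --inplace, one way to confirm that the compiled extensions are importable is to ask Python's import system for their specs. This is a small sketch. The module names in the usage line come from the errors in this thread, and missing_modules is a hypothetical helper:

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the dotted module names that Python cannot locate."""
    missing = []
    for name in names:
        try:
            # find_spec imports parent packages but does not execute the target module.
            if importlib.util.find_spec(name) is None:
                missing.append(name)
        except ModuleNotFoundError:
            # Raised when a parent package (e.g. mtcnn.utils.nms) is itself missing.
            missing.append(name)
    return missing
```

From a Python session started in the repository root, missing_modules(["mtcnn.utils.nms.cpu_nms", "mtcnn.utils.nms.gpu_nms"]) returning an empty list means the build products were found.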
Yes, I compiled it, but there was an error. It has been resolved now. Thanks @ansonku!
After training PNet, I want to generate samples for RNet, but I got this error:
File "mtcnn/utils/nms/gpu_nms.pyx", line 17, in mtcnn.utils.nms.gpu_nms.gpu_nms
cdef int boxes_num = dets.shape[0]
TypeError: an integer is required
This happens even though dets.shape[0] returns an integer.
> What's your GPU memory size?
I have 16 GB of GPU memory, so I don't think I should be seeing this error. That said, I do see a part of the code that is meant to handle memory errors:
https://github.com/faciallab/FaceDetector/blob/8ece6aaeb65161017999e8bbc0833ff311c8cbf9/mtcnn/train/gen_onet_train.py#L83-L88
I assume this part is running on the CPU because it is very slow on my machine.
> After training PNet, I want to generate samples for RNet, but I got this error:
> File "mtcnn/utils/nms/gpu_nms.pyx", line 17, in mtcnn.utils.nms.gpu_nms.gpu_nms
> cdef int boxes_num = dets.shape[0]
> TypeError: an integer is required
> This happens even though dets.shape[0] returns an integer.
Try changing "cuda" to "cuda:0" here:
https://github.com/faciallab/FaceDetector/blob/8ece6aaeb65161017999e8bbc0833ff311c8cbf9/scripts/gen_rnet_train.py#L33
I had the same issue and this change fixed it for me.
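One plausible reason this works, which the thread does not confirm: torch.device("cuda") carries no device ordinal (its .index is None), whereas torch.device("cuda:0").index is 0. If that index ends up in Cython code that expects a C int, None cannot be converted and Cython typically raises a TypeError like the one above. Under that assumption, a defensive helper could resolve the index before calling into the NMS code. resolve_cuda_index is hypothetical, and it accepts any object with .type and .index attributes so it runs without a GPU:

```python
from typing import Callable, Optional, Protocol


class DeviceLike(Protocol):
    """Anything shaped like torch.device: a type string and an optional index."""
    type: str
    index: Optional[int]


def resolve_cuda_index(device: DeviceLike, current_device: Callable[[], int]) -> int:
    """Return a concrete integer GPU ordinal for a CUDA device.

    Hypothetical helper: a bare "cuda" device has index None, so fall back to
    current_device() instead of passing None on to C/Cython code.
    """
    if device.type != "cuda":
        raise ValueError(f"expected a CUDA device, got {device.type!r}")
    return device.index if device.index is not None else current_device()
```

With PyTorch this would be called as resolve_cuda_index(torch.device("cuda"), torch.cuda.current_device).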
Cool, thanks @captureguy!
By the way, for how many epochs did you train PNet and RNet?
@captureguy My issue is solved. Have you loaded the CUDA and cuDNN modules properly?
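A quick way to check that the CUDA/cuDNN setup PyTorch actually sees matches the modules you loaded is to print what PyTorch itself reports. This sketch uses standard torch attributes; cuda_env_report is a hypothetical helper name:

```python
import torch


def cuda_env_report() -> dict:
    """Summarize the CUDA/cuDNN configuration visible to this PyTorch install."""
    cudnn_ok = torch.backends.cudnn.is_available()
    return {
        "torch": torch.__version__,
        # CUDA version PyTorch was built against; None for CPU-only builds.
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
        # cuDNN version as an integer, or None if cuDNN is unavailable.
        "cudnn": torch.backends.cudnn.version() if cudnn_ok else None,
    }


if __name__ == "__main__":
    for key, value in cuda_env_report().items():
        print(f"{key}: {value}")
```

If cuda_build is None or cuda_available is False, PyTorch is not using the GPU at all, whichever modules are loaded.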
@captureguy Did you end up fixing your "RuntimeError: CUDA error: an illegal memory access was encountered" issue? I'm getting the same error, and I also have a 16 GB GPU.
I think I have solved the "RuntimeError: CUDA error: an illegal memory access was encountered" problem. I don't know why the CUDA memory cache isn't being released, but adding this code to "mtcnn/train/gen_onet_train.py" solved it:
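for index, item in enumerate(meta_data):
    bar.update(index)
    # Release unoccupied cached GPU memory held by PyTorch's caching allocator before each image.
    torch.cuda.empty_cache()
    # ... rest of the per-image processing ...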
I also changed this part to catch the error and skip the failing image instead of crashing:
try:
    processed_img = detector._preprocess(img)
    candidate_boxes = detector.stage_one(processed_img, 0.5, 0.707, 12, 0.7)
except RuntimeError:
    print("Out of memory on process img '%s.'" % file_name)
    continue

try:
    candidate_boxes = detector.stage_two(processed_img, candidate_boxes, 0.5, 0.7)
except RuntimeError:
    print("Out of memory on process img '%s.'" % file_name)
    continue
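The same skip-and-continue idea can be factored into a helper. One caveat: catching RuntimeError is reasonable for CUDA out-of-memory errors, but an illegal memory access leaves the CUDA context in an error state, so later CUDA calls in the same process will usually keep failing. Such errors are better re-raised than skipped. This minimal sketch only swallows errors whose message contains "out of memory", the wording PyTorch uses for CUDA OOM. process stands in for the stage_one/stage_two calls, and run_skipping_oom is a hypothetical name:

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_skipping_oom(
    items: Iterable[Tuple[str, T]],
    process: Callable[[T], R],
) -> Tuple[List[Tuple[str, R]], List[str]]:
    """Apply process to each (name, item), skipping items that run out of memory.

    Returns the successful (name, result) pairs and the names that were skipped.
    Any other RuntimeError (e.g. an illegal memory access) is re-raised.
    """
    results: List[Tuple[str, R]] = []
    skipped: List[str] = []
    for name, item in items:
        try:
            results.append((name, process(item)))
        except RuntimeError as exc:
            if "out of memory" not in str(exc):
                raise  # skipping cannot recover from this; let it surface
            print(f"Out of memory while processing image '{name}', skipping.")
            skipped.append(name)
    return results, skipped
```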