Pytorch-UNet
train Detected OutOfMemoryError
ERROR: Detected OutOfMemoryError! Enabling checkpointing to reduce memory usage, but this slows down training. Consider enabling AMP (--amp) for fast and memory efficient training
Environment: Windows 10, Python 3.7, PyTorch (GPU) 1.12, CUDA 11.3, GTX 1080 Ti
You are running out of memory. Please reduce the image scale, use a larger GPU, or enable AMP as the message suggests.
Thanks for answering. I don't think it is a memory problem, because I am only using 30 images of 640×640 pixels, and I have run your code successfully before. Is there another possible cause?
Well, the error says "Detected OutOfMemoryError!", so it is definitely a memory error. Check that nothing else is using your GPU, such as the desktop UI or other workloads.
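To see why image scale and AMP matter more than dataset size, here is a rough back-of-the-envelope sketch. GPU memory per training step depends on batch size, resolution after scaling, and numeric precision, not on how many images the dataset holds, since only one batch is processed at a time. The sketch assumes an input batch of shape (N, C, H, W) and counts only that one tensor; the real footprint of UNet training (activations, gradients, optimizer state) is much larger but grows the same way. The helper name batch_tensor_bytes is hypothetical.

```python
# Rough estimate of the memory taken by ONE input batch tensor.
# Assumption: shape (N, C, H*scale, W*scale); this is NOT the total training
# footprint, but activations and gradients scale the same way with
# resolution and precision.


def batch_tensor_bytes(batch_size: int, channels: int, height: int, width: int,
                       scale: float = 1.0, bytes_per_element: int = 4) -> int:
    """Bytes for a (N, C, H*scale, W*scale) tensor.

    bytes_per_element: 4 for float32, 2 for float16 (as used under AMP).
    """
    h = int(height * scale)  # resized height
    w = int(width * scale)   # resized width
    return batch_size * channels * h * w * bytes_per_element


if __name__ == "__main__":
    full = batch_tensor_bytes(1, 3, 640, 640)                       # float32, full size
    half_scale = batch_tensor_bytes(1, 3, 640, 640, scale=0.5)      # float32, half size
    fp16 = batch_tensor_bytes(1, 3, 640, 640, bytes_per_element=2)  # float16, full size
    print(full, half_scale, fp16)
```

Halving the scale shrinks every spatial tensor to a quarter of its size, while float16 under AMP halves it, which is why both are suggested for out-of-memory errors.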
train got error:
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/media/lee/common/PycahrmProjects/Pytorch-UNet-master/train.py", line 43, in train_model
    dataset = CarvanaDataset(dir_img, dir_mask, img_scale)
  File "/media/lee/common/PycahrmProjects/Pytorch-UNet-master/utils/data_loading.py", line 118, in __init__
    super().__init__(images_dir, mask_dir, scale, mask_suffix='_mask')
  File "/media/lee/common/PycahrmProjects/Pytorch-UNet-master/utils/data_loading.py", line 54, in __init__
    unique = list(tqdm(
  File "/home/lee/anaconda3/envs/persondet/lib/python3.8/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/home/lee/anaconda3/envs/persondet/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
IndexError: list index out of range
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/media/lee/common/PycahrmProjects/Pytorch-UNet-master/train.py", line 212, in <module>
@willianLee I think you need to check the mapping between labels and images. It may be a problem with the initial configuration: check, for example, the number of classes, or whether the labels (masks) were generated correctly.
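The IndexError is raised while the dataset collects mask values (the traceback shows mask_suffix='_mask'), so one likely cause is an image that has no matching mask file. Below is a minimal sketch that lists such images. It assumes each mask is named <image stem>_mask.<any extension> inside the mask folder and that hidden files are ignored; find_images_without_masks is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def find_images_without_masks(img_dir, mask_dir, mask_suffix="_mask"):
    """Return stems of images with no file named <stem><mask_suffix>.<ext> in mask_dir.

    Assumption: masks are paired with images by file stem plus suffix,
    as mask_suffix='_mask' in the traceback suggests.
    """
    img_dir, mask_dir = Path(img_dir), Path(mask_dir)
    missing = []
    for img in sorted(img_dir.iterdir()):
        # Skip subfolders and hidden files such as .DS_Store
        if not img.is_file() or img.name.startswith("."):
            continue
        # Accept any extension; an empty match means no mask for this image
        if not list(mask_dir.glob(img.stem + mask_suffix + ".*")):
            missing.append(img.stem)
    return missing
```

For example, calling it with your own image and mask folders should return an empty list when every image has a mask; any stem it returns points to an image whose mask is missing or misnamed.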
Thanks for this. I've run into a new problem: when I set the model's input channels to 3, it reports the error: AssertionError: Network has been defined with 3 input channels, but loaded images have 1 channels. Please check that the images are loaded correctly.
So I set the input channels to 1 and got the error: AssertionError: Network has been defined with 1 input channels, but loaded images have 3 channels. Please check that the images are loaded correctly.
It's a bit strange!
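Getting the opposite error with each setting suggests the image folder may mix grayscale (1-channel) and RGB (3-channel) files. Below is a sketch using Pillow to check this, assuming the folder holds only image files; channel_report and convert_all_to_rgb are hypothetical helpers. Converting every image to RGB and keeping the network at 3 input channels is one way to make the inputs consistent.

```python
from pathlib import Path

from PIL import Image


def channel_report(img_dir):
    """Map band count (1 = grayscale/palette, 3 = RGB, 4 = RGBA, ...) to file names.

    Assumption: every non-hidden file in img_dir is an image Pillow can open.
    """
    report = {}
    for path in sorted(Path(img_dir).iterdir()):
        if not path.is_file() or path.name.startswith("."):
            continue
        with Image.open(path) as im:
            n_bands = len(im.getbands())
        report.setdefault(n_bands, []).append(path.name)
    return report


def convert_all_to_rgb(img_dir, out_dir):
    """Write an RGB copy of every image into out_dir so all inputs have 3 channels."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(img_dir).iterdir()):
        if not path.is_file() or path.name.startswith("."):
            continue
        with Image.open(path) as im:
            # Keeps the original file name and extension
            im.convert("RGB").save(out / path.name)
```

If channel_report returns more than one key, the folder is mixed; run convert_all_to_rgb on the images (not the masks) and point training at the converted folder.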
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED