image-segmentation-keras
16g gpu memory still got run out of memory error
Hi, I tried to run the following command:
python -m keras_segmentation train
--checkpoints_path="path_to_checkpoints"
--train_images="dataset1/images_prepped_train/"
--train_annotations="dataset1/annotations_prepped_train/"
--val_images="dataset1/images_prepped_test/"
--val_annotations="dataset1/annotations_prepped_test/"
--n_classes=50
--input_height=320
--input_width=640
--model_name="fcn_8_resnet50"
I tried several GPUs (a GTX 960M with 2 GB of memory, a Quadro P2000 with 4 GB, and a Tesla P100 with 16 GB), but all failed with an out-of-memory error.
I also tried reducing the image size and the batch size, but it did not help.
Could anyone give me a hint about this issue?
I uninstalled tensorflow-gpu and reinstalled plain tensorflow, and without the GPU everything works properly.
I think maybe some GPU settings are not correct.
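One GPU setting worth checking: by default, TF 1.x's allocator tries to claim nearly all GPU memory at startup, which can trip an OOM even on a large card. A minimal sketch of enabling on-demand ("growth") allocation instead, assuming TF 1.14+ for the environment variable (older 1.x releases need the commented ConfigProto route):

```python
import os

# Must be set before TensorFlow is imported: tells the GPU allocator to
# grow on demand instead of grabbing (almost) all memory up front.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# Equivalent session-level setting for older TF 1.x (sketch):
# import tensorflow as tf
# from keras import backend as K
# config = tf.ConfigProto()
# config.gpu_options.allow_growth = True
# K.set_session(tf.Session(config=config))
```

Note that allow-growth only helps when the default greedy allocation is the problem; if a single tensor genuinely exceeds free memory, the OOM will still occur.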
I hit this problem on another project. I tried the exact same code with tensorflow-gpu 2.0 and with 1.14: with the 2.0 beta I got memory problems, while with 1.14 everything worked. No idea why, but it was the same code. Giving up on the GPU speedup is not the best idea, imo.
@pgr2015 which tensorflow version are you using? In case you haven't, please make sure all the GPU memory is free (using nvidia-smi) before running the program.
@divamgupta Hi, sorry for the late reply, I am using tensorflow 1.11.0, and GPU memory is free.
Hi, which tensorflow version should be used? I am using tf 1.8.0, but the GPU memory stays free, so the training process is slow.
Yeah, without the GPU it becomes too slow (I use 1.14, as 1.11.0 doesn't work), so I wonder which tf version you use @divamgupta. Thanks a lot.
I also tried segnet_resnet50 and pspnet_resnet50; they work well.
But FCN_8_resnet and FCN_32_resnet both didn't work.
The error message is as follows:
2020-05-11 22:30:04.095153: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at conv_ops.cc:880 : Resource exhausted: OOM when allocating tensor with shape[4096,2048,7,7] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
File "test_fcn.py", line 19, in
[[Mean/_3405]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
(1) Resource exhausted: OOM when allocating tensor with shape[4096,2048,7,7] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[{{node conv2d_1/convolution}}]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
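For context on why the FCN variants tend to OOM while the SegNet/PSPNet heads fit: the allocation the log complains about, shape [4096, 2048, 7, 7] in float32, is a single convolution weight tensor (the FCN head's fc6-style 7x7 conv over ResNet50's 2048 output channels). Back-of-the-envelope arithmetic (a sketch; the 4x multiplier for gradients plus optimizer state assumes an Adam-style optimizer, which is not stated in the thread):

```python
# Size of the tensor the OOM error reports: [4096, 2048, 7, 7], float32.
elements = 4096 * 2048 * 7 * 7          # 411,041,792 weights in one conv layer
weight_bytes = elements * 4             # float32 = 4 bytes per element
print(f"fc6 weights alone: {weight_bytes / 2**30:.2f} GiB")  # -> 1.53 GiB

# Training also keeps gradients and (for Adam) two moment buffers of the
# same size -- roughly 4x the weights before activations are even counted.
print(f"with grads + optimizer state: {4 * weight_bytes / 2**30:.1f} GiB")
```

So this one layer can eat several GiB on top of activations and the rest of the model, which is consistent with the OOM on small cards and, with larger batch sizes or input resolutions, even on a 16 GB P100.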
Same issue even with pspnet_50