"CUDA out of memory" during training
Hi! I tried to train with this command (on my Windows PC with an RTX 2070):
F:\Users\sounansu\Anaconda3\FCHarDNet>\python train.py --config configs\hardnet.yml
.....
RuntimeError: CUDA out of memory. Tried to allocate 40.00 MiB (GPU 0; 8.00 GiB total capacity; 5.98 GiB already allocated; 24.97 MiB free; 30.09 MiB cached)
Please teach me how to modify hardnet.yml!
Hello, thanks for reaching out. The current config requires 20 GB of GPU memory for training. For single-GPU training with 11-12 GB of memory, you can try lowering the image resolution from [1024, 1024] to [768, 768], though this may reduce the val mIoU to ~0.76. Please note that you will need to modify img_rows, img_cols, and rscale_crop.
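For reference, this is roughly what the relevant fields in configs/hardnet.yml would look like after that change (a sketch of just the affected keys; all other fields stay as they are):

```yaml
data:
  img_rows: 768
  img_cols: 768
training:
  augmentations:
    rscale_crop: [768, 768]
```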
Thank you @PingoLH! I will try training with the modified hardnet.yml and will report the mIoU with those parameters.
I changed hardnet.yml as suggested and trained:
Iter [500/90000] Loss: 1.2908 Time/Image: 0.0273 lr=0.019900
INFO:ptsemseg:Iter [500/90000] Loss: 1.2908 Time/Image: 0.0273 lr=0.019900
1it [00:15, 15.10s/it]
Traceback (most recent call last):
  File "train.py", line 267, in <module>
    train(cfg, writer, logger)
  File "train.py", line 186, in train
    outputs = model(images_val)
  File "F:\Users\sounansu\Anaconda3New\envs\FCHarDNet\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "F:\Users\sounansu\Anaconda3New\envs\FCHarDNet\lib\site-packages\torch\nn\parallel\data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "F:\Users\sounansu\Anaconda3New\envs\FCHarDNet\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "F:\Users\sounansu\Anaconda3New\FCHarDNet\ptsemseg\models\hardnet.py", line 186, in forward
    out = self.transUpBlocks[i](out, skip, True)
  File "F:\Users\sounansu\Anaconda3New\envs\FCHarDNet\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "F:\Users\sounansu\Anaconda3New\FCHarDNet\ptsemseg\models\hardnet.py", line 99, in forward
    out = torch.cat([out, skip], 1)
RuntimeError: CUDA out of memory. Tried to allocate 1008.00 MiB (GPU 0; 8.00 GiB total capacity; 4.28 GiB already allocated; 796.97 MiB free; 998.90 MiB cached)
1it [00:19, 19.59s/it]
Out of memory occurred again.
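As a rough sanity check (a back-of-the-envelope estimate, not FCHarDNet's actual allocator behavior): activation memory scales linearly with batch size and with each spatial dimension, so halving both height and width cuts it by 4x, which is why dropping from 1024x1024 toward 512x512 helps so much on an 8 GB card.

```python
def feature_map_mib(batch, channels, height, width, bytes_per_elem=4):
    """Memory of one float32 feature map in MiB (hypothetical layer sizes)."""
    return batch * channels * height * width * bytes_per_elem / 2**20

# Example: a 64-channel feature map at batch size 16.
full = feature_map_mib(16, 64, 1024, 1024)  # at the original resolution
half = feature_map_mib(16, 64, 512, 512)    # halved in both dimensions: 4x smaller
```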
I modified hardnet.yml and train.py as below
diff --git a/configs/hardnet.yml b/configs/hardnet.yml
index e3e14a6..bfac3bf 100644
--- a/configs/hardnet.yml
+++ b/configs/hardnet.yml
@@ -4,10 +4,10 @@ data:
dataset: cityscapes
train_split: train
val_split: val
- img_rows: 1024
- img_cols: 1024
- path: /mnt/ssd2/Cityscapes/
- sbd_path: /mnt/ssd2/Cityscapes/
+ img_rows: 512
+ img_cols: 512
+ path: F:\image_data\Cityscape\leftImg8bit
+ sbd_path: F:\image_data\Cityscape\leftImg8bit
training:
train_iters: 90000
batch_size: 16
@@ -16,7 +16,7 @@ training:
print_interval: 10
augmentations:
hflip: 0.5
- rscale_crop: [1024, 1024]
+ rscale_crop: [512, 512]
optimizer:
name: 'sgd'
lr: 0.02
diff --git a/train.py b/train.py
index 172e917..57746e6 100644
--- a/train.py
+++ b/train.py
@@ -57,7 +57,7 @@ def train(cfg, writer, logger):
data_path,
is_transform=True,
split=cfg["data"]["val_split"],
- img_size=(1024,2048),
+ img_size=(cfg["data"]["img_rows"], cfg["data"]["img_cols"]),
)
n_classes = t_loader.n_classes
and trained again.
Then I ran validation:
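The train.py patch above just derives the validation loader's img_size from the config instead of the hard-coded (1024, 2048). A minimal sketch of that change, using a hypothetical cfg dict mirroring the structure of configs/hardnet.yml:

```python
# Hypothetical cfg dict with the same nesting as configs/hardnet.yml
cfg = {"data": {"img_rows": 512, "img_cols": 512, "val_split": "val"}}

# Before the patch: img_size=(1024, 2048) was hard-coded.
# After the patch: it follows img_rows/img_cols from the config.
img_size = (cfg["data"]["img_rows"], cfg["data"]["img_cols"])
```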
(FCHarDNet) F:\Users\sounansu\Anaconda3New\FCHarDNet>python validate.py --config configs\hardnet.yml --model_path runs\hardnet\cur\hardnet_cityscapes_best_model.pkl
....
Total Frame Rate = 33.85 fps
Overall Acc: 0.9560427663307451
Mean Acc :   0.8086355461508691
FreqW Acc :  0.9193348521709037
Mean IoU :   0.7240548654125439
0  0.9793078776886329
1  0.8371592154910068
2  0.918241383070455
3  0.566010215830134
4  0.579013853663639
5  0.6087141064956788
6  0.6494385999487208
7  0.7526571660539968
8  0.9192169494945195
9  0.6232992244377927
10 0.9399414673242146
11 0.7892767003086372
12 0.5512253643936255
13 0.9420434722036333
14 0.6960866120115173
15 0.7649974688063472
16 0.41760177053657616
17 0.4881134901126577
18 0.7346975049665497
Hi sounansu, thank you so much for the feedback and report. You can also try a smaller batch size with a higher resolution if you are interested. Also, I'd recommend keeping the full resolution for v_loader with a smaller batch size, so that the images are not distorted during validation. Thanks!
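In other words (a sketch with hypothetical variable names): keep the training loader at the reduced size from the config, but leave the validation loader at the full Cityscapes resolution and compensate with a smaller batch size:

```python
cfg = {"data": {"img_rows": 512, "img_cols": 512}}

# Training loader: reduced resolution from the config, so it fits in 8 GB.
train_size = (cfg["data"]["img_rows"], cfg["data"]["img_cols"])

# Validation loader: full Cityscapes resolution (1024 x 2048) so the images
# are not distorted; a smaller batch size offsets the extra memory cost.
val_size = (1024, 2048)
val_batch_size = 4
```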
Thank you for the additional advice.
So I modified the batch size as below:
- batch_size: 16
+ batch_size: 4
Validation values are:
Total Frame Rate = 34.36 fps
Overall Acc: 0.9550504199267023
Mean Acc :   0.8234178483937832
FreqW Acc :  0.9176348176979007
Mean IoU :   0.728196572228794
0  0.9797917075564525
1  0.8329137830933465
2  0.9146849932283128
3  0.5196456033824945
4  0.5477201100625065
5  0.6087271166747225
6  0.6399322002349301
7  0.7507402719029521
8  0.9204021515861227
9  0.6014830162332917
10 0.9410106909056067
11 0.7900416534638349
12 0.5772749706560611
13 0.9361805056874339
14 0.6014725882823072
15 0.7831162906176867
16 0.6354351352905586
17 0.5277508688827295
18 0.7274112146057334
The mIoU is a little better than before! Thank you!
Could you please tell me which version of the scipy package you used for training? In a Linux environment? And how about under Windows?
1024,2048
Please check your Windows PyTorch version and the versions of the other packages.