ritm_interactive_segmentation It takes long time to train

It takes long time to train

Open juntawu opened this issue 3 years ago • 17 comments

Hello. Thanks for your work. I trained the HRNetV2-W18-C+OCR ITER-M model on the COCO_LVIS dataset with 2 GPUs (Tesla-V100-SXM2-32GB), using this command:

python3 train.py models/iter_mask/hrnet18_cocolvis_itermask_3p.py --gpus=0,1 --workers=6 --exp-name=first-try

However, it took me more than 70 hours to train 200 epochs. Is this normal?

Apr 19 '21 03:04 juntawu

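@juntawu How long did you train for, and what batch_size did you use? I have run into the same problem. I followed the training procedure given in the code on a single GPU, and training seems to take far too long.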

May 10 '21 01:05 haoyuying

I trained hrnet18s on a single 1080 Ti for 200 epochs. It took approximately 20 minutes per epoch. The result is lower than reported. I wonder if this is normal. (Screenshot attached: 企业微信截图_20210623172604)

Jun 23 '21 09:06 liyuxuan89

It's normal. https://github.com/saic-vul/ritm_interactive_segmentation/issues/3#issuecomment-811092310 I need 3 days to train 220 epochs. This is why the authors only trained 55 epochs for their experiments.

Oct 25 '21 20:10 qinliuliuqin

The batch size is set to 32 by default. To save time, you only need to train 55 epochs on COCO_LVIS, as the authors did in their experiments.
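
For a rough idea of what that saves, here is a small back-of-the-envelope sketch using the figures reported at the top of this thread (about 70 hours for 200 epochs on 2 V100s). It assumes the time per epoch stays roughly constant, which may not hold on other hardware or datasets; the function names are made up for illustration.

```python
def hours_per_epoch(total_hours, total_epochs):
    """Average wall-clock hours per epoch, measured from an existing run."""
    return total_hours / total_epochs


def estimate_hours(total_hours, total_epochs, target_epochs):
    """Scale the measured per-epoch time to a different number of epochs."""
    return hours_per_epoch(total_hours, total_epochs) * target_epochs


# Figures from the original post: roughly 70 hours for 200 epochs.
per_epoch_min = hours_per_epoch(70, 200) * 60
hours_for_55 = estimate_hours(70, 200, 55)
print(f"about {per_epoch_min:.0f} min per epoch, about {hours_for_55:.0f} h for 55 epochs")
```

On those numbers, 55 epochs would take a bit under a day instead of about three days.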

Oct 25 '21 20:10 qinliuliuqin

Save checkpoint to experiments\iter_mask\sbd_hrnet18\000_first-try\checkpoints\last_checkpoint.pth
Save checkpoint to experiments\iter_mask\sbd_hrnet18\000_first-try\checkpoints\000.pth

When I train the first epoch, the program stays on this screen after training finishes. Is that normal? Is it supposed to stall here for a long time? I don't dare to touch anything.

Nov 11 '21 14:11 ty199931

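Validation runs after each training epoch finishes. I suggest walking through the code; it is very well written. There is a pause during validation, but it is not long, and a progress bar is displayed.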

Nov 13 '21 16:11 qinliuliuqin

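I read the code and stepped through it with breakpoints to find the problem. I found that sometimes it does not even enter the for loop. If none of you have this problem, could it be my computer? Or is something wrong with my dataset?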

Nov 14 '21 00:11 ty199931

Has anyone else seen the training loss start over at every epoch? It feels as if each epoch is independent.

Jan 04 '22 14:01 hyalvin

Hello, may I ask how you got these results? My validation process only gives me the validation loss.

Jan 04 '22 14:01 hyalvin

How was this problem solved in the end? I ran into the same problem when training on my own dataset: after the first epoch finishes, it hangs at validation.

Feb 10 '22 06:02 xiangyunfan

How did you solve this problem in the end?

May 07 '22 09:05 yangshunDragon

It seems to be caused by the images. Removing the ones that have no labels fixed it.

May 07 '22 11:05 ty199931

Does "no labels" mean that an original image images/sth.jpg exists but the corresponding mask masks/sth.png does not?

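In my case, the training set is built from slices of 3D medical images. Every original image images/sth.jpg has a corresponding masks/sth.png, but a certain proportion of the masks are entirely black (there is no target in the mask).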

May 09 '22 01:05 yangshunDragon

@yangshunDragon Hi, I would also like to use this model for medical image segmentation. Did you manage to solve this problem?

Jul 11 '22 13:07 chuyhu

Just delete the samples whose mask is entirely black.
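
A minimal sketch of that cleanup, assuming the images/sth.jpg and masks/sth.png layout described above. The helper names are hypothetical and are not part of the RITM code base; the default check relies on Pillow's Image.getbbox(), which returns None when every pixel is zero.

```python
from pathlib import Path


def mask_is_empty(mask_path):
    """Return True if the mask image is entirely black (no target pixels)."""
    from PIL import Image  # Pillow, imported lazily so the scan logic has no hard dependency

    with Image.open(mask_path) as mask:
        # getbbox() is None when the image contains no non-zero pixel.
        return mask.convert("L").getbbox() is None


def split_by_mask(images_dir, masks_dir, is_empty=mask_is_empty):
    """Split image file names into (keep, skip).

    An image is skipped when its mask file is missing or entirely black.
    """
    keep, skip = [], []
    for image_path in sorted(Path(images_dir).glob("*.jpg")):
        mask_path = Path(masks_dir) / (image_path.stem + ".png")
        if not mask_path.exists() or is_empty(mask_path):
            skip.append(image_path.name)
        else:
            keep.append(image_path.name)
    return keep, skip


# Example call (directory names are assumptions):
# keep, skip = split_by_mask("images", "masks")
```

This only lists the files. It is probably safer to review the skip list before deleting anything, or to build the training split from the keep list and leave the data on disk untouched.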

Oct 11 '22 07:10 ty199931
