ssd.pytorch
When I run train.py for about 40 iterations, it hits an error and the program breaks:
File "train.py", line 255, in <module>
    train()
File "train.py", line 165, in train
    images, targets = next(batch_iterator)
File "/home/chenzw/anaconda3/envs/tensor3/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 271, in __next__
    raise StopIteration
StopIteration
Wow, another error:
File "train.py", line 255, in <module>
I have the same problem.
iter 510 || Loss: 8.8698 || Traceback (most recent call last):
  File "train.py", line 257, in <module>
The cause: once the iterator has run through one whole epoch, it cannot automatically restart from the beginning.
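In plain Python terms, the failure looks like this (a small list stands in for the DataLoader here; the iterator protocol is the same):

```python
# A DataLoader iterator behaves like any Python iterator: after one
# full pass it is exhausted, and next() raises StopIteration.
data = [("img0", "tgt0"), ("img1", "tgt1")]  # stand-in for a DataLoader
batch_iterator = iter(data)

for _ in range(len(data)):
    images, targets = next(batch_iterator)  # fine within one epoch

exhausted = False
try:
    next(batch_iterator)  # one step past the epoch boundary
except StopIteration:
    exhausted = True  # the iterator does not restart on its own

print(exhausted)  # True
```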
The fix linked here solved this problem.
The reason is as ShoufaChen explained.
And I noticed another way to handle this problem. I train this network on the VOC dataset, and config.py in the data folder expects 120000 iterations. One pass over the entire dataset takes iter_datasets = len(dataset) / batch_size iterations, so to reach our goal of 120000 we need to repeat for epoch_size = 120000 / iter_datasets epochs. To simplify the design, I changed the code like this:
iter_datasets = len(dataset) // args.batch_size
epoch_size = cfg['max_iter'] // iter_datasets
for epoch in range(0, epoch_size):
    for i_batch, (images, targets) in enumerate(data_loader):
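As a sanity check on the arithmetic above, here is a worked example. The numbers are illustrative: VOC07+12 trainval is roughly 16551 images, and batch_size/max_iter follow the repo defaults of 32 and 120000.

```python
# Worked example of iter_datasets / epoch_size (illustrative numbers).
len_dataset = 16551   # approx. size of VOC07+12 trainval
batch_size = 32       # repo default
max_iter = 120000     # cfg['max_iter'] for VOC

iter_datasets = len_dataset // batch_size  # iterations in one epoch
epoch_size = max_iter // iter_datasets     # epochs needed to reach max_iter

print(iter_datasets, epoch_size)  # 517 232
```

Note that integer division means you end up slightly short of max_iter (517 * 232 = 119944 iterations), which is usually acceptable.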
Note that this breaks the visdom output, so I simply turned visdom off. You should also change the code to print the information you are interested in (loss, accuracy, time, ...). One more thing about visdom: if you run with --visdom True, you need a global viz at the beginning of the train() function so your plotting code can use viz.
Hope this helps!
@zhuyu72 you can copy train.py to another file (maybe your_train.py) and change some code like this:
iter_datasets = len(dataset) // args.batch_size
epoch_size = cfg['max_iter'] // iter_datasets
for iteration in range(0, epoch_size):
    for i_batch, (images, targets) in enumerate(data_loader):
        if args.visdom and iteration != 0 and (iteration % epoch_size == 0):
            update_vis_plot(epoch, loc_loss, conf_loss, epoch_plot, None,
                            'append', epoch_size)
            # reset epoch loss counters
            loc_loss = 0
            conf_loss = 0
Where should I place the code? @HosinPrime
The changes I made solved this problem.
You can change the code in
for iteration in range(args.start_iter, cfg['max_iter']):
Use try...except to catch the StopIteration raised by next(), and reload the data in the except block.
I made the following revisions:
- epoch_size = len(data_loader) instead of len(dataset) // args.batch_size
- add a try clause when loading the training data:

try:
    images, targets = next(batch_iterator)
except StopIteration:
    batch_iterator = iter(data_loader)
    images, targets = next(batch_iterator)
except Exception as e:
    print("Loading data Exception:", e)
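A self-contained sketch of this reload pattern, with a plain list standing in for the VOC data_loader (the names data_loader and batch_iterator mirror train.py, but the data here is toy data):

```python
# Minimal sketch of the try/except reload pattern: when the iterator
# is exhausted at an epoch boundary, rebuild it and keep going.
data_loader = [("img0", "tgt0"), ("img1", "tgt1"), ("img2", "tgt2")]
batch_iterator = iter(data_loader)

max_iter = 7  # more iterations than one epoch, to force a reload
loaded = []
for iteration in range(max_iter):
    try:
        images, targets = next(batch_iterator)
    except StopIteration:
        # Epoch boundary reached: rebuild the iterator and retry.
        batch_iterator = iter(data_loader)
        images, targets = next(batch_iterator)
    loaded.append(images)

print(loaded)  # ['img0', 'img1', 'img2', 'img0', 'img1', 'img2', 'img0']
```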
@HosinPrime did you run train.py? Why does my loss always stay between 0.5 and 0.8?
epoch:1/766 loss:0.0582 spend time:118.05
epoch:2/766 loss:0.0560 spend time:238.11
epoch:3/766 loss:0.0545 spend time:356.86
epoch:4/766 loss:0.0486 spend time:476.96
epoch:5/766 loss:0.0508 spend time:596.04
epoch:6/766 loss:0.0613 spend time:716.75
epoch:7/766 loss:0.0544 spend time:836.07
epoch:8/766 loss:0.0470 spend time:956.83
epoch:9/766 loss:0.0610 spend time:1076.16
epoch:10/766 loss:0.0708 spend time:1196.66
epoch:11/766 loss:0.0707 spend time:1317.04
epoch:12/766 loss:0.0698 spend time:1436.10
epoch:13/766 loss:0.0508 spend time:1556.46
epoch:14/766 loss:0.0824 spend time:1675.38
epoch:15/766 loss:0.0627 spend time:1795.84
epoch:16/766 loss:0.0748 spend time:1914.88
epoch:17/766 loss:0.0557 spend time:2035.31
epoch:18/766 loss:0.0653 spend time:2154.38
epoch:19/766 loss:0.0684 spend time:2274.93
epoch:20/766 loss:0.0701 spend time:2394.18
epoch:21/766 loss:0.0529 spend time:2514.80
epoch:22/766 loss:0.0538 spend time:2634.17
epoch:23/766 loss:0.0456 spend time:2754.80
epoch:24/766 loss:0.0572 spend time:2875.39
epoch:25/766 loss:0.0626 spend time:2994.57
epoch:26/766 loss:0.0653 spend time:3115.05
epoch:27/766 loss:0.0566 spend time:3234.57
epoch:28/766 loss:0.0501 spend time:3355.59
epoch:29/766 loss:0.0527 spend time:3475.41
@HosinPrime Hello, I trained after changing batch_size, without modifying the network, but the training loss is very poor. What could be the reason?
> @HosinPrime Hello, I trained after changing batch_size, without modifying the network, but the training loss is very poor. What could be the reason?

It may be that the batch_size is too small, which hurts the training results.
I changed batch_size to 16 and trained again, but the loss still does not decrease.