Hi all,
I was training the model on the PanNuke dataset when the following error occurred. I can't tell whether the problem is in the data or in the code. The full output is below. Thanks very much!
Processing: | | 0/11[00:00<?,?it/s]
Processing: |9 | 1/11[00:03<00:30, 3.10s/it]
Processing: |#8 | 2/11[00:04<00:16, 1.82s/it]
Processing: |##7 | 3/11[00:05<00:11, 1.45s/it]
Processing: |###6 | 4/11[00:06<00:08, 1.28s/it]
Processing: |####5 | 5/11[00:06<00:06, 1.14s/it]
Processing: |#####4 | 6/11[00:08<00:05, 1.13s/it]
Processing: |######3 | 7/11[00:09<00:04, 1.14s/it]
Processing: |#######2 | 8/11[00:10<00:03, 1.10s/it]
Processing: |########1 | 9/11[00:11<00:02, 1.12s/it]
Processing: |######### | 10/11[00:12<00:01, 1.01s/it]
Processing: |##########| 11/11[00:13<00:00, 1.01it/s]
Processing: |##########| 11/11[00:13<00:00, 1.20s/it]
Traceback (most recent call last):
File "run_train.py", line 318, in <module>
trainer.run()
File "run_train.py", line 300, in run
phase_info, engine_opt, save_path, prev_log_dir=prev_save_path
File "run_train.py", line 275, in run_once
main_runner.run(opt["nr_epochs"])
File "/home/liable/hover_net-master/run_utils/engine.py", line 197, in run
self.__trigger_events(Events.EPOCH_COMPLETED)
File "/home/liable/hover_net-master/run_utils/engine.py", line 123, in __trigger_events
callback.run(self.state, event)
File "/home/liable/hover_net-master/run_utils/callbacks/base.py", line 70, in run
chained=True, nr_epoch=self.nr_epoch, shared_state=state
File "/home/liable/hover_net-master/run_utils/engine.py", line 197, in run
self.__trigger_events(Events.EPOCH_COMPLETED)
File "/home/liable/hover_net-master/run_utils/engine.py", line 123, in __trigger_events
callback.run(self.state, event)
File "/home/liable/hover_net-master/run_utils/callbacks/base.py", line 213, in run
track_dict = self.proc_func(raw_data)
File "/home/liable/hover_net-master/models/hovernet/opt.py", line 139, in <lambda>
lambda a: proc_valid_step_output(a, nr_types=nr_type)
File "/home/liable/hover_net-master/models/hovernet/run_desc.py", line 290, in proc_valid_step_output
patch_prob_np = prob_np[idx]
IndexError: list index out of range
I'm not sure of your exact setup, but that error often happens when the batch size of the last step is 1. Try to ensure that every batch, including the last one, has a batch_size > 1.
https://github.com/vqdang/hover_net/issues/103#issuecomment-798997624
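For anyone else hitting this IndexError, here is a minimal sketch for checking whether a given batch size leaves a single sample in the final batch. The function names and the sample count of 161 are hypothetical, chosen only for illustration; they are not part of hover_net.

```python
def last_batch_size(num_samples: int, batch_size: int) -> int:
    """Size of the final batch when the remainder is not dropped."""
    if num_samples <= 0 or batch_size <= 0:
        raise ValueError("num_samples and batch_size must be positive")
    remainder = num_samples % batch_size
    # A remainder of 0 means the final batch is a full one.
    return remainder if remainder else batch_size


def safe_batch_sizes(num_samples: int, candidates):
    """Candidate batch sizes whose final batch holds more than one sample."""
    return [b for b in candidates if last_batch_size(num_samples, b) > 1]


# Hypothetical example: 161 validation patches.
print(last_batch_size(161, 16))               # 161 % 16 == 1, so the last batch has 1 sample
print(safe_batch_sizes(161, [8, 12, 16, 24]))  # batch sizes that avoid a final batch of 1
```

If changing the batch size is inconvenient, `torch.utils.data.DataLoader` also accepts `drop_last=True`, which discards an incomplete final batch. Whether that option is exposed through the hover_net config is an assumption you would need to check in your copy of the code.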
Thank you vqdang, that really helps.
Also, when I switch to a larger dataset and train the model with the same hyperparameters, I get the following error. I have checked the data labels and found no errors. Could this error also be caused by inappropriate hyperparameters?
----------------EPOCH 1
Processing: | | 0/332[00:00<?,?it/s]Batch = nan|EMA = nan
Traceback (most recent call last):
File "run_train.py", line 318, in <module>
trainer.run()
File "run_train.py", line 300, in run
phase_info, engine_opt, save_path, prev_log_dir=prev_save_path
File "run_train.py", line 275, in run_once
main_runner.run(opt["nr_epochs"])
File "/home/jumengwei/hover_net-master/run_utils/engine.py", line 182, in run
step_output = self.run_step(data_batch, step_run_info)
File "/home/jumengwei/hover_net-master/models/hovernet/run_desc.py", line 54, in train_step
true_tp_onehot = F.one_hot(true_tp, num_classes=model.module.nr_types)
RuntimeError: Class values must be smaller than num_classes.
Processing: | | 0/332[00:01<?,?it/s]Batch = nan|EMA = nan
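A note on the RuntimeError above: `F.one_hot` raises "Class values must be smaller than num_classes" when some value in the input tensor is greater than or equal to `num_classes`, so at least one type label in the batch is >= `nr_types`. A minimal sketch for finding such labels in a type map is below. The function name, the toy 3x3 map, and the value `nr_types=6` are assumptions for illustration only; in practice you would run the same check over the type channel of each training patch.

```python
def out_of_range_labels(type_map, nr_types):
    """Return sorted label values in a 2D type map that fall outside [0, nr_types)."""
    seen = {int(v) for row in type_map for v in row}
    return sorted(v for v in seen if v < 0 or v >= nr_types)


# Hypothetical 3x3 type map; with nr_types=6 the valid labels are 0..5.
type_map = [
    [0, 1, 2],
    [5, 6, 0],
    [3, 3, 1],
]
print(out_of_range_labels(type_map, nr_types=6))  # label 6 is out of range
```

If the check reports any values, either the labels need remapping or `nr_types` in the config needs to cover them (note that the count must include the background class 0).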