DSB2017

About training the detector #94

Open Carl-Lei opened this issue 7 years ago • 10 comments

When training under Python 3.6, I get this error:

    Traceback (most recent call last):
      File "main.py", line 349, in <module>
        main()
      File "main.py", line 168, in main
        train(train_loader, net, loss, epoch, optimizer, get_lr, args.save_freq, save_dir)
      File "main.py", line 180, in train
        for i, (data, target, coord) in enumerate(data_loader):
      File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 451, in __iter__
        return _DataLoaderIter(self)
      File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 247, in __init__
        self._put_indices()
      File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 295, in _put_indices
        indices = next(self.sample_iter, None)
      File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\sampler.py", line 138, in __iter__
        for idx in self.sampler:
      File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\sampler.py", line 51, in __iter__
        return iter(torch.randperm(len(self.data_source)).tolist())
    TypeError: 'float' object cannot be interpreted as an integer

I don't know which variable this 'float' refers to. How should I fix it?

Carl-Lei, Jul 13 '18 06:07

You can't debug while the model is wrapped in DataParallel. Remove the DataParallel wrapper.


lfz, Jul 13 '18 11:07

I commented out net = DataParallel(net) on line 99, but it still doesn't work; I get the same error.

Carl-Lei, Jul 16 '18 01:07

@lfz

Carl-Lei, Jul 16 '18 01:07

This problem seems to be solved. It was caused by this code in the DataBowl3Detector class:

    def __len__(self):
        if self.phase == 'train':
            return len(self.bboxes)/(1-self.r_rand)

Is this supposed to return an integer? @lfz

Carl-Lei, Jul 16 '18 08:07
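
Why the float shows up there: Python's built-in `len()` only accepts an `int` from `__len__`, and in Python 3 the `/` in `len(self.bboxes)/(1-self.r_rand)` always produces a float. The minimal sketch below reproduces just that method in a hypothetical `ToyDetector` class (not part of DSB2017); the 100 boxes and `r_rand=0.3` are made-up values, not the repository's configuration.

```python
# Hypothetical stand-in for DataBowl3Detector: only the __len__ logic quoted in
# this thread is reproduced. The bbox list and r_rand value are made up.
class ToyDetector:
    def __init__(self, bboxes, r_rand, phase="train", cast_to_int=False):
        self.bboxes = bboxes
        self.r_rand = r_rand
        self.phase = phase
        self.cast_to_int = cast_to_int

    def __len__(self):
        if self.phase == "train":
            # In Python 3, "/" is true division and always returns a float.
            n = len(self.bboxes) / (1 - self.r_rand)
            # len() requires an int from __len__, so cast (truncates toward zero).
            return int(n) if self.cast_to_int else n
        return len(self.bboxes)


broken = ToyDetector(list(range(100)), r_rand=0.3)
try:
    len(broken)  # the same kind of len() call the sampler makes (sampler.py, line 51)
except TypeError as err:
    print("TypeError:", err)

fixed = ToyDetector(list(range(100)), r_rand=0.3, cast_to_int=True)
print(len(fixed))
```

Running it prints the same TypeError message as in the traceback above, then 142 for the cast version.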

@lfz Why does this line raise an error? With input_size=(128,128,128) and stride=4, how can this modulo check come out False?

    Traceback (most recent call last):
      File "D:/mydsb/dsb_test/training/detector/main.py", line 349, in <module>
        main()
      File "D:/mydsb/dsb_test/training/detector/main.py", line 168, in main
        train(train_loader, net, loss, epoch, optimizer, get_lr, args.save_freq, save_dir)
      File "D:/mydsb/dsb_test/training/detector/main.py", line 180, in train
        for i, (data, target, coord) in enumerate(data_loader):
      File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 272, in __next__
        return self._process_next_batch(batch)
      File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 307, in _process_next_batch
        raise batch.exc_type(batch.exc_msg)
    AssertionError: Traceback (most recent call last):
      File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 57, in _worker_loop
        samples = collate_fn([dataset[i] for i in batch_indices])
      File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 57, in <listcomp>
        samples = collate_fn([dataset[i] for i in batch_indices])
      File "D:\mydsb\dsb_test\training\detector\data.py", line 96, in __getitem__
        label = self.label_mapping(sample.shape[1:], target, bboxes)
      File "D:\mydsb\dsb_test\training\detector\data.py", line 273, in __call__
        assert(int(input_size[i])% stride == 0)
    AssertionError

Carl-Lei, Jul 16 '18 08:07
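
Note from the traceback that the assertion is evaluated on `sample.shape[1:]` (data.py, line 96), i.e. the shape of the crop actually produced for that sample, not the configured input_size. A (128,128,128) setting therefore does not by itself guarantee the check passes; printing `sample.shape` just before the `label_mapping` call would show which axis is off. The sketch below restates what the check requires, using hypothetical helpers `bad_dims` and `output_size` (not DSB2017 functions): every spatial dimension must be a multiple of `stride`, presumably so the label grid can be `size // stride` per axis. One plausible but unconfirmed suspect is Python 3's true division somewhere in the cropping code, since that is what broke `__len__`.

```python
# Hypothetical diagnostic helpers (not part of DSB2017) that mirror the check
#     assert(int(input_size[i]) % stride == 0)
def bad_dims(input_size, stride):
    """Return (axis, size) pairs whose size is not a multiple of stride."""
    return [(i, int(s)) for i, s in enumerate(input_size) if int(s) % stride != 0]


def output_size(input_size, stride):
    """Per-axis label-grid size, valid only when every axis divides evenly."""
    problems = bad_dims(input_size, stride)
    if problems:
        raise ValueError(f"sizes not divisible by stride {stride}: {problems}")
    return tuple(int(s) // stride for s in input_size)


if __name__ == "__main__":
    print(output_size((128, 128, 128), 4))  # configured crop size
    print(bad_dims((128, 130, 128), 4))     # made-up malformed crop shape
```

With a 128³ crop and stride 4 this prints (32, 32, 32); the made-up (128, 130, 128) shape is reported as [(1, 130)], i.e. axis 1 is the one that would trip the assertion.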

    def __len__(self):
        if self.phase == 'train':
            return len(self.bboxes)/(1-self.r_rand)

Did you solve it? I have the same problem. I changed / to //, but that doesn't seem to work either.

shenlinyao, Jul 21 '18 07:07

@shenlinyao I forced it to an int: return int(len(self.bboxes)/(1-self.r_rand))

Carl-Lei, Jul 23 '18 00:07
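
Why changing / to // is not enough: 1-self.r_rand is a float, and when either operand is a float, `//` floors the result but still returns a float, so `__len__` keeps returning a float. Wrapping the quotient in `int()`, as in the fix above, or using `math.floor()` (which returns an `int` in Python 3) gives `len()` the integer it needs. A small comparison with made-up values:

```python
import math

n_boxes = 100  # made-up number of training boxes
r_rand = 0.3   # made-up value; the real one comes from the detector config

true_div = n_boxes / (1 - r_rand)              # float, about 142.857
floor_div = n_boxes // (1 - r_rand)            # still a float: int // float -> float
as_int = int(n_boxes / (1 - r_rand))           # int, truncated toward zero
floored = math.floor(n_boxes / (1 - r_rand))   # int in Python 3

for name, value in [("/", true_div), ("//", floor_div), ("int()", as_int), ("math.floor", floored)]:
    print(f"{name:>10}: {value!r} ({type(value).__name__})")
```

For positive lengths `int()` and `math.floor()` agree; they differ only for negative values, where `int()` truncates toward zero.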

@Carl-Lei I ran into the same "assert(int(input_size[i])% stride == 0) AssertionError" problem. Have you solved it?

DaLei001, Nov 13 '18 06:11

@DaLei001 What is the 'luna_segment':'/work/DataBowl3/luna/seg-lungs-LUNA16/' path in the config supposed to point to? Does it need some extra data, or should I just create an empty file?

lihaossu, Nov 19 '18 06:11

> @DaLei001 What is the 'luna_segment':'/work/DataBowl3/luna/seg-lungs-LUNA16/' path in the config supposed to point to? Does it need some extra data, or should I just create an empty file?

It's a folder from the LUNA dataset.

chenggangdu, Oct 24 '19 10:10