FewShotWithoutForgetting

Error when training

Open xwjabc opened this issue 6 years ago • 7 comments

When executing the command below:

CUDA_VISIBLE_DEVICES=0 python train.py --config=miniImageNet_Conv128CosineClassifier

It prompts:

Exception KeyError: KeyError(<weakref at 0x7f619db132b8; to 'tqdm' at 0x7f619db23090>,) in <bound method tqdm.__del__ of
  0%|                                                                 | 0/2000 [00:00<?, ?it/s]> ignored
Traceback (most recent call last):
  File "train.py", line 110, in <module>
    algorithm.solve(dloader_train, dloader_test)
  File "/teamscratch/msravcshare/v-weijxu/code/few-shot/DynamicFewShot/algorithms/Algorithm.py", line 286, in solve
    eval_stats = self.evaluate(data_loader_test)
  File "/teamscratch/msravcshare/v-weijxu/code/few-shot/DynamicFewShot/algorithms/Algorithm.py", line 330, in evaluate
    eval_stats_this = self.evaluation_step(batch)
  File "/teamscratch/msravcshare/v-weijxu/code/few-shot/DynamicFewShot/algorithms/FewShot.py", line 84, in evaluation_step
    return self.process_batch(batch, do_train=False)
  File "/teamscratch/msravcshare/v-weijxu/code/few-shot/DynamicFewShot/algorithms/FewShot.py", line 87, in process_batch
    process_type = self.set_tensors(batch)
  File "/teamscratch/msravcshare/v-weijxu/code/few-shot/DynamicFewShot/algorithms/FewShot.py", line 60, in set_tensors
    nKnovel = 1 + labels_train.max() - self.nKbase
RuntimeError: Expected object of type torch.cuda.LongTensor but found type torch.LongTensor for argument #3 'other'

Environment: Python 2.7 PyTorch 0.4 @ CUDA 9.1

xwjabc avatar Jul 13 '18 09:07 xwjabc

@xwjabc I ran into the same problem, and I'm not familiar with PyTorch. But changing this line https://github.com/gidariss/FewShotWithoutForgetting/blob/master/algorithms/FewShot.py#L55 to self.nKbase = nKbase.squeeze()[0].cuda() fixes the problem.

caiqi avatar Jul 15 '18 06:07 caiqi

@caiqi Thx! Will take a look. I am also a newbie to PyTorch and am trying to trace the cause of that error.

xwjabc avatar Jul 15 '18 06:07 xwjabc

Found the cause. In PyTorch 0.4, x.squeeze()[0] returns a 0-dim tensor rather than a Python scalar. This causes several compatibility problems (e.g. nKbase errors, DAverageMeter errors). Will post a patch list later.
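
A minimal sketch of the behavior described above, assuming PyTorch 0.4 or later. The values for nKbase and labels_train are made up for illustration (the real ones come from the data loader), and converting with .item() is one possible workaround, not necessarily the patch the author had in mind:

```python
import torch

# Hypothetical stand-in values; in the repo these come from the data loader batch.
nKbase = torch.tensor([64, 64])
labels_train = torch.tensor([[64, 65, 66, 67, 68]])

# Indexing a tensor yields a 0-dim tensor in PyTorch >= 0.4, not a Python number.
first = nKbase.squeeze()[0]
print(type(first), first.dim())

# .item() extracts a plain Python int, which can be mixed with tensors on any device.
nKbase_scalar = first.item()
nKnovel = 1 + labels_train.max().item() - nKbase_scalar
print(nKnovel)  # 1 + 68 - 64 = 5
```

Because nKbase_scalar is an ordinary int, the subtraction no longer depends on whether labels_train lives on the CPU or the GPU.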

xwjabc avatar Jul 17 '18 06:07 xwjabc

@xwjabc I may have hit the same DAverageMeter error (AccuracyNovel is missing). Could you please tell me how to fix it?

jin-s13 avatar Oct 23 '18 15:10 jin-s13

@jin-s13 Could you add some more details about the error?

xwjabc avatar Oct 23 '18 21:10 xwjabc

@jin-s13 My suggestion, if you are still interested: add .item() at the end of the top1accuracy() function wherever you calculate accuracies for Novel, Base, or Both. This turns the loss_record value into a scalar for those accuracies.
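
A rough sketch of why the .item() conversion helps, assuming the meter only accumulates plain Python numbers. to_scalar and DictAverageSketch are hypothetical names for illustration and are not the repo's DAverageMeter:

```python
def to_scalar(value):
    """Return a plain Python number; call .item() when the value provides it (e.g. a 0-dim tensor)."""
    item = getattr(value, "item", None)
    return item() if callable(item) else value


class DictAverageSketch:
    """Hypothetical running average over dicts of stats, keyed by stat name."""

    def __init__(self):
        self.sums = {}
        self.counts = {}

    def update(self, stats):
        for key, value in stats.items():
            # Convert first so tensor-like values are not silently skipped.
            number = float(to_scalar(value))
            self.sums[key] = self.sums.get(key, 0.0) + number
            self.counts[key] = self.counts.get(key, 0) + 1

    def average(self):
        return {key: self.sums[key] / self.counts[key] for key in self.sums}
```

Calling update({'AccuracyNovel': acc}) then works whether acc is a float or a 0-dim tensor, so the key should no longer go missing from the averaged stats.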

bugrabaran avatar Jan 09 '19 15:01 bugrabaran

Here is my solution:

#labels_train = self.tensors['labels_train']
nKnovel = 1 + labels_train.max() - self.nKbase
labels_train_1hot_size = list(labels_train.size()) + [nKnovel,]
labels_train_unsqueeze = labels_train.unsqueeze(dim=labels_train.dim())
self.tensors['labels_train_1hot'].resize_(labels_train_1hot_size).fill_(0).scatter_(
    len(labels_train_1hot_size) - 1, (labels_train_unsqueeze - self.nKbase).cuda(), 1)
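
For comparison, a device-agnostic sketch of the same one-hot construction, assuming nKbase is already a plain Python int. one_hot_novel_labels is a hypothetical helper, not a function from the repo:

```python
import torch


def one_hot_novel_labels(labels_train, nKbase):
    """Build one-hot labels over the novel classes on the same device as labels_train."""
    # .item() keeps nKnovel a plain int, so it can be used as a size.
    nKnovel = 1 + int(labels_train.max().item()) - nKbase
    # Shift labels so novel classes start at 0, then add a trailing index dimension.
    index = (labels_train - nKbase).unsqueeze(-1)
    one_hot = torch.zeros(*labels_train.shape, nKnovel, device=labels_train.device)
    return one_hot.scatter_(-1, index, 1)


labels = torch.tensor([[64, 66, 65]])  # made-up labels with nKbase = 64
print(one_hot_novel_labels(labels, 64))
```

Allocating the output on labels_train.device avoids hard-coding .cuda(), so the same code should run on either CPU or GPU.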

Franklin-Yao avatar Nov 11 '19 09:11 Franklin-Yao