Bug training CNNs with num_classes < 5

Open oke-aditya opened this issue 4 years ago • 9 comments

🐛 Bug

Describe the bug

I'm training a model with 2 classes. The error appears at line 61 in metrics/accuracy.py. By default, maxk always equals 5 because topk is fixed to (1, 5) in the train_step and val_step functions. The output variable has shape 32x4, so the call fails with RuntimeError: invalid argument 5: k not in range for dimension

     59     maxk = max(topk)
     60     batch_size = target.size(0)
     61     _, pred = output.topk(maxk, 1, True, True)
     62     pred = pred.t()
     63     correct = pred.eq(target.view(1, -1).expand_as(pred))

To Reproduce

Steps to reproduce the behavior: just train a model using engine.fit with 2 classes.

Expected behavior

The training process works with an arbitrary number of output classes.

Screenshots image

Desktop (please complete the following information):

  • OS: ubuntu 20.04

Additional context

@vpeopleonatank

oke-aditya avatar Dec 02 '20 05:12 oke-aditya

A possible fix is to get the number of classes from the dataloader and pass topk as:

if dataloader.num_classes < 5:
    acc1, acck = topk(1, num_classes)
else:
    acc1, acck = topk(1, 5)

And subsequently change the metric logging that follows. Is there a way to get num_classes from the dataloader? We shouldn't introduce a new parameter to engine.
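One alternative that would avoid asking the dataloader at all is to clamp k to the number of columns in the model output. The following is only a minimal pure-Python sketch of that idea, not the code in metrics/accuracy.py; the function name topk_accuracy and the list-based inputs are assumptions made for illustration. In the PyTorch snippet from the report, the analogous change would presumably be maxk = min(max(topk), output.size(1)).

```python
from typing import List, Sequence, Tuple


def topk_accuracy(
    outputs: Sequence[Sequence[float]],
    targets: Sequence[int],
    topk: Tuple[int, ...] = (1, 5),
) -> List[float]:
    """Return top-k accuracy in percent for each k, clamping k to the class count."""
    num_classes = len(outputs[0])
    batch_size = len(targets)
    results = []
    for k in topk:
        # Never ask for more predictions than there are classes.
        k_eff = min(k, num_classes)
        correct = 0
        for scores, target in zip(outputs, targets):
            # Class indices ordered from highest to lowest score.
            ranked = sorted(range(num_classes), key=lambda i: scores[i], reverse=True)
            if target in ranked[:k_eff]:
                correct += 1
        results.append(100.0 * correct / batch_size)
    return results


# Two-class example (hypothetical scores): the second metric is clamped to k=2.
outputs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
targets = [0, 1, 1]
acc1, acc5 = topk_accuracy(outputs, targets)
```

With fewer than 5 classes the clamped metric counts every sample as correct, so the logged name (acc5 versus acck) would still need adjusting.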

oke-aditya avatar Dec 02 '20 05:12 oke-aditya

Thanks for your reply. Currently I also don't know how to get num_classes from the dataloader. My temporary workaround is to fork the repo and hard-code a new parameter. I hope you can come up with a more flexible solution for this.

vpeopleonatank avatar Dec 02 '20 15:12 vpeopleonatank

Yes, I will definitely fix this; it is a very trivial bug. Once it's fixed you can install from master :smile: and get the latest update.

oke-aditya avatar Dec 02 '20 16:12 oke-aditya

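import torchvision.transforms as T
from torch.utils.data import DataLoader
from torchvision import datasets

train_transforms = T.Compose([T.ToTensor(), T.Normalize((0.5,), (0.5,))])
valid_transforms = T.Compose([T.ToTensor(), T.Normalize((0.5,), (0.5,))])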

train_set = datasets.CIFAR10("./data", download=True, train=True, transform=train_transforms)
valid_set = datasets.CIFAR10("./data", download=True, train=False, transform=valid_transforms)

train_loader = DataLoader(train_set, 32, shuffle=True, num_workers=2)
valid_loader = DataLoader(valid_set, 32, shuffle=False, num_workers=1)

print(len(train_loader.dataset.classes))

With the code above I could access the number of classes in the data loader. Will this work for all dataloaders, and not just CIFAR-10?

oke-aditya avatar Dec 07 '20 18:12 oke-aditya

In my current code, it fails with AttributeError: 'Subset' object has no attribute 'classes'. I think this is because my custom dataset doesn't have a classes attribute. Should we require dataset classes to define a classes attribute?

vpeopleonatank avatar Dec 07 '20 20:12 vpeopleonatank

AFAIK all the datasets available through torchvision.datasets have a classes attribute defined, which is how you were able to use train_loader.dataset.classes. I will look at the source code and see how the classes attribute is defined, so that we can then somehow use it to get the number of classes.

hassiahk avatar Dec 08 '20 02:12 hassiahk

Look at this part of the code: https://github.com/pytorch/vision/blob/f80b83ea298a49ddb4e5b4ce0fe59910beca70b4/torchvision/datasets/cifar.py#L95-L103

And also this: https://github.com/pytorch/vision/blob/f80b83ea298a49ddb4e5b4ce0fe59910beca70b4/torchvision/datasets/folder.py#L142-L158

So there is no definitive way to get the number of classes from a DataLoader or Dataset, since it depends on a classes attribute that has to be defined in the Dataset object.
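A best-effort lookup is still possible when the attribute happens to exist. The sketch below assumes a hypothetical helper named infer_num_classes that checks for classes and unwraps wrappers exposing a .dataset attribute (as torch.utils.data.Subset does); it returns None when nothing is found. The Fake* classes are stand-ins so the sketch runs without PyTorch.

```python
from typing import Any, Optional


def infer_num_classes(dataset: Any, max_depth: int = 5) -> Optional[int]:
    """Best-effort len(dataset.classes), unwrapping nested `.dataset` wrappers."""
    current = dataset
    for _ in range(max_depth):
        classes = getattr(current, "classes", None)
        if classes is not None:
            return len(classes)
        # Wrappers such as Subset keep the original dataset in `.dataset`.
        current = getattr(current, "dataset", None)
        if current is None:
            return None
    return None


# Stand-in objects (hypothetical) that mimic the attributes discussed above.
class FakeDataset:
    classes = ["cat", "dog"]


class FakeSubset:
    def __init__(self, dataset: Any) -> None:
        self.dataset = dataset


class NoClasses:
    pass


wrapped = infer_num_classes(FakeSubset(FakeDataset()))
missing = infer_num_classes(NoClasses())
```

In a training loop this might be called as infer_num_classes(train_loader.dataset), with a fallback for the None case, since custom datasets are not required to define classes.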

hassiahk avatar Dec 08 '20 02:12 hassiahk

We can't force end users to have a self.classes attribute. Instead, I propose a parameter called metrics, which users can pass.

oke-aditya avatar Dec 08 '20 09:12 oke-aditya

We need a discussion on the metrics parameter and how it should work. Should it work for all models? If yes, then how?
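To make the discussion concrete, here is one possible shape for such a parameter: a mapping from metric name to a callable. This is purely a hypothetical sketch. The names Metric, accuracy, and evaluate are invented for illustration and are not part of quickvision's engine API.

```python
from typing import Callable, Dict, Sequence

# A metric takes predicted labels and target labels and returns a float.
Metric = Callable[[Sequence[int], Sequence[int]], float]


def accuracy(preds: Sequence[int], targets: Sequence[int]) -> float:
    """Fraction of predictions that match the targets."""
    correct = sum(int(p == t) for p, t in zip(preds, targets))
    return correct / len(targets)


def evaluate(
    preds: Sequence[int],
    targets: Sequence[int],
    metrics: Dict[str, Metric],
) -> Dict[str, float]:
    """Apply every user-supplied metric and collect the results by name."""
    return {name: fn(preds, targets) for name, fn in metrics.items()}


# Hypothetical predictions and targets for a two-class problem.
results = evaluate([0, 1, 1, 0], [0, 1, 0, 0], {"accuracy": accuracy})
```

Under this design the user decides which metrics make sense for their number of classes, so the engine would not need to know num_classes at all.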

oke-aditya avatar Dec 08 '20 09:12 oke-aditya