quickvision
Bug training CNNs with num_classes < 5
🐛 Bug
Describe the bug
I'm training a model with 2 classes. The error appears at line 61 in metrics/accuracy.py. By default, maxk always equals 5 because topk is fixed to (1, 5) in the train_step and val_step functions. The output variable has shape 32x4, so the call fails with RuntimeError: invalid argument 5: k not in range for dimension
59 maxk = max(topk)
60 batch_size = target.size(0)
61 _, pred = output.topk(maxk, 1, True, True)
62 pred = pred.t()
63 correct = pred.eq(target.view(1, -1).expand_as(pred))
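One way to sidestep the error inside the metric itself would be to clamp each requested k to the number of output columns. The sketch below is a standalone function, not quickvision's actual metrics/accuracy.py; the name `topk_accuracy` and its return format (a list of percentages) are assumptions made for illustration.

```python
import torch


def topk_accuracy(output, target, topk=(1, 5)):
    """Hypothetical helper: top-k accuracy (in percent) that tolerates k > num_classes."""
    num_classes = output.size(1)
    # Clamp every requested k so torch.topk never asks for more columns than exist.
    ks = [min(k, num_classes) for k in topk]
    maxk = max(ks)
    batch_size = target.size(0)

    # pred has shape (maxk, batch_size) after the transpose.
    _, pred = output.topk(maxk, dim=1, largest=True, sorted=True)
    pred = pred.t()
    correct = pred.eq(target.view(1, -1).expand_as(pred))

    results = []
    for k in ks:
        # Count samples whose target appears among the top-k predictions.
        correct_k = correct[:k].reshape(-1).float().sum().item()
        results.append(correct_k * 100.0 / batch_size)
    return results


# Two-class logits: an unclamped output.topk(5, 1) would raise on this input.
logits = torch.tensor([[0.9, 0.1], [0.2, 0.8]])
labels = torch.tensor([0, 0])
top1, topk_clamped = topk_accuracy(logits, labels, topk=(1, 5))
```

Note that when k is clamped all the way down to the number of classes, the second value is trivially 100%, so it may make more sense to log it under a different name or to skip it.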
To Reproduce
Steps to reproduce the behavior:
Just train a model using engine.fit with 2 classes.
Expected behavior
The training process works with an arbitrary number of output classes.
Screenshots
Desktop (please complete the following information):
- OS: ubuntu 20.04
Additional context
@vpeopleonatank
A possible fix is to get the number of classes from the dataloader and pass topk as (pseudocode):
if dataloader.num_classes < 5:
    acc1, acck = topk(1, dataloader.num_classes)
else:
    acc1, acck = topk(1, 5)
The metric logging below would then need to change accordingly.
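As a rough illustration of that idea, the choice of k values and the matching log labels could be factored into a small helper. This is only a sketch: `select_topk`, its `default_k` parameter, and the label format are hypothetical names, not part of quickvision, and it assumes the class count is already known.

```python
def select_topk(num_classes, default_k=5):
    """Hypothetical helper: pick k values and log labels that fit the class count."""
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    k = min(default_k, num_classes)
    # Avoid reporting the same metric twice when only one class exists.
    ks = (1,) if k == 1 else (1, k)
    labels = tuple(f"top{k_}_acc" for k_ in ks)
    return ks, labels


ks_small, labels_small = select_topk(2)
ks_large, labels_large = select_topk(10)
```

Returning the labels together with the k values keeps the logged metric names in step with what was actually computed.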
Is there a way to get num_classes from the dataloader? We shouldn't have to introduce a new parameter to the engine.
Thanks for your reply. Currently I don't know how to get num_classes from the dataloader either. My temporary workaround is to fork the repository and hard-code a new parameter. I hope you can come up with a more flexible solution.
Yes, I will definitely fix this; it's a fairly trivial bug. Once it's fixed you can install from master :smile: and get the latest update.
import torchvision.transforms as T
from torch.utils.data import DataLoader
from torchvision import datasets

train_transforms = T.Compose([T.ToTensor(), T.Normalize((0.5,), (0.5,))])
valid_transforms = T.Compose([T.ToTensor(), T.Normalize((0.5,), (0.5,))])
train_set = datasets.CIFAR10("./data", download=True, train=True, transform=train_transforms)
valid_set = datasets.CIFAR10("./data", download=True, train=False, transform=valid_transforms)
train_loader = DataLoader(train_set, 32, shuffle=True, num_workers=2)
valid_loader = DataLoader(valid_set, 32, shuffle=False, num_workers=1)
# The wrapped dataset exposes its class names; CIFAR-10 has 10 of them.
print(len(train_loader.dataset.classes))
With the code above I could access the number of classes through the data loader. Will this work for all dataloaders and not just CIFAR-10?
In my current code, it fails with AttributeError: 'Subset' object has no attribute 'classes', I think because my custom dataset doesn't have a classes attribute. Should adding classes to the dataset class be a requirement?
AFAIK all the datasets available through torchvision.datasets have a classes attribute defined, which is why you were able to call train_loader.dataset.classes. I will look at the source code to see how the classes attribute is defined, so that we can perhaps use it to get the number of classes.
Look at this part of the code: https://github.com/pytorch/vision/blob/f80b83ea298a49ddb4e5b4ce0fe59910beca70b4/torchvision/datasets/cifar.py#L95-L103
And also this: https://github.com/pytorch/vision/blob/f80b83ea298a49ddb4e5b4ce0fe59910beca70b4/torchvision/datasets/folder.py#L142-L158
So there is no definite way to get the number of classes from a DataLoader or Dataset, since it depends on a classes attribute that each Dataset object defines for itself.
We can't force end users to have a self.classes attribute.
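A best-effort lookup is still possible as a convenience, as long as it can fail gracefully. The sketch below assumes only that wrappers such as torch.utils.data.Subset keep the wrapped dataset in a `.dataset` attribute; `infer_num_classes` and `max_depth` are hypothetical names, and the stand-in classes exist only so the example runs without torch installed.

```python
def infer_num_classes(dataset, max_depth=10):
    """Hypothetical helper: best-effort class count, or None if it cannot be found."""
    current = dataset
    for _ in range(max_depth):
        if current is None:
            break
        classes = getattr(current, "classes", None)
        if classes is not None:
            return len(classes)
        # torch.utils.data.Subset keeps the wrapped dataset in `.dataset`.
        current = getattr(current, "dataset", None)
    return None


# Stand-ins for a dataset with `classes`, a Subset-like wrapper, and a custom dataset.
class FakeDataset:
    classes = ["cat", "dog"]


class FakeSubset:
    def __init__(self, dataset):
        self.dataset = dataset


class NoClasses:
    pass


wrapped = infer_num_classes(FakeSubset(FakeSubset(FakeDataset())))
missing = infer_num_classes(NoClasses())
```

With a real loader the call would be `infer_num_classes(train_loader.dataset)`. A `None` result means the caller still has to supply the class count some other way.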
Instead, I propose a parameter called metrics that users can pass.
We need to discuss the metrics parameter and how it should work. Should it work for all models? If yes, then how?
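To make the discussion concrete, here is one possible shape for such a parameter: a mapping from metric name to a callable taking outputs and targets. Nothing here reflects an existing quickvision API; `run_metrics` and `top1` are hypothetical, and plain Python lists stand in for tensors.

```python
def run_metrics(outputs, targets, metrics=None):
    """Hypothetical: evaluate user-supplied metrics, given as {name: callable}."""
    metrics = metrics or {}
    return {name: fn(outputs, targets) for name, fn in metrics.items()}


def top1(outputs, targets):
    # outputs: per-class scores for each sample; targets: class indices.
    hits = sum(
        max(range(len(row)), key=row.__getitem__) == t
        for row, t in zip(outputs, targets)
    )
    return 100.0 * hits / len(targets)


scores = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
labels = [0, 1, 1]
report = run_metrics(scores, labels, metrics={"top1": top1})
```

Under this design the engine would not need to know the number of classes at all: users with two classes would simply not pass a top-5 metric.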