Performs really poorly when the number of classes is high

Open jaytimbadia opened this issue 3 years ago • 7 comments

This algorithm performs really poorly when the number of classes is high, around 100 for example, which is the case most of the time. In that situation we need more images per class, and as we add labels the requirement comes close to the images-per-class requirement of a normal supervised model.

Really wasted my time on this.

jaytimbadia avatar Feb 10 '22 05:02 jaytimbadia

It's interesting that you find it's not working. We were able to reach roughly state-of-the-art accuracy at the time on semi-supervised CIFAR-100 (with 100 classes) and ImageNet (with 1000 classes). What are you trying?

carlini avatar Feb 10 '22 05:02 carlini

Hi,

I guess it performs well, or reaches SOTA, on semi-supervised benchmarks, but not on conventional ones with more labelled data per class. Please let me know if that is correct. CIFAR-100 reaches 96% with supervised learning, and I am not sure what accuracy this model achieves on CIFAR-100 and ImageNet-1000.

I am working on a classification task with ~2000 classes. I tried 400 images per class, which gave me a supervised accuracy of 48.85%. If this is worse, how can the pseudo-labelling training task perform well? Because of this, training on the unlabelled data is not performing well either. This means that we need a considerably better (more complex) model to train the subsequent semi-supervised task, and then repeat.

I am not sure how to tackle this problem. I don't have much labelled data (400 per class) and have been stuck for 2 months now.

jaytimbadia avatar Feb 10 '22 05:02 jaytimbadia

Sorry I don't really follow what you're saying here.

You have 2000 classes, and with 400 images per class you get 48.8% accuracy. Is this with FixMatch? "If this is worse" than what? And how does pseudo-labeling enter into this? And why do you need a more complex model?

How much unlabeled data do you have if you have 400 labeled images per class?

carlini avatar Feb 10 '22 06:02 carlini

OK. Let me put it simply.

I have 2000 classes with 400 labelled images per class, and I have ~600 unlabelled images per class (some from the internet, others augmented from the labelled images). Will FixMatch work on this? I have tried it and it doesn't give good results. Let me know if you have any other suggestions.

I got 48.8% from running the fixmatch.py file. Do I need to make any model changes? Also, why won't it work with a larger number of classes? Is it because there are more patterns and the model is not complex enough? Should I change the architecture?

jaytimbadia avatar Feb 10 '22 07:02 jaytimbadia

What accuracy does fully supervised training on these images give, ignoring the unlabeled data?

carlini avatar Feb 10 '22 17:02 carlini

I had not done that, but now I have. It gives 84.27% with a 15-85 split, purely supervised on the labelled data.

I used transfer learning with DenseNet.

jaytimbadia avatar Feb 14 '22 12:02 jaytimbadia

Supposing you don't use transfer learning, what accuracy do you get?

The thing I'm trying to understand is this: supervised learning will strictly out-perform semi-supervised learning given the same number of labeled examples. If your task is just hard, it's entirely reasonable that FixMatch might simply not reach very high accuracy.

carlini avatar Feb 25 '22 06:02 carlini
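
For readers following the pseudo-labeling question raised in this thread, the unlabeled-data part of a FixMatch-style loss can be sketched roughly as below. This is a minimal pure-Python illustration and not the code in fixmatch.py: the function names `pseudo_label` and `unlabeled_loss` are hypothetical, the toy logits are made up, and the default confidence threshold of 0.95 is an assumption chosen for the example.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_total for x in logits]


def pseudo_label(weak_logits, threshold=0.95):
    """Return (label, confident) for one weakly augmented example.

    The label is the argmax class; `confident` is True only when the
    predicted probability of that class reaches the threshold
    (0.95 is an assumed default for this sketch).
    """
    log_probs = log_softmax(weak_logits)
    label = max(range(len(log_probs)), key=log_probs.__getitem__)
    confident = math.exp(log_probs[label]) >= threshold
    return label, confident


def unlabeled_loss(weak_batch, strong_batch, threshold=0.95):
    """Masked cross-entropy between pseudo-labels and strong-view predictions.

    Returns (loss, mask_rate). The loss is averaged over the whole batch,
    so examples that fail the threshold contribute zero.
    """
    total, kept = 0.0, 0
    for weak, strong in zip(weak_batch, strong_batch):
        label, confident = pseudo_label(weak, threshold)
        if confident:
            total += -log_softmax(strong)[label]
            kept += 1
    n = len(weak_batch)
    return total / n, kept / n


# Toy logits (made up): the first example is confident, the second is not.
weak = [[10.0, 0.0, 0.0], [1.0, 0.9, 0.8]]
strong = [[5.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
loss, mask_rate = unlabeled_loss(weak, strong)
print(f"loss={loss:.4f} mask_rate={mask_rate:.2f}")
```

If the model is rarely confident, as might happen with thousands of classes and a weak classifier, `mask_rate` stays near zero and the unlabeled images contribute almost nothing to training. Logging that fraction during training is one way to check whether this is what is happening.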