
Issue with cuda.amp.autocast() during evaluation?


I noticed that there is a torch.cuda.amp.autocast() in the _evaluate function (line 143) of engine.py. In my experience, autocasting is typically used in conjunction with a GradScaler. That seems especially relevant here because the autocast context can end up wrapping the forward pass of a ProtoNet_Finetune or ProtoNet_Auto_Finetune model, which itself performs multiple forward and backward passes through the backbone for the fine-tuning done at meta-test time (and PyTorch recommends not running the backward passes under autocast). The usage pattern I have in mind is sketched below.
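For reference, this is the pattern I am referring to, as a minimal sketch with a hypothetical model, optimizer, and data (none of this is the repo's code): only the forward pass runs under autocast, while the backward pass and optimizer step sit outside it and go through a GradScaler.

```python
import torch

# Hypothetical model, optimizer, and batch -- placeholders, not from this repo.
model = torch.nn.Linear(16, 5).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(8, 16).cuda()
y = torch.randint(0, 5, (8,)).cuda()

optimizer.zero_grad()
# Only the forward pass / loss computation runs under autocast.
with torch.cuda.amp.autocast():
    loss = torch.nn.functional.cross_entropy(model(x), y)
# Backward pass and optimizer step stay outside autocast;
# GradScaler rescales the loss so fp16 gradients don't underflow.
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```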

I ran two evaluations with the base DINO backbone (off-the-shelf, not meta-trained) on 100 5-way 5-shot classification tasks from the test split of CIFAR-FS: one with that autocast line commented out and one with it left as it is in your codebase, and I noticed very different results. Here is the command I used:

python main.py --dataset cifar_fs --arch dino_small_patch16 --device cuda:0 --nSupport 5 --deploy finetune --eval --nEpisode 100 --ada_lr 0.001 --ada_steps 50
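As a side note, instead of commenting the line out, the same comparison could be done through autocast's enabled flag; a minimal sketch (the USE_AMP flag is something I would add locally, it is not in the repo):

```python
import torch

USE_AMP = False  # Hypothetical local flag for these experiments.

# autocast(enabled=False) behaves like plain fp32 execution,
# so this is equivalent to removing the context manager entirely.
with torch.cuda.amp.autocast(enabled=USE_AMP):
    pass  # the evaluation forward pass would go here
```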

When autocasting was commented out, the global average accuracy was around 40.9%, but with it present it was around 86.3%. (Screenshots of both runs attached.)

I also ran two evaluations where I let the automatic LR selection happen (as highlighted in your paper) instead of picking a learning rate myself, with this command:

python main.py --dataset cifar_fs --arch dino_small_patch16 --device cuda:0 --nSupport 5 --deploy finetune_autolr --eval --nEpisode 100 --ada_steps 50

With autocasting commented out, a non-zero learning rate was never selected, so the accuracy reported for each task came from ProtoNet evaluation on the frozen backbone embeddings (essentially equivalent to running the first command with ada_lr=0); this yielded a final accuracy of ~78.6%. With autocasting present, as on line 143, non-zero learning rates did get selected and the final accuracy was around ~84%.
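To get a rough sense of how much the precision alone moves the frozen embeddings (as opposed to the LR search itself), I was thinking of a quick check along these lines; a sketch with a stand-in backbone, not the repo's evaluation code:

```python
import torch

# Stand-in backbone; in practice this would be the DINO ViT-S/16 feature extractor.
backbone = torch.nn.Sequential(torch.nn.Linear(384, 384), torch.nn.GELU()).cuda().eval()
x = torch.randn(25, 384).cuda()  # e.g. a 5-way 5-shot support set, already flattened

with torch.no_grad():
    feats_fp32 = backbone(x)
    with torch.cuda.amp.autocast():
        feats_amp = backbone(x)

# Relative difference between full-precision and autocast outputs; a large gap
# here would point at precision itself rather than the LR selection.
rel_err = (feats_fp32 - feats_amp.float()).norm() / feats_fp32.norm()
print(f"relative difference: {rel_err.item():.4e}")
```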

I am just curious about this behavior. Admittedly, I am not too familiar with automatic mixed precision myself so I could totally be missing something here. Any thoughts on why including/removing torch.cuda.amp.autocast() could lead to such a drastic change?

Aryan9101 · Jul 20 '23, 20:07