
Several code and training questions

Open cszer opened this issue 2 years ago • 6 comments

Hi, thanks for the awesome and useful paper. I am trying to train another backbone on the 10 Meta-Dataset domains. My questions are:

  1. What about batch size? For ViT it is OK to use bs=1 due to layer norm and the large memory consumption. Will this degrade the theoretical performance of CNNs with plain batch-norm layers?
  2. RAM usage keeps increasing over training steps; is this normal?

cszer avatar Mar 14 '23 22:03 cszer

Hi @cszer, thanks for your questions!

  1. bs=1 means we only have one episode at a time, which contains a support set of images and a query set of images. So the input to the ViT is never bs=1, but bs=len(support set) or bs=len(query set) (see the sketch after this list).
  2. RAM usage changes all the time because the support/query set sizes differ between episodes.
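
A minimal sketch of what this means in practice (the tensor shapes and names below are purely illustrative, not the repo's actual data loader):

```python
import torch

# With bs=1, each "batch" is a single episode, but the tensors actually fed
# to the ViT have a batch dimension equal to the support/query set size.
episode = {
    "support_images": torch.randn(25, 3, 224, 224),  # e.g. 5-way 5-shot -> 25 images
    "support_labels": torch.randint(0, 5, (25,)),
    "query_images":   torch.randn(75, 3, 224, 224),  # e.g. 15 queries per class
    "query_labels":   torch.randint(0, 5, (75,)),
}

# The backbone therefore runs forward passes with bs=25 and bs=75, never bs=1:
# sup_feat = vit(episode["support_images"])  # shape [25, D]
# qry_feat = vit(episode["query_images"])    # shape [75, D]
```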

hushell avatar Mar 15 '23 21:03 hushell

About RAM: I made some changes to the dataset code (closing all HDF5 files opened with tables once the training tensors are created, plus deep-copying some variables) -> RAM usage is now no more than 29 GB (before it was around 45 GB). A sketch of the idea is below.
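
A minimal sketch of that kind of fix, assuming the episodes are read from HDF5 files via PyTables; the file layout and function name here are hypothetical, not the actual patch:

```python
import copy

import numpy as np
import tables  # PyTables, used here to read the h5 files


def load_arrays_and_close(h5_path):
    """Copy the needed records into plain in-memory arrays and close the
    HDF5 file right away, so its handle and caches are not kept alive for
    the whole training run."""
    f = tables.open_file(h5_path, mode="r")
    try:
        images = np.asarray(f.root.images[:])  # materialize in memory
        labels = np.asarray(f.root.labels[:])
    finally:
        f.close()  # release the file handle immediately

    # Deep-copy so nothing downstream keeps a view of file-backed buffers.
    return copy.deepcopy(images), copy.deepcopy(labels)
```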

cszer avatar Mar 16 '23 10:03 cszer

Thanks for the answers!

cszer avatar Mar 16 '23 10:03 cszer

That sounds like a good fix. Could you send a pull request so that I can merge your code? Cheers!

hushell avatar Mar 16 '23 10:03 hushell

I encountered the same problem: the program terminated because the CPU ran out of RAM. How can I solve it?

codeshop715 avatar May 06 '23 12:05 codeshop715

@codeshop715 Unfortunately the ViT models were not memory-optimized, and training on Meta-Dataset requires a 48 GB GPU. One trick to reduce memory is to stop gradients through the ViT for the support set.
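
A rough sketch of that trick, assuming a ProtoNet-style episode forward pass just to keep the example self-contained (function and variable names are hypothetical, not the repo's API):

```python
import torch
import torch.nn.functional as F


def forward_episode(backbone, sup_x, sup_y, qry_x):
    """Run the support set under torch.no_grad() so no activations are stored
    for backprop; gradients only flow through the query-set forward pass."""
    with torch.no_grad():
        sup_feat = backbone(sup_x)  # no autograd graph kept for support images

    # Class prototypes from the (detached) support features.
    protos = torch.stack([sup_feat[sup_y == c].mean(0) for c in sup_y.unique()])

    qry_feat = backbone(qry_x)  # the only pass that is backpropagated
    logits = F.cosine_similarity(qry_feat.unsqueeze(1), protos.unsqueeze(0), dim=-1)
    return logits
```

This trades some accuracy (the backbone no longer adapts to support-set gradients) for a large drop in activation memory, since the support-set forward pass stores nothing for backprop.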

hushell avatar Jun 14 '23 23:06 hushell