soft-moe
Training hyperparameters
Hi, is there any suggestion or guideline for the pretraining hyperparameters, such as batch size, learning rate, optimizer, etc.? I plan to verify the efficacy of Soft-MoE on a relatively small dataset, e.g., ImageNet-1k, with a smaller ViT variant, e.g., ViT-Tiny.
Thank you
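For reference, a common starting point for ViT-Tiny-scale pretraining on ImageNet-1k is a DeiT-style recipe. The values below are illustrative community defaults from that line of work, not settings confirmed by this repository or the Soft-MoE paper, so treat them as a sketch to tune from:

```python
# Illustrative DeiT-style pretraining hyperparameters for a ViT-Tiny-scale
# model on ImageNet-1k. These are common community defaults, NOT values
# confirmed for Soft-MoE by this repository.
config = {
    "optimizer": "AdamW",
    "base_lr": 1e-3,        # typically quoted at a reference batch size of 1024
    "weight_decay": 0.05,
    "batch_size": 1024,
    "epochs": 300,
    "warmup_epochs": 5,
    "lr_schedule": "cosine",
    "label_smoothing": 0.1,
}

def scaled_lr(base_lr: float, batch_size: int, ref_batch: int = 1024) -> float:
    """Linear LR scaling rule: lr = base_lr * batch_size / ref_batch."""
    return base_lr * batch_size / ref_batch

# e.g. running with a smaller batch of 256 on limited GPUs:
print(scaled_lr(config["base_lr"], 256))
```

If memory is tight, a smaller batch with the linearly scaled learning rate above is the usual workaround; augmentation strength (RandAugment, mixup) is the other knob that typically needs retuning at the Tiny scale.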