soft-moe
Training hyperparameters
Hi, is there any suggestion or guideline for the pretraining hyperparameters, such as batch size, learning rate, optimizer, etc.? I plan to verify the efficacy of Soft-MoE on a relatively small dataset, e.g., ImageNet-1k, with a smaller ViT variant, e.g., ViT-Tiny.
Thank you
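For reference, a common starting point for ViT-Tiny-scale pretraining on ImageNet-1k is a DeiT-style recipe. The values below are illustrative community defaults from that line of work, not settings confirmed by this repository or the Soft-MoE paper, so treat them as a sketch to tune from:

```python
# Illustrative DeiT-style pretraining hyperparameters for a ViT-Tiny-scale
# model on ImageNet-1k. These are common community defaults, NOT values
# confirmed for Soft-MoE by this repository.
config = {
    "optimizer": "AdamW",
    "base_lr": 1e-3,        # typically quoted at a reference batch size of 1024
    "weight_decay": 0.05,
    "batch_size": 1024,
    "epochs": 300,
    "warmup_epochs": 5,
    "lr_schedule": "cosine",
    "label_smoothing": 0.1,
}

def scaled_lr(base_lr: float, batch_size: int, ref_batch: int = 1024) -> float:
    """Linear LR scaling rule: lr = base_lr * batch_size / ref_batch."""
    return base_lr * batch_size / ref_batch

# e.g. running with a smaller batch of 256 on limited GPUs:
print(scaled_lr(config["base_lr"], 256))
```

If memory is tight, a smaller batch with the linearly scaled learning rate above is the usual workaround; augmentation strength (RandAugment, mixup) is the other knob that typically needs retuning at the Tiny scale.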