giladsharir

3 comments by giladsharir

Hi, thanks for taking an interest in this work. The training hyper-parameters (for stam_16) are: batch size 64, AdamW optimizer with weight decay 1e-3, 100 epochs with a cosine annealing schedule, and learning...
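As a minimal sketch of the schedule mentioned above: cosine annealing decays the learning rate following a half-cosine curve over the training run. The base learning rate below is a placeholder (the actual value is truncated in the comment), and the function is a plain-Python illustration, not the repo's training code.

```python
import math

def cosine_annealing_lr(epoch, total_epochs=100, base_lr=1e-4, min_lr=0.0):
    """Cosine annealing: decay base_lr toward min_lr over total_epochs.

    base_lr is a placeholder -- the real learning rate is truncated in
    the original comment. total_epochs=100 matches the stated setup.
    """
    return min_lr + 0.5 * (base_lr - min_lr) * (
        1 + math.cos(math.pi * epoch / total_epochs)
    )

# At epoch 0 the rate equals base_lr; at epoch 100 it reaches min_lr,
# passing through base_lr / 2 at the midpoint.
```

In PyTorch this corresponds to pairing `torch.optim.AdamW(..., weight_decay=1e-3)` with `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)`.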

Can you provide more information: which dataset are you training on? What accuracy did you reach? Which training hyper-parameters? etc. We currently don't have an install-package option.

If the dataset contains many classes (>10K), you can try increasing the `num-of-groups` argument to 200 or 300 (the current default is 100). Otherwise you can look at the...
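For illustration, overriding that argument might look like the line below. The script name and other flags are hypothetical placeholders; only the `--num-of-groups` flag and the suggested values come from the comment above.

```shell
# Hypothetical invocation -- substitute your repo's actual training script
# and dataset flags; only --num-of-groups 200 is taken from the advice above.
python train.py --num-of-groups 200 --data-path /path/to/dataset
```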