Alan Q. Wang

Results: 1 comment by Alan Q. Wang

Digging deeper, it seems that setting lr=1e-3 and wd=1e-4 is necessary to get reasonable results (i.e., non-degenerate solutions). These also appear to be the hyperparameters recommended by most papers using this...
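
For anyone unsure how the two values interact, here is a minimal sketch of where lr and wd enter a single update. The comment does not say which optimizer or framework is in use, so this assumes plain SGD with L2 weight decay on a toy quadratic loss; `sgd_step` and `toy_grads` are hypothetical names made up for illustration, not part of any library.

```python
# Hypothetical sketch: plain SGD with L2 weight decay on a toy loss.
# The optimizer is an assumption; the comment only gives lr and wd.
LR = 1e-3  # learning rate quoted in the comment
WD = 1e-4  # weight decay quoted in the comment


def sgd_step(weights, grads, lr=LR, wd=WD):
    """Return the weights after one SGD step with L2 weight decay."""
    # Weight decay adds wd * w to each gradient before the update.
    return [w - lr * (g + wd * w) for w, g in zip(weights, grads)]


def toy_grads(weights, target=(1.0, -2.0)):
    """Gradient of 0.5 * sum((w - t)^2), a stand-in for a real loss."""
    return [w - t for w, t in zip(weights, target)]


weights = [0.0, 0.0]
for _ in range(5000):
    weights = sgd_step(weights, toy_grads(weights))
```

With these values the decay term is small relative to the gradient, so the weights settle close to the toy target while being pulled slightly toward zero. Whether the same holds for the actual model depends on the optimizer and loss, which the comment does not specify.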