Max Ma

5 issues by Max Ma

The data used for POS tagging and dependency parsing follows the CoNLL-X format. Here is an example: 1 No _ RB RB _ 7 discourse _...
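For readers unfamiliar with CoNLL-X, each token occupies one line of ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), with `_` marking an empty field. The sketch below shows one way such a line could be read. The function name `parse_conllx_line` and the sample row are hypothetical and chosen only for illustration; they are not taken from the repository under discussion.

```python
# Column names defined by the CoNLL-X shared task format.
CONLLX_FIELDS = (
    "id", "form", "lemma", "cpostag", "postag",
    "feats", "head", "deprel", "phead", "pdeprel",
)


def parse_conllx_line(line: str) -> dict:
    """Parse one token line of a CoNLL-X file into a dict (hypothetical helper)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != len(CONLLX_FIELDS):
        raise ValueError(f"expected {len(CONLLX_FIELDS)} columns, got {len(cols)}")
    token = dict(zip(CONLLX_FIELDS, cols))
    # ID and HEAD are integers; HEAD 0 denotes the artificial root.
    token["id"] = int(token["id"])
    token["head"] = int(token["head"])
    return token


# Made-up sample row, not the truncated one quoted in the issue.
sample = "1\tHello\t_\tUH\tUH\t_\t0\troot\t_\t_"
token = parse_conllx_line(sample)
print(token["form"], token["postag"], token["head"], token["deprel"])
```

Sentences in a CoNLL-X file are separated by blank lines, so a full reader would group consecutive token lines until it reaches an empty one.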

For the language model (LM) experiments on One Billion Words, the final test perplexities (PPL) with Adam and RAdam are around 41 and 40, respectively, worse than the numbers reported...

Hi, thanks for this great work. Recently, we tried to train ResNeXt-50 on ImageNet classification using AdaHessian. The implementation we used is from https://github.com/davda54/ada-hessian. However, I got some weird observations....

Hi, I want to ask about the purpose of the learntop argument. I found that this argument is only used in the following code: ```python def prior(name, y_onehot, hps): with tf.variable_scope(name):...

Hi, I was wondering which benchmarks you used for testing these optimizers. Thanks!