Max Ma issues

Results: 5 issues of Max Ma

Data Format

The data used for POS tagging and dependency parsing follows the CoNLL-X format. The following is an example: 1 No _ RB RB _ 7 discourse _...
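As a rough illustration of the format mentioned above, here is a minimal sketch of reading one CoNLL-X token line in Python. It assumes the standard ten tab-separated CoNLL-X columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL). The `Token` type, the `parse_conllx_line` helper, and the sample line are hypothetical and not taken from the repository; in particular, the last column of the sample is filled with `_` purely for illustration, since the example in the issue is cut off.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """One token row with the ten standard CoNLL-X columns."""
    id: int
    form: str
    lemma: str
    cpostag: str
    postag: str
    feats: str
    head: int
    deprel: str
    phead: str
    pdeprel: str


def parse_conllx_line(line: str) -> Token:
    """Split a tab-separated CoNLL-X line into a Token (hypothetical helper)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != len(Token._fields):
        raise ValueError(
            f"expected {len(Token._fields)} columns, got {len(cols)}"
        )
    # ID and HEAD are integers; the remaining columns stay as strings,
    # with "_" marking an unspecified value.
    return Token(
        int(cols[0]), cols[1], cols[2], cols[3], cols[4],
        cols[5], int(cols[6]), cols[7], cols[8], cols[9],
    )


# Hypothetical sample line: the final column is an assumption for illustration.
sample = "1\tNo\t_\tRB\tRB\t_\t7\tdiscourse\t_\t_"
token = parse_conllx_line(sample)
print(token.form, token.postag, token.head, token.deprel)
```

Sentences in CoNLL-X files are typically separated by blank lines, so a full reader would group consecutive token lines until it reaches an empty one.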

Cannot reproduce the PPL on One Billion Words

For the language model (LM) experiments on One Billion Words, the final test PPLs with Adam and RAdam are around 41 and 40, respectively, worse than the numbers reported...

Weird behaviors of AdaHessian on ResNext-50

Hi, Thanks for this great work. Recently, we tried to train ResNext-50 on ImageNet classification using AdaHessian. The implementation we used is from https://github.com/davda54/ada-hessian. However, I got some weird observations....

The purpose of learntop argument

Hi, I want to ask about the purpose of the learntop argument. I found that this argument is only used in the following code: ```python def prior(name, y_onehot, hps): with tf.variable_scope(name):...

What are the benchmarks you used for testing?

Hi, I was wondering which benchmarks you used for testing these optimizers. Thanks!