Max Ma
This initialization ensures that the posterior distribution starts as a normal distribution. During training, the posterior becomes more and more complex as we update the parameters.
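To illustrate the idea (a sketch with assumed details, not the author's code): parameterize the posterior as a base normal pushed through a learnable transform, and zero-initialize the transform's parameters so it is the identity at the start. The posterior is then exactly normal at initialization and only grows more complex as training updates the parameters.

```python
import math

class AffinePosterior:
    """Illustrative posterior: base normal sample pushed through a
    learnable affine transform (names here are hypothetical)."""

    def __init__(self):
        self.log_scale = 0.0  # zero-init => scale of exp(0) = 1
        self.shift = 0.0      # zero-init => no shift

    def transform(self, z):
        # At initialization this is the identity map, so samples
        # keep the base normal distribution.
        return z * math.exp(self.log_scale) + self.shift

flow = AffinePosterior()
```

Once training perturbs `log_scale` and `shift` (or, in a real flow, many stacked nonlinear layers), the pushed-forward distribution is no longer the initial normal.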
Hi, for a "large" model with around 3 billion parameters, I guess the optimizer is probably not the memory bottleneck compared with the gradient calculation in back-propagation. Can I ask...
Thanks for the update! If I understand correctly, storing the parameters together with the optimizer states is indeed the memory bottleneck. Since apollo has one more state (3 vs....
Please let me know if you find apollo obtains better results on the large model. Thanks!
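A rough back-of-the-envelope estimate of where the memory goes in this exchange (a sketch assuming fp32, i.e. 4 bytes per value, counting only parameters, gradients, and per-parameter optimizer states, and ignoring activations):

```python
def memory_gb(n_params, n_states, bytes_per_value=4):
    """Return (params, gradients, optimizer states) memory in GB,
    assuming one gradient value and `n_states` optimizer-state
    values per parameter."""
    gb = 1024 ** 3
    params = n_params * bytes_per_value / gb
    grads = n_params * bytes_per_value / gb   # same size as params
    states = n_params * n_states * bytes_per_value / gb
    return params, grads, states

# e.g. a 3B-parameter model with an optimizer keeping 3 states
p, g, s = memory_gb(3e9, n_states=3)
```

With 3 states per parameter, the optimizer states alone cost three times the parameter memory, which is why parameters plus optimizer states dominate over the single gradient buffer.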
Thanks a lot for your reply. It is pretty clear! On Wed, Dec 5, 2018 at 8:20 AM Huadong Liao wrote: > Hope my explanation will help you: > >...
"DOCSTART" in my data sets is placed in a separated sentence, like 1 -DOCSTART- -X- O O But as it provide no useful information, you can remove it from your...
@nrasiwas sorry for the late response. Here is a clearer example of the data format. The following is the correct format for your examples: 1 EU NNP I-NP I-ORG 2...
The second column is reserved for the lemma, the same as in CoNLL-U. But our model does not use lemma information, so the second column can be filled with anything. Our...
Hi, the data is under the PTB license. If that is not an issue, I am happy to send you the data. Can you give me your email?
For CoNLL-X format, the schema is: ID, FORM, LEMMA, CPOSTAG, POSTAG, MORPH-FEATURES, HEAD, DEPREL, PHEAD, PDEPREL.
For NER data, the schema is: ID, FORM, POSTAG, CHUNK, NERTAG.
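A minimal parser for the NER schema described in this thread could look like the sketch below. It assumes whitespace-separated columns (ID, FORM, POSTAG, CHUNK, NERTAG), blank lines between sentences, and -DOCSTART- lines to be skipped, as discussed above; the function name and dict keys are my own.

```python
def read_conll_ner(lines):
    """Parse NER data in the 5-column format (ID FORM POSTAG CHUNK NERTAG)
    into a list of sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        # Blank lines end a sentence; -DOCSTART- carries no information,
        # so it is dropped (per the comment above).
        if not line or line.split()[1] == "-DOCSTART-":
            if current:
                sentences.append(current)
                current = []
            continue
        idx, form, pos, chunk, ner = line.split()
        current.append({"id": int(idx), "form": form, "pos": pos,
                        "chunk": chunk, "ner": ner})
    if current:
        sentences.append(current)
    return sentences

sample = ["1 EU NNP I-NP I-ORG", "2 rejects VBZ I-VP O", "",
          "1 -DOCSTART- -X- O O", "",
          "1 Peter NNP I-NP I-PER"]
parsed = read_conll_ner(sample)
```

Since the model ignores the lemma column in the CoNLL-X layout, the same approach works there: split on whitespace and keep only the columns you need.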