Shuai Zheng
@tjruwase DeepSpeed initializes the FP32 master weights in the engine when `deepspeed.initialize` is called. After that, we use DeepSpeed's `load_checkpoint` to load the model weights without providing the ZeRO state...
@tjruwase Yes. There are three reasons: 1) we may add additional layers during finetuning, in which case the shapes of the partitions do not align and loading fails...
@tjruwase This is what I originally did to make it work. But I think it is frustrating if `load_checkpoint` cannot handle such a case.
@tjruwase There is one more possible bug we found yesterday: ZeRO-2 gives much higher accuracy than ZeRO-1 in finetuning (all the hyperparameters are the same except...
@tjruwase I will see if I can reproduce the ZeRO-1 regression with a publicly available dataset. Yes, that description captures my case. Also, I would like to draw your attention to my...
@liuzh91 Once we have the mining tool, we can also extract data from different domains and use it for research.
@pengxin99 Yes, the gradient needs to be averaged by the total number of tokens across all the GPUs, not by the per-GPU token count.
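A minimal sketch of why the normalization matters, with made-up per-GPU numbers (not from any real run): dividing each GPU's gradient by its local token count and then averaging gives a different (biased) result than dividing the all-reduced gradient sum by the global token count.

```python
# Toy illustration: two "GPUs" with different numbers of valid (non-padding)
# tokens, each holding the unnormalized sum of its per-token gradients.
per_gpu_tokens = [2, 8]          # valid tokens on each GPU (hypothetical)
per_gpu_grad_sums = [4.0, 8.0]   # sum of per-token gradients on each GPU

# Wrong: normalize locally by each GPU's own token count, then average.
# Tokens on the small batch get weighted more heavily.
wrong = sum(g / n for g, n in zip(per_gpu_grad_sums, per_gpu_tokens)) / len(per_gpu_tokens)

# Right: (conceptually) all-reduce both the gradient sums and the token
# counts, then divide by the global total so every token counts equally.
total_tokens = sum(per_gpu_tokens)
right = sum(per_gpu_grad_sums) / total_tokens

print(wrong)  # 1.5
print(right)  # 1.2
```

In a real multi-GPU setup the two `sum(...)` calls would be all-reduce operations, but the arithmetic is the same.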
@sxjscience It seems the error `AttributeError: module 'mxnet.ndarray.numpy_extension' has no attribute 'sldwin_atten_score'` is because the installed MXNet version is not the latest.
I think GluonNLP's vocab has a different token order from HuggingFace's, so it will be problematic if we simply copy the entire embedding matrix. @eric-haibin-lin
I think @eric-haibin-lin means that `embedding_gluonnlp[2] != embedding_hf[101]`, since the embedding matrix was copied without reordering.
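A minimal sketch of the reordering step being discussed, using tiny made-up vocabularies (the token-to-index mappings below are hypothetical, not the real GluonNLP or HuggingFace ones): for each token in the target vocab, copy the row from the source matrix at that token's source index, rather than copying rows positionally.

```python
import numpy as np

# Hypothetical vocabularies: the same tokens, indexed differently.
hf_vocab = {"[PAD]": 0, "[UNK]": 100, "[CLS]": 101, "[SEP]": 102}
gluon_vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3}

embed_dim = 4
rng = np.random.default_rng(0)
embedding_hf = rng.normal(size=(103, embed_dim))  # source embedding matrix

# Reorder: place each token's HF row at its GluonNLP index.
embedding_gluonnlp = np.zeros((len(gluon_vocab), embed_dim))
for token, g_idx in gluon_vocab.items():
    embedding_gluonnlp[g_idx] = embedding_hf[hf_vocab[token]]

# GluonNLP index 2 ([CLS]) now holds the row HF stores at index 101;
# a positional copy of the first rows would have gotten this wrong.
assert np.allclose(embedding_gluonnlp[2], embedding_hf[101])
```

The same loop works for any pair of vocabs, with a fallback (e.g. random init) for tokens missing from the source.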