
No glat_sd arch

Open bbo0924 opened this issue 3 years ago • 2 comments

Hi Chengyang, thanks for your great code! I'm trying to reproduce the GLAT+DSLP model. I checked the training scripts you provided, but I found no registered model matching "--arch glat_sd" in the code. Should it be "nat_sd_glat"? BTW, what do "ss" and "sd" mean? Does "sd" mean "supervised deeply"? And what about "ss"? Thanks for your answer!

bbo0924 · Jun 17 '22 08:06

Hello @bbo0924 .

Yes, you are right. It should be nat_sd_glat. Sorry for the mistake, I will fix it. Thanks.

chenyangh · Jun 22 '22 03:06

The names ss and sd were used during development, and I should have changed them after writing the paper. ss means scheduled sampling, where I mix the ground truth tokens with predicted tokens. In sd, the s is a notation for layer-wise prediction, but I don't really remember why I used s. The d means deep supervision.

chenyangh · Jun 22 '22 03:06
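
For readers unfamiliar with the term, the token mixing described in the reply above can be sketched roughly as follows. This is only an illustration of the general idea, not the DSLP repository's implementation: the function name `mix_tokens`, the `mix_prob` parameter, and the per-position random choice are all assumptions made for this example.

```python
import random
from typing import List, Optional


def mix_tokens(
    target: List[int],
    predicted: List[int],
    mix_prob: float,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Mix ground truth tokens with predicted tokens, position by position.

    Hypothetical helper for illustration only. Each position takes the
    predicted token with probability `mix_prob` and the ground truth token
    otherwise.
    """
    if len(target) != len(predicted):
        raise ValueError("target and predicted must have the same length")
    if not 0.0 <= mix_prob <= 1.0:
        raise ValueError("mix_prob must be in [0, 1]")
    rng = rng or random.Random()
    # rng.random() is in [0, 1), so mix_prob=0 keeps every ground truth
    # token and mix_prob=1 keeps every predicted token.
    return [
        pred if rng.random() < mix_prob else gold
        for gold, pred in zip(target, predicted)
    ]


if __name__ == "__main__":
    gold = [11, 12, 13, 14]
    pred = [11, 99, 13, 77]
    print(mix_tokens(gold, pred, mix_prob=0.5, rng=random.Random(0)))
```

With `mix_prob=0.0` the result is the ground truth sequence, and with `mix_prob=1.0` it is the predicted sequence; values in between give a mixture of the two.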