yyht
Hi, nice work. When I apply it to a shallower BERT or GPT, it often gets NaN gradients right after initialization (even for deeper architectures).
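For reference, a minimal way to see which parameters already have NaN gradients right after initialization is something like the TF1-style sketch below (the tiny toy graph is only a stand-in for the real BERT/GPT loss):

```python
import numpy as np
import tensorflow as tf

# Toy stand-in graph; in practice `loss` comes from the BERT/GPT model fn.
x = tf.placeholder(tf.float32, [None, 16])
labels = tf.placeholder(tf.int32, [None])
logits = tf.layers.dense(x, 2)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))

# One bool per trainable variable: does its gradient contain any NaN?
grads_and_vars = tf.train.AdamOptimizer(3e-5).compute_gradients(loss)
nan_checks = [(v.name, tf.reduce_any(tf.is_nan(g)))
              for g, v in grads_and_vars if g is not None]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    feed = {x: np.random.randn(4, 16).astype(np.float32),
            labels: np.zeros(4, dtype=np.int32)}
    for name, has_nan in nan_checks:
        if sess.run(has_nan, feed_dict=feed):
            print("NaN gradient right after init in", name)
```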
Hi, I have done pretraining on a Chinese dataset (50 GB) and run downstream finetuning on the ChineseCLUE benchmark. The default hyperparameters are the same as bert-base: learning_rate: 3e-5, epochs: 3 or 5. The finetuning...
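For reference, the finetuning setup described above corresponds roughly to a config like this sketch (the field names are illustrative, not necessarily the repo's actual flags):

```python
# Illustrative finetuning hyperparameters from the comment above;
# the real flag/field names in the repo may differ.
finetune_config = {
    "benchmark": "ChineseCLUE",  # downstream evaluation suite
    "learning_rate": 3e-5,       # same default as bert-base
    "num_train_epochs": 3,       # or 5
}
```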
Hi, I am trying to train bert-base for Chinese using tta, and it gets NaN after 1000 optimization steps. I am wondering if you could give me some advice.
Hi, since your waveglow code proposes to use a soft-EM version of VQ-VAE, the core implementation is: "def _square_distance(x, code_book): x = tf.cast(x, tf.float32) code_book = tf.cast(code_book, tf.float32) x_sg = ..."
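The quoted snippet is cut off in the preview. For context, a pairwise squared-distance helper for soft-EM code-book assignment typically looks like the sketch below (my own reconstruction under the assumption that `x` is `[batch, d]` and `code_book` is `[K, d]`; the `tau` temperature and function names are illustrative, not necessarily the repo's exact code):

```python
import tensorflow as tf

def _square_distance(x, code_book):
    """Pairwise squared L2 distance between encodings and code-book entries.

    x:         [batch, d] encoder outputs
    code_book: [K, d] code-book vectors
    returns:   [batch, K] squared distances
    """
    x = tf.cast(x, tf.float32)
    code_book = tf.cast(code_book, tf.float32)
    # ||x - e||^2 = ||x||^2 - 2 x.e + ||e||^2, expanded so we never
    # materialise the full [batch, K, d] difference tensor.
    x_sq = tf.reduce_sum(tf.square(x), axis=-1, keepdims=True)   # [batch, 1]
    e_sq = tf.reduce_sum(tf.square(code_book), axis=-1)          # [K]
    cross = tf.matmul(x, code_book, transpose_b=True)            # [batch, K]
    return x_sq - 2.0 * cross + e_sq[tf.newaxis, :]

def soft_assignments(x, code_book, tau=1.0):
    # Soft-EM posterior over code-book entries from negative distances.
    return tf.nn.softmax(-_square_distance(x, code_book) / tau, axis=-1)
```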
Hi, I am very interested in your paper. I tried to run an experiment on my own finance news dataset, predicting the financial event type given the financial news, but the...
Hi, this work is very useful for my research. Could you share the helpfulness dataset with human labels? Thanks.