XingWu_UCAS
https://www.paddlepaddle.org.cn Thanks ~
I am not familiar with PyTorch's DistributedDataParallel, and I am confused about why only teacher_model is wrapped in DistributedDataParallel in general_distill.py.
I tried the code on the RTE/SNLI/MNLI tasks, but UDA's results are worse. Has anyone tried it before?
Hi, the dev result of coCondenser on the MSMARCO-Passage-Ranking-Submissions leaderboard is 0.443. Is that the result of the large-size model? Thank you @luyug
I pretrained a condenser-roberta-base on the same data and with the same hyperparameters, but the results on downstream tasks were not high. Have you ever tried Condenser pretraining on RoBERTa-base? Thank you
Hi, have you tried the InfoNCE loss in Global-local Feature Alignment? [CLS] and [MSK] in the same sentence constitute positive pairs; [CLS] and [MSK] in different sentences constitute...
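To make the question concrete, here is a minimal sketch of an InfoNCE loss over such pairs. It assumes the truncated sentence above goes on to treat [CLS] and [MSK] from different sentences as negative pairs, and it uses plain Python lists in place of model outputs. The names `info_nce`, `cls_vecs`, `msk_vecs` and `temperature` are hypothetical and do not come from the repository.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so that dot() gives cosine similarity."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(cls_vecs, msk_vecs, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    cls_vecs[i] and msk_vecs[i] are assumed to come from the same sentence
    (positive pair); every msk_vecs[j] with j != i is treated as a negative
    (an assumption, since the original post is truncated).
    """
    cls_n = [normalize(v) for v in cls_vecs]
    msk_n = [normalize(v) for v in msk_vecs]
    losses = []
    for i, c in enumerate(cls_n):
        logits = [dot(c, m) / temperature for m in msk_n]
        # Numerically stable log-sum-exp for the softmax denominator.
        mx = max(logits)
        log_denom = mx + math.log(sum(math.exp(l - mx) for l in logits))
        # Negative log-probability of the positive pair.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


if __name__ == "__main__":
    cls_batch = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]
    shuffled = [[0.0, 1.0], [1.0, 0.0]]
    print("aligned:", info_nce(cls_batch, aligned))
    print("shuffled:", info_nce(cls_batch, shuffled))
```

In this toy batch the loss is close to zero when each [CLS] vector matches its own [MSK] vector and large when the pairs are shuffled, which is the behaviour the alignment objective would rely on.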
Hello, after downloading the pretraining data I found that it contains some escaped tokens such as & and <. Did you unescape these? Thanks
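For reference, a minimal sketch of what unescaping could look like, assuming the tokens are HTML entities such as `&amp;` and `&lt;`. Whether the authors applied such a step is exactly the open question. `unescape_text` is a hypothetical helper name; `html.unescape` is from the Python standard library.

```python
import html


def unescape_text(text: str) -> str:
    """Convert HTML entities (e.g. &amp;, &lt;) back to literal characters."""
    return html.unescape(text)


if __name__ == "__main__":
    # Hypothetical sample line, not taken from the actual pretraining data.
    sample = "Tom &amp; Jerry &lt;3"
    print(unescape_text(sample))
```

A single pass undoes only one level of escaping, so doubly escaped text such as `&amp;lt;` would need a second pass.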
sub.json is organized in the format: [{'image': '4385058960_b0f291553e.jpg', 'caption': 'a wooden chair in the living room', 'url': 'http://static.flickr.com/2723/4385058960_b0f291553e.jpg'}, ...] but the downloaded sbu_images.rar extracts to: 0000/ 0001/ 0002/ 0003/...
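### 🐛 Describe the bug
When using Gemini, the number of GPUs must be a power of 2; otherwise the assertion assert chunk_size % self.pg_size == 0 fails. Printing chunk_size shows 40MB.

### Environment
Multiple machines, each with 8x80G A100, running the latest code.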
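The divisibility condition in that assertion can be checked in isolation. The sketch below assumes that the printed 40MB corresponds to `40 * 1024 * 1024` units and that `pg_size` equals the total number of GPUs in the process group; neither is confirmed by the report. `chunk_divides_evenly` is a hypothetical helper, not part of the library.

```python
def chunk_divides_evenly(chunk_size: int, pg_size: int) -> bool:
    """Mirror the reported check: chunk_size % pg_size == 0."""
    return chunk_size % pg_size == 0


if __name__ == "__main__":
    # Assumption: "40MB" means 40 * 1024 * 1024 units.
    chunk_size = 40 * 1024 * 1024
    # Hypothetical GPU counts for 1 to 6 machines with 8 GPUs each.
    for pg_size in (8, 16, 24, 32, 40, 48):
        print(pg_size, chunk_divides_evenly(chunk_size, pg_size))
```

Under these assumptions the check fails whenever the GPU count has a prime factor that does not divide the chunk size, such as the factor 3 in 24 or 48 GPUs.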